LMDB is the database of choice when using Caffe with large datasets. This is a tutorial of how to create an LMDB database from Python. First, let’s look at the pros and cons of using LMDB over HDF5.

Reasons to use HDF5:

  • Simple format to read/write.

Reasons to use LMDB:

  • LMDB uses memory-mapped files, giving much better I/O performance.
  • Works well with really large datasets. The HDF5 files are always read entirely into memory, so you can’t have any HDF5 file exceed your memory capacity. You can easily split your data into several HDF5 files though (just put several paths to h5files in your text file). Then again, compared to LMDB’s page caching the I/O performance won’t be nearly as good.

LMDB from Python

You will need the Python package lmdb as well as Caffe’s python package (make pycaffe in Caffe). LMDB provides key-value storage, where each <key, value> pair will be a sample in our dataset. The key will simply be a string version of an ID value, and the value will be a serialized version of the Datum class in Caffe (which are built using protobuf).

import numpy as np
import lmdb
import caffe N = 1000 # Let's pretend this is interesting data
X = np.zeros((N, 3, 32, 32), dtype=np.uint8)
y = np.zeros(N, dtype=np.int64) # We need to prepare the database for the size. We'll set it 10 times
# greater than what we theoretically need. There is little drawback to
# setting this too big. If you still run into problem after raising
# this, you might want to try saving fewer entries in a single
# transaction.
map_size = X.nbytes * 10 env = lmdb.open('mylmdb', map_size=map_size) with env.begin(write=True) as txn:
# txn is a Transaction object
for i in range(N):
datum = caffe.proto.caffe_pb2.Datum()
datum.channels = X.shape[1]
datum.height = X.shape[2]
datum.width = X.shape[3]
datum.data = X[i].tobytes() # or .tostring() if numpy < 1.9
datum.label = int(y[i])
str_id = '{:08}'.format(i) # The encode is only essential in Python 3
txn.put(str_id.encode('ascii'), datum.SerializeToString())

  

You can also open up and inspect an existing LMDB database from Python:

import numpy as np
import lmdb
import caffe env = lmdb.open('mylmdb', readonly=True)
with env.begin() as txn:
raw_datum = txn.get(b'00000000') datum = caffe.proto.caffe_pb2.Datum()
datum.ParseFromString(raw_datum) flat_x = np.fromstring(datum.data, dtype=np.uint8)
x = flat_x.reshape(datum.channels, datum.height, datum.width)
y = datum.label

  

Iterating <key, value> pairs is also easy:

with env.begin() as txn:
cursor = txn.cursor()
for key, value in cursor:
print(key, value)

  

Creating an LMDB database in Python的更多相关文章

  1. Initialization of deep networks

    Initialization of deep networks 24 Feb 2015Gustav Larsson As we all know, the solution to a non-conv ...

  2. 非图片格式如何转成lmdb格式--caffe

    链接 LMDB is the database of choice when using Caffe with large datasets. This is a tutorial of how to ...

  3. Movidius的深度学习入门

    1.Ubuntu虚拟机上安装NC SDK cd /home/shine/Downloads/ mkdir NC_SDK git clone https://github.com/movidius/nc ...

  4. Python框架、库以及软件资源汇总

    转自:http://developer.51cto.com/art/201507/483510.htm 很多来自世界各地的程序员不求回报的写代码为别人造轮子.贡献代码.开发框架.开放源代码使得分散在世 ...

  5. Awesome Python

    Awesome Python  A curated list of awesome Python frameworks, libraries, software and resources. Insp ...

  6. Machine and Deep Learning with Python

    Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...

  7. Huge CSV and XML Files in Python, Error: field larger than field limit (131072)

    Huge CSV and XML Files in Python January 22, 2009. Filed under python twitter facebook pinterest lin ...

  8. (原)caffe中通过图像生成lmdb格式的数据

    转载请注明出处: http://www.cnblogs.com/darkknightzh/p/5909121.html 参考网址: http://www.cnblogs.com/wangxiaocvp ...

  9. Caffe︱构建lmdb数据集、binaryproto均值文件及各类难辨的文件路径名设置细解

    Lmdb生成的过程简述 1.整理并约束尺寸,文件夹.图片放在不同的文件夹之下,注意图片的size需要规约到统一的格式,不然计算均值文件的时候会报错. 2.将内容生成列表放入txt文件中.两个txt文件 ...

随机推荐

  1. 【转】Android 模拟器横屏竖屏切换设置

    http://blog.csdn.net/zanfeng/article/details/18355305# Android 模拟器横屏竖屏切换设置时间:2012-07-04   来源:设计与开发   ...

  2. 移动端iscroll实现日期选择

    哎,说多了都是泪: 引入相关JS文件 <script type="text/javascript" src="js/jquery-1.9.1.min.js" ...

  3. Vue 组件(上)转载

    一.定义 组件:应用界面上一个个的区块. 自定义的HTML元素. 二.创建和注册 Vue.extend() 扩展,创建组件构造器 Vue.component()注册组件,2个参数,1为标签,2是组件构 ...

  4. 【转】Linux 虚拟内存和物理内存的理解

    http://www.cnblogs.com/dyllove98/archive/2013/06/12/3132940.html 首先,让我们看下虚拟内存: 第一层理解 1.         每个进程 ...

  5. day64 django django零碎知识点整理

    本文转载自紫金葫芦,哪吒,liwenzhou.cnblog博客地址 简单了解mvc框架和MTV框架, mvc是一种简单的软件架构模式: m----model,模型 v---view,视图 c---co ...

  6. day 47 htm-part2

    列表 无序列表====所谓无序就是显示出来的效果没有编号排序 <ul type='disc'> <li>第一项</li> <li>第二项</li& ...

  7. ZOJ 1109 Language of FatMouse 【Trie树】

    <题目链接> 题目大意: 刚开始每行输入两个单词,第二个单词存入单词库,并且每行第二个单词映射着对应的第一个单词.然后每行输入一个单词,如果单词库中有相同的单词,则输出它对应的那个单词,否 ...

  8. ROWNUM = 1 to replace count(*)

    For a long time, I have been using the EXISTS clause to determine if at least one record exists in a ...

  9. 页面嵌入iframe那些事儿

    一.用iframe如何把别人的页面嵌入自己的页面? <iframe src="http://blog.sina.com.cn/abc" frameBorder=0 scrol ...

  10. Touch事件详解及区别,触屏滑动距离计算

    移动端有四个关于触摸的事件,分别是touchstart.touchmove.touchend.touchcancel(比较少用), 它们的触发顺序是touchstart-->touchmove- ...