LMDB is the database of choice when using Caffe with large datasets. This is a tutorial of how to create an LMDB database from Python. First, let’s look at the pros and cons of using LMDB over HDF5.

Reasons to use HDF5:

  • Simple format to read/write.

Reasons to use LMDB:

  • LMDB uses memory-mapped files, giving much better I/O performance.
  • Works well with really large datasets. The HDF5 files are always read entirely into memory, so you can’t have any HDF5 file exceed your memory capacity. You can easily split your data into several HDF5 files though (just put several paths to h5files in your text file). Then again, compared to LMDB’s page caching the I/O performance won’t be nearly as good.

LMDB from Python

You will need the Python package lmdb as well as Caffe’s python package (make pycaffe in Caffe). LMDB provides key-value storage, where each <key, value> pair will be a sample in our dataset. The key will simply be a string version of an ID value, and the value will be a serialized version of the Datum class in Caffe (which are built using protobuf).

import numpy as np
import lmdb
import caffe N = 1000 # Let's pretend this is interesting data
X = np.zeros((N, 3, 32, 32), dtype=np.uint8)
y = np.zeros(N, dtype=np.int64) # We need to prepare the database for the size. We'll set it 10 times
# greater than what we theoretically need. There is little drawback to
# setting this too big. If you still run into problem after raising
# this, you might want to try saving fewer entries in a single
# transaction.
map_size = X.nbytes * 10 env = lmdb.open('mylmdb', map_size=map_size) with env.begin(write=True) as txn:
# txn is a Transaction object
for i in range(N):
datum = caffe.proto.caffe_pb2.Datum()
datum.channels = X.shape[1]
datum.height = X.shape[2]
datum.width = X.shape[3]
datum.data = X[i].tobytes() # or .tostring() if numpy < 1.9
datum.label = int(y[i])
str_id = '{:08}'.format(i) # The encode is only essential in Python 3
txn.put(str_id.encode('ascii'), datum.SerializeToString())

  

You can also open up and inspect an existing LMDB database from Python:

import numpy as np
import lmdb
import caffe env = lmdb.open('mylmdb', readonly=True)
with env.begin() as txn:
raw_datum = txn.get(b'00000000') datum = caffe.proto.caffe_pb2.Datum()
datum.ParseFromString(raw_datum) flat_x = np.fromstring(datum.data, dtype=np.uint8)
x = flat_x.reshape(datum.channels, datum.height, datum.width)
y = datum.label

  

Iterating <key, value> pairs is also easy:

with env.begin() as txn:
cursor = txn.cursor()
for key, value in cursor:
print(key, value)

  

Creating an LMDB database in Python的更多相关文章

  1. Initialization of deep networks

    Initialization of deep networks 24 Feb 2015Gustav Larsson As we all know, the solution to a non-conv ...

  2. 非图片格式如何转成lmdb格式--caffe

    链接 LMDB is the database of choice when using Caffe with large datasets. This is a tutorial of how to ...

  3. Movidius的深度学习入门

    1.Ubuntu虚拟机上安装NC SDK cd /home/shine/Downloads/ mkdir NC_SDK git clone https://github.com/movidius/nc ...

  4. Python框架、库以及软件资源汇总

    转自:http://developer.51cto.com/art/201507/483510.htm 很多来自世界各地的程序员不求回报的写代码为别人造轮子.贡献代码.开发框架.开放源代码使得分散在世 ...

  5. Awesome Python

    Awesome Python  A curated list of awesome Python frameworks, libraries, software and resources. Insp ...

  6. Machine and Deep Learning with Python

    Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...

  7. Huge CSV and XML Files in Python, Error: field larger than field limit (131072)

    Huge CSV and XML Files in Python January 22, 2009. Filed under python twitter facebook pinterest lin ...

  8. (原)caffe中通过图像生成lmdb格式的数据

    转载请注明出处: http://www.cnblogs.com/darkknightzh/p/5909121.html 参考网址: http://www.cnblogs.com/wangxiaocvp ...

  9. Caffe︱构建lmdb数据集、binaryproto均值文件及各类难辨的文件路径名设置细解

    Lmdb生成的过程简述 1.整理并约束尺寸,文件夹.图片放在不同的文件夹之下,注意图片的size需要规约到统一的格式,不然计算均值文件的时候会报错. 2.将内容生成列表放入txt文件中.两个txt文件 ...

随机推荐

  1. 如何删除github里面的项目

    第一步 :首先登陆github 第二步:如下图选择 第三步:选择如下图 第四步:点击你要删除的项目,点击settings 第五步:把页面向下拉,找到如图按钮并点击 第六步:需要确认输入你要删除的项目名 ...

  2. youDao

    2018-09-22Journeys end in lovers' meeting.漂泊止于爱人的相遇. All extremes of feeling are allied with madness ...

  3. 04. Pandas 3| 数值计算与统计、合并连接去重分组透视表文件读取

    1.数值计算和统计基础 常用数学.统计方法 数值计算和统计基础 基本参数:axis.skipna df.mean(axis=1,skipna=False)  -->> axis=1是按行来 ...

  4. 20165319 2017-2018-2《Java程序设计》课程总结

    一.每周作业链接汇总 预备作业一:我期望的师生关系 20165319 我所期望的师生关系 预备作业二:学习基础和C语言基础调查 20165319 学习基础和C语言基础调查 摘要: 技能学习经验 c语言 ...

  5. bind注意事项(传引用参数的时候)

    默认情况下,bind的那些不是占位符的参数被拷贝到bind返回的可调用对象中. 当需要把对象传到bind中的参数中时,需要使用ref或者cref. 例如: #include<iostream&g ...

  6. 基于C语言的Socket网络编程搭建简易的Web服务器(socket实现的内部原理)

    首先编写我们服务器上需要的c文件WebServer.c 涉及到的函数API: int copy(FILE *read_f, FILE * write_f) ----- 文件内容复制的方法 int Do ...

  7. html-盒子模型及pading和margin相关

    margin: <!DOCTYPE html> <html lang="en"> <head> <meta charset="U ...

  8. Hdu-1098解题报告

    Hdu-1098解题报告 题目链接:http://acm.hdu.edu.cn/showproblem.php?pid=1098 题意:已知存在一个等式f(x)=5*x^13+13*x^5+k*a*x ...

  9. 多个SDK控制管理

    需求:制作一个公共组件,可以实现多个SDK想用哪个用哪个,集中管理 组织方式: 架构形式 注意点: 1.sdk必须通过maven库来compile,因为jar会打到aar中:所以library和主mo ...

  10. 数据库相关--net start mysql 服务无法启动(win7系统)解决

    系统:win7 旗舰版 64位 MySQL:8.0.11 家里台式机上不久之前安装了MySQL,一段时间没碰过后,突然启动不了了(我有一头小毛驴,我从来也不骑,有一天我心血来潮骑它去赶集) 先是在系统 ...