LMDB is the database of choice when using Caffe with large datasets. This is a tutorial of how to create an LMDB database from Python. First, let’s look at the pros and cons of using LMDB over HDF5.

Reasons to use HDF5:

  • Simple format to read/write.

Reasons to use LMDB:

  • LMDB uses memory-mapped files, giving much better I/O performance.
  • Works well with really large datasets. The HDF5 files are always read entirely into memory, so you can’t have any HDF5 file exceed your memory capacity. You can easily split your data into several HDF5 files though (just put several paths to h5files in your text file). Then again, compared to LMDB’s page caching the I/O performance won’t be nearly as good.

LMDB from Python

You will need the Python package lmdb as well as Caffe’s python package (make pycaffe in Caffe). LMDB provides key-value storage, where each <key, value> pair will be a sample in our dataset. The key will simply be a string version of an ID value, and the value will be a serialized version of the Datum class in Caffe (which are built using protobuf).

import numpy as np
import lmdb
import caffe N = 1000 # Let's pretend this is interesting data
X = np.zeros((N, 3, 32, 32), dtype=np.uint8)
y = np.zeros(N, dtype=np.int64) # We need to prepare the database for the size. We'll set it 10 times
# greater than what we theoretically need. There is little drawback to
# setting this too big. If you still run into problem after raising
# this, you might want to try saving fewer entries in a single
# transaction.
map_size = X.nbytes * 10 env = lmdb.open('mylmdb', map_size=map_size) with env.begin(write=True) as txn:
# txn is a Transaction object
for i in range(N):
datum = caffe.proto.caffe_pb2.Datum()
datum.channels = X.shape[1]
datum.height = X.shape[2]
datum.width = X.shape[3]
datum.data = X[i].tobytes() # or .tostring() if numpy < 1.9
datum.label = int(y[i])
str_id = '{:08}'.format(i) # The encode is only essential in Python 3
txn.put(str_id.encode('ascii'), datum.SerializeToString())

  

You can also open up and inspect an existing LMDB database from Python:

import numpy as np
import lmdb
import caffe env = lmdb.open('mylmdb', readonly=True)
with env.begin() as txn:
raw_datum = txn.get(b'00000000') datum = caffe.proto.caffe_pb2.Datum()
datum.ParseFromString(raw_datum) flat_x = np.fromstring(datum.data, dtype=np.uint8)
x = flat_x.reshape(datum.channels, datum.height, datum.width)
y = datum.label

  

Iterating <key, value> pairs is also easy:

with env.begin() as txn:
cursor = txn.cursor()
for key, value in cursor:
print(key, value)

  

Creating an LMDB database in Python的更多相关文章

  1. Initialization of deep networks

    Initialization of deep networks 24 Feb 2015Gustav Larsson As we all know, the solution to a non-conv ...

  2. 非图片格式如何转成lmdb格式--caffe

    链接 LMDB is the database of choice when using Caffe with large datasets. This is a tutorial of how to ...

  3. Movidius的深度学习入门

    1.Ubuntu虚拟机上安装NC SDK cd /home/shine/Downloads/ mkdir NC_SDK git clone https://github.com/movidius/nc ...

  4. Python框架、库以及软件资源汇总

    转自:http://developer.51cto.com/art/201507/483510.htm 很多来自世界各地的程序员不求回报的写代码为别人造轮子.贡献代码.开发框架.开放源代码使得分散在世 ...

  5. Awesome Python

    Awesome Python  A curated list of awesome Python frameworks, libraries, software and resources. Insp ...

  6. Machine and Deep Learning with Python

    Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...

  7. Huge CSV and XML Files in Python, Error: field larger than field limit (131072)

    Huge CSV and XML Files in Python January 22, 2009. Filed under python twitter facebook pinterest lin ...

  8. (原)caffe中通过图像生成lmdb格式的数据

    转载请注明出处: http://www.cnblogs.com/darkknightzh/p/5909121.html 参考网址: http://www.cnblogs.com/wangxiaocvp ...

  9. Caffe︱构建lmdb数据集、binaryproto均值文件及各类难辨的文件路径名设置细解

    Lmdb生成的过程简述 1.整理并约束尺寸,文件夹.图片放在不同的文件夹之下,注意图片的size需要规约到统一的格式,不然计算均值文件的时候会报错. 2.将内容生成列表放入txt文件中.两个txt文件 ...

随机推荐

  1. day4.字符串练习题

    有变量 name = “alex leNb”,完成如下操作 1. 移除name变量对应的值两边的空格,并输出处理结果 print(name.strip()) 2. 移除name变量左边的’al’并输出 ...

  2. L3-007 天梯地图 (30 分) dijkstra

    本题要求你实现一个天梯赛专属在线地图,队员输入自己学校所在地和赛场地点后,该地图应该推荐两条路线:一条是最快到达路线:一条是最短距离的路线.题目保证对任意的查询请求,地图上都至少存在一条可达路线. 输 ...

  3. 056 Java搭建kafka环境

    1.使用Java项目搭建 2.新目录 3.添加项目支持 4.添加mavem与scala 5.修改pom <?xml version="1.0" encoding=" ...

  4. TMS320DM642学习----第一篇(硬件连接)

    DSP设备型号:SEED-DTK-VPM642(目前实验室用途:视频处理,图像处理方向,预计搭载目标跟踪以及云台防抖等算法) 官网链接:http://www.seeddsp.com/index.php ...

  5. HDU 5036 Explosion (传递闭包+bitset优化)

    <题目链接> 题目大意: 一个人要打开或者用炸弹砸开所有的门,每个门后面有一些钥匙,一个钥匙对应一个门,告诉每个门里面有哪些门的钥匙.如果要打开所有的门,问需要用的炸弹数量为多少. 解题分 ...

  6. Codeforces 992C Nastya and a Wardrobe (思维)

    <题目链接> 题目大意: 你开始有X个裙子 你有K+1次增长机会 前K次会100%的增长一倍 但是增长后有50%的机会会减少一个 给你X,K(0<=X,K<=1e18), 问你 ...

  7. snmp 里面oid对应的信息 MIB

    系统参数(1.3.6.1.2.1.1) OID 描述 备注 请求方式 .1.3.6.1.2.1.1.1.0 获取系统基本信息 SysDesc GET .1.3.6.1.2.1.1.3.0 监控时间 s ...

  8. springboot2.0 redis EnableCaching的配置和使用

    一.前言 关于EnableCaching最简单使用,个人感觉只需提供一个CacheManager的一个实例就好了.springboot为我们提供了cache相关的自动配置.引入cache模块,如下. ...

  9. python 多版本共存

    py2和3都安装结束后 接下来就是检查环境变量,缺少的我们需要添加. 在path中找以下4个变量 1.c:\Python27 2.c:\Python27\Scripts 3.c:\Python36 4 ...

  10. SpringBoot+Mybatis+MySql学习

    介绍一下SpringBoot整合mybatis,数据库选用的是mysql. 首先创建数据库 CREATE DATABASE test; 建表以及插入初始数据(sql是从navicat中导出的) SET ...