Creating an LMDB database in Python

LMDB is the database of choice when using Caffe with large datasets. This tutorial shows how to create an LMDB database from Python. First, let's look at the pros and cons of using LMDB over HDF5.

Reasons to use HDF5:

Simple format to read/write.

Reasons to use LMDB:

LMDB uses memory-mapped files, giving much better I/O performance.
Works well with really large datasets. Caffe's HDF5 data layer reads each HDF5 file entirely into memory, so no single HDF5 file can exceed your memory capacity. You can easily split your data into several HDF5 files, though (just list several .h5 paths in the layer's source text file). Then again, the I/O performance won't be nearly as good as with LMDB's page caching.
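For the HDF5 route, splitting a dataset mostly comes down to choosing sample ranges that keep each file under a memory budget, then listing the resulting files in the text file that the HDF5 data layer's source parameter points to. A minimal sketch, assuming a fixed number of bytes per sample; the function names, the train_NNN.h5 naming scheme and the budget are hypothetical, and the actual h5py write is only shown as a comment:

```python
import math


def plan_hdf5_split(num_samples, bytes_per_sample, max_file_bytes, prefix="train"):
    """Return (filename, start, stop) tuples so that no file holds more than
    max_file_bytes of sample data. The naming scheme is illustrative only."""
    per_file = max(1, max_file_bytes // bytes_per_sample)
    n_files = math.ceil(num_samples / per_file)
    return [
        (f"{prefix}_{k:03d}.h5", k * per_file, min((k + 1) * per_file, num_samples))
        for k in range(n_files)
    ]


def write_source_list(path, chunks):
    """Write one HDF5 path per line: the list file the HDF5 data layer reads."""
    with open(path, "w") as f:
        for filename, _start, _stop in chunks:
            f.write(filename + "\n")


# Each chunk would then be written with h5py (assumed, not run here), using
# dataset names that match the layer's top blobs, for example:
#   with h5py.File(filename, "w") as f:
#       f.create_dataset("data", data=X[start:stop].astype(np.float32))
#       f.create_dataset("label", data=y[start:stop].astype(np.float32))
```

With 3 × 32 × 32 uint8 samples (3,072 bytes each) and a 1,000,000-byte budget, 1,000 samples split into files of 325, 325, 325 and 25 samples. If the data is stored as float32, pass the float32 size per sample instead.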

LMDB from Python

You will need the Python package lmdb as well as Caffe's Python package (build it with make pycaffe in Caffe). LMDB provides key-value storage, where each <key, value> pair will be a sample in our dataset. The key will simply be a string version of an ID value, and the value will be a serialized version of Caffe's Datum class (which is defined using protobuf).

import numpy as np
import lmdb
import caffe

N = 1000

# Let's pretend this is interesting data
X = np.zeros((N, 3, 32, 32), dtype=np.uint8)
y = np.zeros(N, dtype=np.int64)

# LMDB needs the maximum database size up front. We'll set it 10 times
# greater than what we theoretically need. There is little drawback to
# setting this too big. If you still run into problems after raising
# this, you might want to try saving fewer entries in a single
# transaction.
map_size = X.nbytes * 10

env = lmdb.open('mylmdb', map_size=map_size)

with env.begin(write=True) as txn:
    # txn is a Transaction object
    for i in range(N):
        datum = caffe.proto.caffe_pb2.Datum()
        datum.channels = X.shape[1]
        datum.height = X.shape[2]
        datum.width = X.shape[3]
        datum.data = X[i].tobytes()  # or .tostring() if numpy < 1.9
        datum.label = int(y[i])
        str_id = '{:08}'.format(i)

        # The encode is only essential in Python 3
        txn.put(str_id.encode('ascii'), datum.SerializeToString())
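With the toy data above, X.nbytes is 1000 × 3 × 32 × 32 = 3,072,000 bytes, so map_size comes to 30,720,000 bytes (about 29 MiB), leaving plenty of room for the per-entry overhead that protobuf serialization and LMDB's page layout add. The comment also suggests saving fewer entries per transaction if problems persist. A minimal sketch of that idea, assuming env behaves like the lmdb.Environment opened above (begin(write=True) returns a transaction that commits when its with block exits cleanly and aborts on an exception); write_in_batches, _commit and batch_size are hypothetical names:

```python
def write_in_batches(env, items, batch_size=1000):
    """Put (key, value) byte pairs into env, committing one write
    transaction per batch_size entries instead of one for everything.

    Returns the number of entries written. Assumes an lmdb.Environment-like
    env whose begin(write=True) is a context manager that commits on exit.
    """
    written = 0
    batch = []
    for key, value in items:
        batch.append((key, value))
        if len(batch) == batch_size:
            written += _commit(env, batch)
            batch = []
    if batch:  # leftover entries that did not fill a whole batch
        written += _commit(env, batch)
    return written


def _commit(env, batch):
    # Everything in this with block is committed together; if an exception
    # escapes (e.g. the map is full), only this batch is rolled back.
    with env.begin(write=True) as txn:
        for key, value in batch:
            txn.put(key, value)
    return len(batch)
```

A generator yielding (str_id.encode('ascii'), datum.SerializeToString()) pairs, built as in the loop above, can be passed as items. Because earlier batches are already committed, an error such as lmdb.MapFullError partway through leaves the entries written so far in the database.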

You can also open up and inspect an existing LMDB database from Python:

import numpy as np
import lmdb
import caffe

env = lmdb.open('mylmdb', readonly=True)
with env.begin() as txn:
    raw_datum = txn.get(b'00000000')

datum = caffe.proto.caffe_pb2.Datum()
datum.ParseFromString(raw_datum)

# np.frombuffer replaces the deprecated np.fromstring for binary data
flat_x = np.frombuffer(datum.data, dtype=np.uint8)
x = flat_x.reshape(datum.channels, datum.height, datum.width)
y = datum.label
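The reshape above assumes Datum.data holds the pixels in channels × height × width order, which is how the writing script fills it from X[i]. Many image loaders return height × width × channels arrays instead, so such images need a transpose before tobytes(). A minimal NumPy-only sketch of the round trip; the helper names are hypothetical and the H × W × C input layout is an assumption about your loader:

```python
import numpy as np


def to_datum_bytes(img_hwc):
    """Turn an H x W x C uint8 image into C x H x W bytes plus its shape."""
    chw = img_hwc.transpose(2, 0, 1)
    # tobytes() emits C (row-major) order even for this transposed view,
    # so the bytes run channel by channel, then row by row.
    return chw.tobytes(), chw.shape


def from_datum_bytes(data, channels, height, width):
    """Rebuild the C x H x W uint8 array from raw Datum.data bytes."""
    flat = np.frombuffer(data, dtype=np.uint8)
    # frombuffer shares memory with the read-only bytes object, so copy
    # before modifying the result in place.
    return flat.reshape(channels, height, width).copy()
```

In a writing loop, data, (c, h, w) = to_datum_bytes(img) would supply datum.data, datum.channels, datum.height and datum.width.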

Iterating over <key, value> pairs is also easy:

with env.begin() as txn:
    cursor = txn.cursor()
    for key, value in cursor:
        print(key, value)
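The cursor visits entries in key order, and LMDB's default comparison treats keys as raw bytes compared lexicographically. That is why the writing script zero-pads IDs with '{:08}'.format(i): an unpadded key such as b'10' would sort before b'2'. Since Caffe's Data layer also reads the database sequentially through a cursor, key order is most likely the order in which samples reach the network, so shuffle before assigning IDs if that matters. Python's bytes comparison orders keys the same way, which lets a small sketch show the effect without opening a database (make_key is a hypothetical helper):

```python
def make_key(i, width=8):
    """Zero-padded ASCII key, equivalent to '{:08}'.format(i) for width=8."""
    return f'{i:0{width}d}'.encode('ascii')


ids = [2, 10, 1, 100]

# Unpadded keys: byte-wise order differs from numeric order.
plain_order = sorted(str(i).encode('ascii') for i in ids)

# Zero-padded keys: byte-wise order matches numeric order.
padded_order = sorted(make_key(i) for i in ids)

print(plain_order)   # [b'1', b'10', b'100', b'2']
print(padded_order)  # [b'00000001', b'00000002', b'00000010', b'00000100']
```

Eight digits cover IDs up to 99,999,999; a larger dataset needs a wider pad.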
