LMDB is the database of choice when using Caffe with large datasets. This tutorial shows how to create an LMDB database from Python. First, let’s look at the pros and cons of using LMDB over HDF5.

Reasons to use HDF5:

  • Simple format to read/write.

Reasons to use LMDB:

  • LMDB uses memory-mapped files, giving much better I/O performance.
  • Works well with really large datasets. Caffe always reads each HDF5 file entirely into memory, so no single HDF5 file can exceed your memory capacity. You can easily split your data into several HDF5 files, though (just list several HDF5 file paths in your text file). Even then, the I/O performance won’t be nearly as good as with LMDB’s page caching.

LMDB from Python

You will need the Python package lmdb as well as Caffe’s Python package (make pycaffe in Caffe). LMDB provides key-value storage, where each <key, value> pair will be a sample in our dataset. The key will simply be a string version of an ID value, and the value will be a serialized version of Caffe’s Datum class (which is built using protobuf).
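
LMDB keeps its keys in sorted byte order by default, which is why the tutorial's code formats each ID with '{:08}' before encoding it. The following is a minimal sketch of that effect using only the standard library; the helper name make_key and its width parameter are hypothetical and not part of lmdb or Caffe.

```python
def make_key(i, width=8):
    """Return a zero-padded ASCII key for sample index i (hypothetical helper)."""
    return '{:0{w}d}'.format(i, w=width).encode('ascii')


# Byte strings compare lexicographically, like LMDB's default key order.
unpadded = sorted(str(i).encode('ascii') for i in (2, 10, 1))
padded = sorted(make_key(i) for i in (2, 10, 1))

print(unpadded)  # [b'1', b'10', b'2']: 10 sorts before 2
print(padded)    # [b'00000001', b'00000002', b'00000010']: numeric order
```

With padded keys, iterating over the database returns the samples in the same order as their numeric IDs.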

import numpy as np
import lmdb
import caffe

N = 1000

# Let's pretend this is interesting data
X = np.zeros((N, 3, 32, 32), dtype=np.uint8)
y = np.zeros(N, dtype=np.int64)

# We need to prepare the database for the size. We'll set it 10 times
# greater than what we theoretically need. There is little drawback to
# setting this too big. If you still run into problems after raising
# this, you might want to try saving fewer entries in a single
# transaction.
map_size = X.nbytes * 10

env = lmdb.open('mylmdb', map_size=map_size)

with env.begin(write=True) as txn:
    # txn is a Transaction object
    for i in range(N):
        datum = caffe.proto.caffe_pb2.Datum()
        datum.channels = X.shape[1]
        datum.height = X.shape[2]
        datum.width = X.shape[3]
        datum.data = X[i].tobytes()  # or .tostring() if numpy < 1.9
        datum.label = int(y[i])
        str_id = '{:08}'.format(i)

        # The encode is only essential in Python 3
        txn.put(str_id.encode('ascii'), datum.SerializeToString())
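
The comments in the code mention two knobs: the map size and the number of entries written per transaction. The sketch below illustrates both under stated assumptions. The helper names estimate_map_size, chunked, and write_in_batches are hypothetical, and the batch size of 1000 is an arbitrary choice. The env argument is assumed to be an environment returned by lmdb.open, and items an iterable of (key, value) byte pairs such as the encoded ID and datum.SerializeToString().

```python
from itertools import islice


def estimate_map_size(n_samples, sample_shape, itemsize=1, factor=10):
    """Raw data size in bytes times a safety factor (10x, as in the tutorial)."""
    n_bytes = n_samples * itemsize
    for dim in sample_shape:
        n_bytes *= dim
    return n_bytes * factor


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def write_in_batches(env, items, batch_size=1000):
    """Write (key, value) byte pairs using one write transaction per batch.

    Returns the number of transactions that were opened.
    """
    n_txns = 0
    for batch in chunked(items, batch_size):
        with env.begin(write=True) as txn:
            for key, value in batch:
                txn.put(key, value)
        n_txns += 1
    return n_txns


# 1000 uint8 samples of shape (3, 32, 32): 3,072,000 bytes, times 10.
print(estimate_map_size(1000, (3, 32, 32)))  # 30720000
```

Each transaction commits when its with block exits, so a smaller batch_size keeps less uncommitted data in any single transaction.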

  

You can also open up and inspect an existing LMDB database from Python:

import numpy as np
import lmdb
import caffe

env = lmdb.open('mylmdb', readonly=True)
with env.begin() as txn:
    raw_datum = txn.get(b'00000000')

datum = caffe.proto.caffe_pb2.Datum()
datum.ParseFromString(raw_datum)

# np.frombuffer replaces the deprecated np.fromstring for binary data
flat_x = np.frombuffer(datum.data, dtype=np.uint8)
x = flat_x.reshape(datum.channels, datum.height, datum.width)
y = datum.label

  

Iterating <key, value> pairs is also easy:

with env.begin() as txn:
    cursor = txn.cursor()
    for key, value in cursor:
        print(key, value)
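
Printing every serialized value quickly becomes unreadable for a real dataset. As a rough sketch, the same iteration can feed a small summary instead. The function name summarize is hypothetical; it only assumes an iterable of (key, value) byte pairs, which is what the cursor loop above yields.

```python
def summarize(pairs):
    """Count entries and value bytes in an iterable of (key, value) pairs."""
    count = 0
    total_bytes = 0
    first_key = None
    last_key = None
    for key, value in pairs:
        if first_key is None:
            first_key = key
        last_key = key
        count += 1
        total_bytes += len(value)
    return {
        'entries': count,
        'value_bytes': total_bytes,
        'first_key': first_key,
        'last_key': last_key,
    }


# Stand-in data; with a real database, pass txn.cursor() instead.
demo = [(b'00000000', b'abc'), (b'00000001', b'de')]
print(summarize(demo))
```

With an open environment, this would be called as summarize(txn.cursor()) inside a with env.begin() as txn: block.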

  
