LMDB is the database of choice when using Caffe with large datasets. This is a tutorial of how to create an LMDB database from Python. First, let’s look at the pros and cons of using LMDB over HDF5.

Reasons to use HDF5:

  • Simple format to read/write.

Reasons to use LMDB:

  • LMDB uses memory-mapped files, giving much better I/O performance.
  • Works well with really large datasets. The HDF5 files are always read entirely into memory, so you can’t have any HDF5 file exceed your memory capacity. You can easily split your data into several HDF5 files though (just put several paths to h5files in your text file). Then again, compared to LMDB’s page caching the I/O performance won’t be nearly as good.

LMDB from Python

You will need the Python package lmdb as well as Caffe’s python package (make pycaffe in Caffe). LMDB provides key-value storage, where each <key, value> pair will be a sample in our dataset. The key will simply be a string version of an ID value, and the value will be a serialized version of the Datum class in Caffe (which are built using protobuf).

import numpy as np
import lmdb
import caffe N = 1000 # Let's pretend this is interesting data
X = np.zeros((N, 3, 32, 32), dtype=np.uint8)
y = np.zeros(N, dtype=np.int64) # We need to prepare the database for the size. We'll set it 10 times
# greater than what we theoretically need. There is little drawback to
# setting this too big. If you still run into problem after raising
# this, you might want to try saving fewer entries in a single
# transaction.
map_size = X.nbytes * 10 env = lmdb.open('mylmdb', map_size=map_size) with env.begin(write=True) as txn:
# txn is a Transaction object
for i in range(N):
datum = caffe.proto.caffe_pb2.Datum()
datum.channels = X.shape[1]
datum.height = X.shape[2]
datum.width = X.shape[3]
datum.data = X[i].tobytes() # or .tostring() if numpy < 1.9
datum.label = int(y[i])
str_id = '{:08}'.format(i) # The encode is only essential in Python 3
txn.put(str_id.encode('ascii'), datum.SerializeToString())

  

You can also open up and inspect an existing LMDB database from Python:

import numpy as np
import lmdb
import caffe env = lmdb.open('mylmdb', readonly=True)
with env.begin() as txn:
raw_datum = txn.get(b'00000000') datum = caffe.proto.caffe_pb2.Datum()
datum.ParseFromString(raw_datum) flat_x = np.fromstring(datum.data, dtype=np.uint8)
x = flat_x.reshape(datum.channels, datum.height, datum.width)
y = datum.label

  

Iterating <key, value> pairs is also easy:

with env.begin() as txn:
cursor = txn.cursor()
for key, value in cursor:
print(key, value)

  

Creating an LMDB database in Python的更多相关文章

  1. Initialization of deep networks

    Initialization of deep networks 24 Feb 2015Gustav Larsson As we all know, the solution to a non-conv ...

  2. 非图片格式如何转成lmdb格式--caffe

    链接 LMDB is the database of choice when using Caffe with large datasets. This is a tutorial of how to ...

  3. Movidius的深度学习入门

    1.Ubuntu虚拟机上安装NC SDK cd /home/shine/Downloads/ mkdir NC_SDK git clone https://github.com/movidius/nc ...

  4. Python框架、库以及软件资源汇总

    转自:http://developer.51cto.com/art/201507/483510.htm 很多来自世界各地的程序员不求回报的写代码为别人造轮子.贡献代码.开发框架.开放源代码使得分散在世 ...

  5. Awesome Python

    Awesome Python  A curated list of awesome Python frameworks, libraries, software and resources. Insp ...

  6. Machine and Deep Learning with Python

    Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...

  7. Huge CSV and XML Files in Python, Error: field larger than field limit (131072)

    Huge CSV and XML Files in Python January 22, 2009. Filed under python twitter facebook pinterest lin ...

  8. (原)caffe中通过图像生成lmdb格式的数据

    转载请注明出处: http://www.cnblogs.com/darkknightzh/p/5909121.html 参考网址: http://www.cnblogs.com/wangxiaocvp ...

  9. Caffe︱构建lmdb数据集、binaryproto均值文件及各类难辨的文件路径名设置细解

    Lmdb生成的过程简述 1.整理并约束尺寸,文件夹.图片放在不同的文件夹之下,注意图片的size需要规约到统一的格式,不然计算均值文件的时候会报错. 2.将内容生成列表放入txt文件中.两个txt文件 ...

随机推荐

  1. Javascript 中调参数的脚本onclick="select(this)" this 怎么解释

    解释1. this,指当前的onclick所在的节点本身. 比如: <div onclick='select(this)"></div> 则当点击div时,this就 ...

  2. AtCoder Regular Contest 101 (ARC101) D - Median of Medians 二分答案 树状数组

    原文链接https://www.cnblogs.com/zhouzhendong/p/ARC101D.html 题目传送门 - ARC101D 题意 给定一个序列 A . 定义一个序列 A 的中位数为 ...

  3. P2279 [HNOI2003]消防局的设立 贪心or树形dp

    题目描述 2020年,人类在火星上建立了一个庞大的基地群,总共有n个基地.起初为了节约材料,人类只修建了n-1条道路来连接这些基地,并且每两个基地都能够通过道路到达,所以所有的基地形成了一个巨大的树状 ...

  4. LoadRunner录制协议的选择

    1.选择协议(提高测试结果的准确性) New Single Protocol Script:单协议脚本,选择一种主协议进行测试 New Multiple Protocol Script:多协议脚本,选 ...

  5. easyui+themeleaf 分页查询实现

    <!DOCTYPE html> <html xmlns:th="http://www.w3.org/1999/xhtml"> <head> &l ...

  6. 浏览器 User-Agent 大全

    一.基础知识 Http Header之User-Agent User Agent中文名为用户代理,是Http协议中的一部分,属于头域的组成部分,User Agent也简称UA.它是一个特殊字符串头,是 ...

  7. Best Cow Fences POJ - 2018 (二分)

    Farmer John's farm consists of a long row of N (1 <= N <= 100,000)fields. Each field contains ...

  8. 无可奈何的开始了jquery的“奇淫技巧”

    转载请注明出处: https://home.cnblogs.com/u/zhiyong-ITNote/ 修改一个已有的项目,主要是前端方面,一般的项目后端都是处理好了的,不需要改也不能改,除非特殊需求 ...

  9. 基于Ardalis.GuardClauses守卫组件的拓展

    在我们写程序的时候,经常会需要判断数据的是空值还是null值,基本上十个方法函数,八个要做这样的判断,因此我们很有必要拓展出来一个类来做监控,在这里我们使用一个简单地,可拓展的第三方组件:Ardali ...

  10. Windows下更改MySQL 数据库文件存放位置

    更改默认的mysql数据库目录 将 C:\Documents and Settings\All Users\Application Data\MySQL\MySQL Server 5.1\data 改 ...