Creating an LMDB database in Python
LMDB is the database of choice when using Caffe with large datasets. This is a tutorial of how to create an LMDB database from Python. First, let’s look at the pros and cons of using LMDB over HDF5.
Reasons to use HDF5:
- Simple format to read/write.
Reasons to use LMDB:
- LMDB uses memory-mapped files, giving much better I/O performance.
- Works well with really large datasets. The HDF5 files are always read entirely into memory, so you can’t have any HDF5 file exceed your memory capacity. You can easily split your data into several HDF5 files though (just put several paths to
h5files in your text file). Then again, compared to LMDB’s page caching the I/O performance won’t be nearly as good.
LMDB from Python
You will need the Python package lmdb as well as Caffe’s python package (make pycaffe in Caffe). LMDB provides key-value storage, where each <key, value> pair will be a sample in our dataset. The key will simply be a string version of an ID value, and the value will be a serialized version of the Datum class in Caffe (which are built using protobuf).
import numpy as np
import lmdb
import caffe N = 1000 # Let's pretend this is interesting data
X = np.zeros((N, 3, 32, 32), dtype=np.uint8)
y = np.zeros(N, dtype=np.int64) # We need to prepare the database for the size. We'll set it 10 times
# greater than what we theoretically need. There is little drawback to
# setting this too big. If you still run into problem after raising
# this, you might want to try saving fewer entries in a single
# transaction.
map_size = X.nbytes * 10 env = lmdb.open('mylmdb', map_size=map_size) with env.begin(write=True) as txn:
# txn is a Transaction object
for i in range(N):
datum = caffe.proto.caffe_pb2.Datum()
datum.channels = X.shape[1]
datum.height = X.shape[2]
datum.width = X.shape[3]
datum.data = X[i].tobytes() # or .tostring() if numpy < 1.9
datum.label = int(y[i])
str_id = '{:08}'.format(i) # The encode is only essential in Python 3
txn.put(str_id.encode('ascii'), datum.SerializeToString())
You can also open up and inspect an existing LMDB database from Python:
import numpy as np
import lmdb
import caffe env = lmdb.open('mylmdb', readonly=True)
with env.begin() as txn:
raw_datum = txn.get(b'00000000') datum = caffe.proto.caffe_pb2.Datum()
datum.ParseFromString(raw_datum) flat_x = np.fromstring(datum.data, dtype=np.uint8)
x = flat_x.reshape(datum.channels, datum.height, datum.width)
y = datum.label
Iterating <key, value> pairs is also easy:
with env.begin() as txn:
cursor = txn.cursor()
for key, value in cursor:
print(key, value)
Creating an LMDB database in Python的更多相关文章
- Initialization of deep networks
Initialization of deep networks 24 Feb 2015Gustav Larsson As we all know, the solution to a non-conv ...
- 非图片格式如何转成lmdb格式--caffe
链接 LMDB is the database of choice when using Caffe with large datasets. This is a tutorial of how to ...
- Movidius的深度学习入门
1.Ubuntu虚拟机上安装NC SDK cd /home/shine/Downloads/ mkdir NC_SDK git clone https://github.com/movidius/nc ...
- Python框架、库以及软件资源汇总
转自:http://developer.51cto.com/art/201507/483510.htm 很多来自世界各地的程序员不求回报的写代码为别人造轮子.贡献代码.开发框架.开放源代码使得分散在世 ...
- Awesome Python
Awesome Python A curated list of awesome Python frameworks, libraries, software and resources. Insp ...
- Machine and Deep Learning with Python
Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...
- Huge CSV and XML Files in Python, Error: field larger than field limit (131072)
Huge CSV and XML Files in Python January 22, 2009. Filed under python twitter facebook pinterest lin ...
- (原)caffe中通过图像生成lmdb格式的数据
转载请注明出处: http://www.cnblogs.com/darkknightzh/p/5909121.html 参考网址: http://www.cnblogs.com/wangxiaocvp ...
- Caffe︱构建lmdb数据集、binaryproto均值文件及各类难辨的文件路径名设置细解
Lmdb生成的过程简述 1.整理并约束尺寸,文件夹.图片放在不同的文件夹之下,注意图片的size需要规约到统一的格式,不然计算均值文件的时候会报错. 2.将内容生成列表放入txt文件中.两个txt文件 ...
随机推荐
- MySQL主从数据同步延时分析
一.MySQL数据库主从同步延迟 要了解MySQL数据库主从同步延迟原理,我们 ...
- @+id/和android:id有什么区别?
Any View object may have an integer ID associated with it, to uniquely identify the View within the ...
- vue中的provide/inject的学习使用
irst:定义一个parent component ? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 <template> <div> ...
- 51Nod1362 搬箱子 排列组合,中国剩余定理
原文链接https://www.cnblogs.com/zhouzhendong/p/51Nod1362.html 题目传送门 - 51Nod1362 题意 题解 首先考虑枚举斜着走了几次.假设走了 ...
- 空间数据可视化:1. 3D_Bar图表| 空间柱状图
1.Sublime的使用 中文版的配置 https://jingyan.baidu.com/article/ca2d939d1e83feeb6c31cefc.html (百度经验) sublime里边 ...
- ASP.NET MVC 路由篇二
轉載 http://www.cnblogs.com/yaozhenfa/p/asp_net_mvc_route_2.html 7.解决与物理路径的冲突 当发送一个请求至ASP.NET MVC时,其实会 ...
- 1e9个兵临城下
https://ac.nowcoder.com/acm/contest/321#question 代码写得蛮丑的..真的很丑 像是高中教的veen图,并涉及到容斥原理. #include<cst ...
- 【python】函数式编程
No1: 函数式编程:即函数可以作为参数传递,也可以作为返回值 No2: map()函数接收两个参数,一个是函数,一个是Iterable,map将传入的函数依次作用到序列的每个元素,并把结果作为新的 ...
- C++学习之 —— 输入输出
案例:输入任意空格和数字,输出其中的数字之和. #include <iostream> using namespace std; int main() { ; cout << ...
- JavaScript基础笔记(七)DOM
DOM DOM可以将任何HTML或者XML文档描述成一个由多层节点构成的结构. 一.节点层次 一)Node类型 DOM1定义了一个Node接口,该接口将由DOM中所有节点类型实现. 每一个节点都有一个 ...