Creating an LMDB database in Python
LMDB is the database of choice when using Caffe with large datasets. This is a tutorial of how to create an LMDB database from Python. First, let’s look at the pros and cons of using LMDB over HDF5.
Reasons to use HDF5:
- Simple format to read/write.
Reasons to use LMDB:
- LMDB uses memory-mapped files, giving much better I/O performance.
- Works well with really large datasets. The HDF5 files are always read entirely into memory, so you can’t have any HDF5 file exceed your memory capacity. You can easily split your data into several HDF5 files though (just put several paths to
h5files in your text file). Then again, compared to LMDB’s page caching the I/O performance won’t be nearly as good.
LMDB from Python
You will need the Python package lmdb as well as Caffe’s python package (make pycaffe in Caffe). LMDB provides key-value storage, where each <key, value> pair will be a sample in our dataset. The key will simply be a string version of an ID value, and the value will be a serialized version of the Datum class in Caffe (which are built using protobuf).
import numpy as np
import lmdb
import caffe N = 1000 # Let's pretend this is interesting data
X = np.zeros((N, 3, 32, 32), dtype=np.uint8)
y = np.zeros(N, dtype=np.int64) # We need to prepare the database for the size. We'll set it 10 times
# greater than what we theoretically need. There is little drawback to
# setting this too big. If you still run into problem after raising
# this, you might want to try saving fewer entries in a single
# transaction.
map_size = X.nbytes * 10 env = lmdb.open('mylmdb', map_size=map_size) with env.begin(write=True) as txn:
# txn is a Transaction object
for i in range(N):
datum = caffe.proto.caffe_pb2.Datum()
datum.channels = X.shape[1]
datum.height = X.shape[2]
datum.width = X.shape[3]
datum.data = X[i].tobytes() # or .tostring() if numpy < 1.9
datum.label = int(y[i])
str_id = '{:08}'.format(i) # The encode is only essential in Python 3
txn.put(str_id.encode('ascii'), datum.SerializeToString())
You can also open up and inspect an existing LMDB database from Python:
import numpy as np
import lmdb
import caffe env = lmdb.open('mylmdb', readonly=True)
with env.begin() as txn:
raw_datum = txn.get(b'00000000') datum = caffe.proto.caffe_pb2.Datum()
datum.ParseFromString(raw_datum) flat_x = np.fromstring(datum.data, dtype=np.uint8)
x = flat_x.reshape(datum.channels, datum.height, datum.width)
y = datum.label
Iterating <key, value> pairs is also easy:
with env.begin() as txn:
cursor = txn.cursor()
for key, value in cursor:
print(key, value)
Creating an LMDB database in Python的更多相关文章
- Initialization of deep networks
Initialization of deep networks 24 Feb 2015Gustav Larsson As we all know, the solution to a non-conv ...
- 非图片格式如何转成lmdb格式--caffe
链接 LMDB is the database of choice when using Caffe with large datasets. This is a tutorial of how to ...
- Movidius的深度学习入门
1.Ubuntu虚拟机上安装NC SDK cd /home/shine/Downloads/ mkdir NC_SDK git clone https://github.com/movidius/nc ...
- Python框架、库以及软件资源汇总
转自:http://developer.51cto.com/art/201507/483510.htm 很多来自世界各地的程序员不求回报的写代码为别人造轮子.贡献代码.开发框架.开放源代码使得分散在世 ...
- Awesome Python
Awesome Python A curated list of awesome Python frameworks, libraries, software and resources. Insp ...
- Machine and Deep Learning with Python
Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...
- Huge CSV and XML Files in Python, Error: field larger than field limit (131072)
Huge CSV and XML Files in Python January 22, 2009. Filed under python twitter facebook pinterest lin ...
- (原)caffe中通过图像生成lmdb格式的数据
转载请注明出处: http://www.cnblogs.com/darkknightzh/p/5909121.html 参考网址: http://www.cnblogs.com/wangxiaocvp ...
- Caffe︱构建lmdb数据集、binaryproto均值文件及各类难辨的文件路径名设置细解
Lmdb生成的过程简述 1.整理并约束尺寸,文件夹.图片放在不同的文件夹之下,注意图片的size需要规约到统一的格式,不然计算均值文件的时候会报错. 2.将内容生成列表放入txt文件中.两个txt文件 ...
随机推荐
- 2018牛客网暑假ACM多校训练赛(第七场)I Tree Subset Diameter 动态规划 长链剖分 线段树
原文链接https://www.cnblogs.com/zhouzhendong/p/NowCoder-2018-Summer-Round7-I.html 题目传送门 - https://www.n ...
- BZOJ4003 [JLOI2015]城池攻占 左偏树 可并堆
欢迎访问~原文出处——博客园-zhouzhendong 去博客园看该题解 题目传送门 - BZOJ4003 题意概括 题意有点复杂,直接放原题了. 小铭铭最近获得了一副新的桌游,游戏中需要用 m 个骑 ...
- SSM整合——完全版
1, 2, 3, 4,项目建立好后: 覆盖pom.xml,地址在:https://blog.csdn.net/mark_lirenhe/article/details/80875266 alt+F5= ...
- 三色抽卡游戏 博弈论nim
你的对手太坏了!在每年的年度三色抽卡游戏锦标赛上,你的对手总是能打败你,他的秘诀是什么? 在每局三色抽卡游戏中,有n个卡组,每个卡组里所有卡片的颜色都相同,且颜色只会是红(R).绿(G).蓝(B)中的 ...
- day 35 协程与gil概念
博客链接: http://www.cnblogs.com/linhaifeng/articles/7429894.html 今日概要: 1 生产者消费者模型(补充) 2 GIL(进程与线程的应用场景) ...
- c#中的委託,匿名方法和lambda表達式
(原博)http://www.cnblogs.com/niyw/archive/2010/10/07/1845232.html 简介 在.NET中,委托,匿名方法和Lambda表达式很容易发生混淆.我 ...
- 基于Docker的服务器搭建
-----------基于Docker的多种服务器搭建----------- 开发环境 本机上的虚拟机 Centos7.4 Docker1.13.1 Openssl1.1.1 1 Nginx 1.1 ...
- CentOS7 中文man(cman)配置方法
1. 下载中文man包 http://pkgs.fedoraproject.org/repo/pkgs/man-pages-zh-CN/manpages-zh-1.5.1.tar.gz/13275fd ...
- Service插件化解决方案
--摘自<android插件化开发指南> 1.ActivityThread最终是通过Instrumentation启动一个Activity的.而ActivityThread启动Servic ...
- Codeforces 948D Perfect Security 【01字典树】
<题目链接> 题目大意: 给定两个长度为n的序列,可以改变第二个序列中数的顺序,使得两个序列相同位置的数异或之后得到的新序列的字典序最小. 解题分析: 用01字典树来解决异或最值问题.因为 ...