SQLite is 35% Faster Than The Filesystem

比方说你要在C++/PHP里实现一个函数Image get_image(string id)，不同的图片有1万张(用户头像)，你可以把它们存在一个目录/文件夹里，然后fopen()再fread。

你也可以把它们存在一个SQLite数据库images.db里，调用SQLite来读取，"35% Faster Than The Filesystem"。你还可以把它们放在一个大文件images.dat里，自己在前面放个索引……为啥要造轮子，还不如SQLite圆？

PHP自5.3.0起默认启用SQLite3 扩展。$db=new PDO("sqlite:images.db"); $query=$db->query(...) C++ const string&好。

动手试下好了。官网上的sqlite-tools-win32-x86-3370000.zip才1.9MB。解压后sqlite3.exe 1MB。下面的例子中sqlite>是提示符，不用输入:
sqlite> .open images.db
sqlite> .read gen-test-db.sql
sqlite> .tables
kv
sqlite> SELECT COUNT(*) FROM kv;
20
sqlite> .quit
这你就创建了数据库images.db，其中有个表kv，它有20行，每行k, v两个字段。注意SELECT要以;结尾。

gen-test-db.sql的内容:
DROP TABLE IF EXISTS kv;
VACUUM;
BEGIN;
CREATE TABLE kv(k INTEGER PRIMARY KEY, v BLOB);
WITH RECURSIVE c(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM c WHERE x<20)
INSERT INTO kv(k,v) SELECT x, randomblob(16) FROM c;
COMMIT;

不要慌。创建table可以那么复杂，也可以CREATE TABLE kv(k INTEGER PRIMARY KEY, v BLOB);这么简单。BLOB: Binary Large Object. 与'Tom'这样的字符串相比，一串10110的图片大多了。randomblob(16)返回个长度为16个字节的blob。上面的.sql是为了创建个测试数据库。a blob cream: 一滴奶油。

35% Faster是用kvtest.c测试得到的结果。它用SELECT v FROM kv WHERE k=?1来获取v. key, value. ?1是什么东西？我们可以每次都用[\r|\n]这样的正则表达式来匹配字符串，但当正则表达式很复杂时效率低。可以先把它编译compile下，每次匹配时用compile好的。SQL也有类似的问题。sqlite3_prepare_v2()干类似的事情，返回个指向语句statement的指针，而且参数是可变的。sqlite3_bind_int(pStmt, 1, iKey)把?1换成iKey。即kvtest.c先prepare statement，然后在循环里不断换参数，把数据库读一遍测速度。kvtest.c里BLOB的大小和读取的顺序都是随机的。

下面是细节/高级信息：

六级/考研单词: summary, data, differentiate, overhead, pad, multiple, parcel, hardware, random, fluctuate, deviate, forum, author, seldom, gray, versus, consult, extract, preliminary, threshold, website, investigate, compile, directory, instruct, desire, verify, import, desktop, accord, hierarchy, alternate, regardless, omit, disable, update, default, interact, laptop, yoga, galaxy, discard, norm, bind, parameter, evaluate, bypass, farther, tertiary, amaze, accomplish, buffer, flush, persist, commit, durable, consecutive, prone, corrupt, crash, magnitude, probable, overflow, affection, decrease, advice, install, script

SQLite reads and writes small blobs (for example, thumbnail images) 35% faster than the same blobs can be read from or written to individual files on disk using fread() or fwrite(). Furthermore, a single SQLite database holding 10-kilobyte blobs uses about 20% less disk space than storing the blobs in individual files.

The performance difference arises (we believe) because when working from an SQLite database, the open() and close() system calls are invoked only once, whereas open() and close() are invoked once for each blob when using blobs stored in individual files. It appears that the overhead of calling open() and close() is greater than the overhead of using the database. The size reduction arises from the fact that individual files are padded out to the next multiple of the filesystem block size, whereas the blobs are packed more tightly into an SQLite database.

The measurements in this article were made during the week of 2017-06-05 using a version of SQLite in between 3.19.2 and 3.20.0. You may expect future versions of SQLite to perform even better.

The 35% figure is based on running tests on every machine that the author has easily at hand. Some reviewers of this article report that SQLite has higher latency than direct I/O on their systems. We do not yet understand the difference. We also see indications that SQLite does not perform as well as direct I/O when experiments are run using a cold filesystem cache.

I/O performance is measured using the kvtest.c program from the SQLite source tree. To compile this test program, first gather the kvtest.c source file into a directory with the SQLite amalgamation source files "sqlite3.c" and "sqlite3.h". Then on unix, run a command like the following:

gcc -Os -I. -DSQLITE_DIRECT_OVERFLOW_READ kvtest.c sqlite3.c -o kvtest -ldl -lpthread

Or on Windows with MSVC:

cl -I. -DSQLITE_DIRECT_OVERFLOW_READ kvtest.c sqlite3.c

Instructions for compiling for Android are shown below.

Use the resulting "kvtest" program to generate a test database with 100,000 random uncompressible blobs, each with a random size between 8,000 and 12,000 bytes using a command like this:

./kvtest init test1.db --count 100k --size 10k --variance 2k

If desired, you can verify the new database by running this command:

./kvtest stat test1.db

Next, make copies of all the blobs into individual files in a directory using a command like this:

./kvtest export test1.db test1.dir

At this point, you can measure the amount of disk space used by the test1.db database and the space used by the test1.dir directory and all of its content. On a standard Ubuntu Linux desktop, the database file will be 1,024,512,000 bytes in size and the test1.dir directory will use 1,228,800,000 bytes of space (according to "du -k"), about 20% more than the database.

The "test1.dir" directory created above puts all the blobs into a single folder. It was conjectured that some operating systems would perform poorly when a single directory contains 100,000 objects. [Windows: 嗯？] To test this, the kvtest program can also store the blobs in a hierarchy of folders with no more than 100 files and/or subdirectories per folder. The alternative on-disk representation of the blobs can be created using the --tree command-line option to the "export" command, like this:

./kvtest export test1.db test1.tree --tree

The test1.dir directory will contain 100,000 files with names like "000000", "000001", "000002" and so forth but the test1.tree directory will contain the same files in subdirectories like "00/00/00", "00/00/01", and so on. The test1.dir and test1.test directories take up approximately the same amount of space, though test1.test is very slightly larger due to the extra directory entries.

Measure the performance for reading blobs from the database and from individual files using these commands:

./kvtest run test1.db --count 100k --blob-api
./kvtest run test1.dir --count 100k --blob-api
./kvtest run test1.tree --count 100k --blob-api

Depending on your hardware and operating system, you should see that reads from the test1.db database file are about 35% faster than reads from individual files in the test1.dir or test1.tree folders. Results can vary significantly from one run to the next due to caching, so it is advisable to run tests multiple times and take an average or a worst case or a best case, depending on your requirements.

The --blob-api option on the database read test causes kvtest to use the sqlite3_blob_read() feature of SQLite to load the content of the blobs, rather than running pure SQL statements. This helps SQLite to run a little faster on read tests. You can omit that option to compare the performance of SQLite running SQL statements. In that case, the SQLite still out-performs direct reads, though by not as much as when using sqlite3_blob_read(). The --blob-api option is ignored for tests that read from individual disk files.

Measure write performance by adding the --update option. This causes the blobs are overwritten in place with another random blob of exactly the same size.

./kvtest run test1.db --count 100k --update
./kvtest run test1.dir --count 100k --update
./kvtest run test1.tree --count 100k --update

The writing test above is not completely fair, since SQLite is doing power-safe transactions whereas the direct-to-disk writing is not. To put the tests on a more equal footing, add either the --nosync option to the SQLite writes to disable calling fsync() or FlushFileBuffers() to force content to disk, or using the --fsync option for the direct-to-disk tests to force them to invoke fsync() or FlushFileBuffers() when updating disk files.

By default, kvtest runs the database I/O measurements all within a single transaction. Use the --multitrans option to run each blob read or write in a separate transaction. The --multitrans option makes SQLite much slower, and uncompetitive with direct disk I/O. This option proves, yet again, that to get the most performance out of SQLite, you should group as much database interaction as possible within a single transaction.

There are many other testing options, which can be seen by running the command:

./kvtest help

The chart below [请去官网，我没抄图] shows data collected using kvtest.c on five different systems:

Win7: A circa-2009 Dell Inspiron laptop, Pentium dual-core at 2.30GHz, 4GiB RAM, Windows7.
Win10: A 2016 Lenovo YOGA 910, Intel i7-7500 at 2.70GHz, 16GiB RAM, Windows10.
Mac: A 2015 MacBook Pro, 3.1GHz intel Core i7, 16GiB RAM, MacOS 10.12.5
Ubuntu: Desktop built from Intel i7-4770K at 3.50GHz, 32GiB RAM, Ubuntu 16.04.2 LTS
Android: Galaxy S3, ARMv7, 2GiB RAM

All machines use SSD except Win7 which has a hard-drive. The test database is 100K blobs with sizes uniformly distributed between 8K and 12K, for a total of about 1 gigabyte of content. The database page size is 4KiB. The -DSQLITE_DIRECT_OVERFLOW_READ compile-time option was used for all of these tests. Tests were run multiple times. The first run was used to warm up the cache and its timings were discarded.

The chart below shows average time to read a blob directly from the filesystem versus the time needed to read the same blob from the SQLite database. The actual timings vary considerably from one system to another (the Ubuntu desktop is much faster than the Galaxy S3 phone, for example). This chart shows the ratio of the times needed to read blobs from a file divided by the time needed to from the database. The left-most column in the chart is the normalized time to read from the database, for reference.

In this chart, an SQL statement ("SELECT v FROM kv WHERE k=?1") is prepared once. Then for each blob, the blob key value is bound to the ?1 parameter and the statement is evaluated to extract the blob content. 看到?1我就开始上网搜了，折腾半天，现在看到"the blob key value is bound to the ?1"，一口老血喷出老远…… 气死我了，不写了下面这个对于玩NDK的有参考价值：

The kvtest program is compiled and run on Android as follows. First install the Android SDK and NDK. Then prepare a script named "android-gcc" that looks approximately like this:

#!/bin/sh
#
NDK=/home/drh/Android/Sdk/ndk-bundle
SYSROOT=$NDK/platforms/android-16/arch-arm
ABIN=$NDK/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin
GCC=$ABIN/arm-linux-androideabi-gcc
$GCC --sysroot=$SYSROOT -fPIC -pie $*

Make that script executable and put it on your $PATH. Then compile the kvtest program as follows:

android-gcc -Os -I. kvtest.c sqlite3.c -o kvtest-android

Next, move the resulting kvtest-android executable to the Android device:

adb push kvtest-android /data/local/tmp

Finally use "adb shell" to get a shell prompt on the Android device, cd into the /data/local/tmp directory, and begin running the tests as with any other unix host.

SQLite is 35% Faster Than The Filesystem的更多相关文章

为什么 SQLite 用 C 编写？
简评:SQLite 官方出品. C是最好的选择从 2000 年 5 月 29 日开始,SQLite 就选择了 C 语言.直到今天,C 也是实现 SQLite 这样软件库的最佳语言. C语言是实现 S ...
受 SQLite 多年青睐，C 语言到底好在哪儿？
SQLite 近日发表了一篇博文,解释了为什么多年来 SQLite 一直坚持用 C 语言来实现,以下是正文内容: C 语言是最佳选择从2000年5月29日发布至今,SQLite 一直都是用 C 语言 ...
IT四大名著
标题耸人听闻,sorry. CPU.操作系统.编译器和数据库我都不会.我英语也不行,但我认识所有的字母.:-) 万一有人感兴趣呢?https://sqlite.org/doclist.htmlThe ...
About SQLite
About SQLite See Also... Features When to use SQLite Frequently Asked Questions Well-known Users Boo ...
Checklist For Choosing The Right Database Engine
http://sqlite.org/whentouse.html Appropriate Uses For SQLite SQLite is not directly comparable to cl ...
sqlite-dbeaver-heidisql
http://www.sqlite.org/ http://www.sqliteexpert.com/ gui工具这个网站的大部分信息在2015-10-9阅读完毕,下一步是阅读软件自带的帮助文档将 ...
PowerShell 中的目录文件管理
前面的一篇文章我们说了部分在PS中进行文件浏览的基本概念,说到了几个虚拟驱动器的概念.并没有深入的描述相关的命令,这里我们进一步对这一知识点进行描述. 2.1 管理当前工作路径/位置在日常管理中经常 ...
Linux 文件系统剖析
[转自]https://www.ibm.com/developerworks/cn/linux/l-linux-filesystem/ 按照分层结构讨论 Linux 文件系统在文件系统方面,Linu ...
用C++实现一个Quaternion类
提要四元素是游戏开发中经常使用的用于处理旋转的数学工具,以下就用C++来实现一个四元素类.參考Unity中四元素的接口. 假设没有看之前的彻底搞懂四元数. 建议先看一下. 代码清单 Quatern ...

随机推荐

Python matplotlib pylab 画张图
from pylab import * w1 = 1 w2 = 25 fs = 18 y = np.arange(-2,2,0.001) x = w1*y*log(y)-1.0/w2*exp(-(w2 ...
POJ 2536 Gopher II（二分图最大匹配）
题意: N只地鼠M个洞,每只地鼠.每个洞都有一个坐标. 每只地鼠速度一样,对于每只地鼠而言,如果它跑到某一个洞的所花的时间小于等于S,它才不会被老鹰吃掉. 规定每个洞最多只能藏一只地鼠. 问最少有多少 ...
Linux安装部署Zabbix
Zabbix 是一个企业级的分布式开源监控方案,能够监控各种网络参数以及服务器健康性和完整性的软件.Zabbix使用灵活的通知机制,允许用户为几乎任何事件配置基于邮件的告警.这样可以快速反馈服务器的问 ...
swoole、swoft环境配置
一.服务器环境 1.lnmp wget http://soft.vpser.net/lnmp/lnmp1.5.tar.gz -cO lnmp1.5.tar.gz && tar zxf ...
K8S 部署 SpringBoot 项目(一篇够用)
现在比较多的互联网公司都在尝试将微服务迁到云上,这样的能够通过一些成熟的云容器管理平台更为方便地管理微服务集群,从而提高微服务的稳定性,同时也能较好地提升团队开发效率. 但是迁云存在一定的技术难点,今 ...
[python]Pytest+selenium+git+jenkins持续集成
1安装pytest框架 &pip install pytest #pytest &pip install pytest-html #pytest html测试报告 2.工程介绍 ...
基于I2C的AHT20温湿度传感器的数据采集
关于:IC( Inter-- Integrated Circuit)总线是一种由 PHILIPS公司开发的两线式串行总线,用于连接微控制器及其外围设备.它是由数据线SDA和时钟SCL构成的串行总线,可 ...
博主日常工作中使用的shell脚本分享
前言: 今天给大家分享一篇在我工作中常用的一个shell脚本,里面有一些我们常用到的shell操作.该脚本用于本地电脑和服务器交互上,实现以下功能: 自动拉取自己个人电脑上的源码到服务器上yocto包 ...
Mastering-VSCode
中英文等宽 14寸1920x1080, Win10, 设置如下(前两个字体就够了), 字号14,16都可以. 需要下载UbuntuMono字体. 如果分表率低如14寸1366x768,可尝试 Inco ...
[51nod1587]半现串
将s所有长度为d/2的子串放进ac自动机中,直接匹配就可以判定半现串了再对其做一个差分,询问一个前缀的半现串个数,在ac自动机上数位dp,f[i][j][0/1]表示走了i步(i位的字符串),走到节点 ...

SQLite is 35% Faster Than The Filesystem

SQLite is 35% Faster Than The Filesystem的更多相关文章

随机推荐

热门专题