mdb
Plan: develop a high-performance KV database, studying MongoDB, LevelDB and InnoDB (practice + diary thread): http://bbs.chinaunix.net/thread-4244870-1-1.html
Ultra-high-performance network programming, Asynchronous network I/O http://bbs.chinaunix.net/thread-1214570-1-1.html
From LSM-Tree and COLA-Tree to StackOverflow and OSQA
4. Implementation Highlights
Since all of the source code has been publicly available from the outset, and due to space limitations in this paper, only a few of the most notable implementation details will be described here. Interested parties are invited to read the code in the OpenLDAP git repository and post questions on the openldap-technical mailing list.
The MDB library API was loosely modeled after the BDB API, to ease migration of BDB-based code. The first cut of the back-mdb code was simply copied from the back-bdb source tree, and then all references to the caching layers were deleted. After a few minor API differences were accounted for, the backend was fully operational (though still in need of optimization). As of today back-mdb comprises 340KB of source code, compared to 476KB for back-bdb/hdb, so back-mdb is approximately 30% smaller.
The MDB code itself started from Martin Hedenfalk's append-only Btree code in the OpenBSD ldapd source repository[21]. The first cut of the MDB code was simply copied from the ldapd source, and then all of the Btree page cache manager was deleted and replaced with mmap accesses. The original Btree source yielded an object file of 39KB; the MDB version was 32KB. Initial testing with the append-only code proved that approach to be completely impractical. With a small test database and only a few hundred add/delete operations, the DB occupied 1027 pages but only 10 pages actually contained current data; over 99% of the space was wasted.
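The "over 99%" figure follows directly from the page counts quoted above:

```latex
\text{wasted fraction} = \frac{1027 - 10}{1027} = \frac{1017}{1027} \approx 0.990
```

That is, roughly 99.0% of the pages held only obsolete data.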
Along with the mmap management and page reclamation, many other significant changes were made to arrive at the current MDB library, mostly to add features from BDB that back-mdb would need. As of today the MDB library comprises 35KB of object code. (Comparing source code is not very informative since the MDB source code has been heavily expanded with Doxygen comments. The initial version of mdb.c was 59KB, as opposed to 76KB for btree.c, but with full documentation embedded mdb.c is now 162KB. Also for comparison, BDB is now over 1.5MB of object code.)
4.1 MDB Change Summary
The append-only Btree code used a meta page at the end of the database file to point at the current root node of the Btree. New pages were always written out sequentially at the end of the file, followed by a new meta page upon transaction commit. Any application opening the database needed to search backward from the end of the file to find the most recent meta page, to get a current snapshot of the database. (Further explanation of append-only operation is available at Martin's web site[22].)
In MDB there are two meta pages occupying page 0 and page 1 of the file. They are used alternately by transactions. Each meta page points to the root node of two Btrees: one for the free list and one for the application data. New data first re-uses any available pages from the free list, then writes sequentially at the end of the file if no free pages are available. Then the older meta page is written on transaction commit. This is nothing more than standard double-buffering: any application opening the database uses the newer meta page, while a committer overwrites the older one. No locks are needed to protect readers from writers; readers are guaranteed to always see a valid root node.
The original code only supported a single Btree in a given database file. For MDB we wanted to support multiple trees in a single database file. The back-mdb indexing code uses individual databases for each attribute index, and it would be a non-starter to require a sysadmin to configure multiple mmap regions for a single back-mdb instance. Additionally, the indexing code uses BDB's sorted duplicate feature, which allows multiple data items with the same key to be stored in a Btree, and this feature needed to be added to MDB as well. These features were both added using a subdatabase mechanism, which allows a data item in a Btree to be treated as the root node of another Btree.
4.2 Locking
For simplicity the MDB library allows only one writer at a time. Creating a write transaction acquires a lock on a writer mutex; the mutex normally resides in a shared memory region so that it can be shared between multiple processes. This shared memory is separate from the region occupied by the main database. The lock region also contains a table with one slot for every active reader in the database. The slots record the reader's process and thread ID, as well as the ID of the transaction snapshot the reader is using. (The process and thread ID are recorded to allow detection of stale entries in the table, e.g. threads that exited without releasing their reader slot.) The table is constructed in processor cache-aligned memory such that False Sharing[23] of cache lines is avoided.
Readers acquire a slot the first time a thread opens a read transaction. Acquiring an empty slot in the table requires locking a mutex on the table. The slot address is saved in thread-local storage and re-used the next time the thread opens a read transaction, so the thread never needs to touch the table mutex again. The reader stores its transaction ID in the slot at the start of the read transaction and zeroes the ID in the slot at the end of the transaction. In normal operation, nothing can block the operation of readers.
The reader table is used when a writer wants to allocate a page, and knows that the free list is not empty. Writes are performed using copy-on-write semantics; whenever a page is to be written, a copy is made and the copy is modified instead of the original. Once copied, the original page's ID is added to an in-memory free list. When a transaction is committed, the in-memory free list is saved as a single record in the free list DB along with the ID of the transaction for this commit. When a writer wants to pull a page from the free list DB, it compares the transaction ID of the oldest record in the free list DB with the transaction IDs of all of the active readers. If the record in the free list DB is older than all of the readers, then all of the pages in that record may be safely re-used because nothing else in the DB points to them any more.