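Planning to develop a high-performance KV database, learning from MongoDB, leveldb and innodb; practice and diary thread: http://bbs.chinaunix.net/thread-4244870-1-1.html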

Ultra-high-performance network programming, Asynchronous network I/O: http://bbs.chinaunix.net/thread-1214570-1-1.html

From LSM-Tree and COLA-Tree to StackOverflow and OSQA

4. Implementation Highlights

Since all of the source code has been publicly available from the outset, and due to space limitations in this paper, only a few of the most notable implementation details will be described here. Interested parties are invited to read the code in the OpenLDAP git repository and post questions on the openldap-technical mailing list.

The MDB library API was loosely modeled after the BDB API, to ease migration of BDB-based code. The first cut of the back-mdb code was simply copied from the back-bdb source tree, and then all references to the caching layers were deleted. After a few minor API differences were accounted for, the backend was fully operational (though still in need of optimization). As of today back-mdb comprises 340KB of source code, compared to 476KB for back-bdb/hdb, so back-mdb is approximately 30% smaller.

The MDB code itself started from Martin Hedenfalk's append-only Btree code in the OpenBSD ldapd source repository[21]. The first cut of the MDB code was simply copied from the ldapd source, and then all of the Btree page cache manager was deleted and replaced with mmap accesses. The original Btree source yielded an object file of 39KB; the MDB version was 32KB. Initial testing with the append-only code proved that approach to be completely impractical. With a small test database and only a few hundred add/delete operations, the DB occupied 1027 pages but only 10 pages actually contained current data; over 99% of the space was wasted.
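
The space blow-up is easy to reproduce with a back-of-the-envelope model. The Python sketch below is not Hedenfalk's code; it assumes a hypothetical tree whose live size stays constant (adds and deletes balance out) and in which every commit appends a fresh copy of each page on the root-to-leaf path plus a new meta page, with nothing ever reclaimed. The function name and parameters are illustrative only.

```python
from typing import Tuple


def append_only_pages(live_pages: int, height: int, updates: int) -> Tuple[int, int]:
    """Return (pages in the file, pages holding current data) for a toy model."""
    total = live_pages + 1  # initial tree plus its meta page
    for _ in range(updates):
        # Copy-on-write in append-only mode: each commit appends a new copy of
        # every page on the root-to-leaf path, then a new meta page. Superseded
        # pages are never reclaimed.
        total += height + 1
    current = live_pages + 1  # reachable tree plus the newest meta page
    return total, current


total, current = append_only_pages(live_pages=10, height=3, updates=250)
print(total, current, f"{1 - current / total:.1%} wasted")  # 1011 11 98.9% wasted
```

With these made-up parameters nearly 99% of the file is dead after a few hundred commits, comparable to the waste the paper measured; the exact figures depend entirely on the assumed tree size and height.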

Along with the mmap management and page reclamation, many other significant changes were made to arrive at the current MDB library, mostly to add features from BDB that back-mdb would need. As of today the MDB library comprises 35KB of object code. (Comparing source code is not very informative since the MDB source code has been heavily expanded with Doxygen comments. The initial version of mdb.c was 59KB as opposed to btree.c at 76KB, but with full documentation embedded mdb.c is now 162KB. Also for comparison, BDB is now over 1.5MB of object code.)

4.1 MDB Change Summary

The append-only Btree code used a meta page at the end of the database file to point at the current root node of the Btree. New pages were always written out sequentially at the end of the file, followed by a new meta page upon transaction commit. Any application opening the database needed to search backward from the end of the file to find the most recent meta page, to get a current snapshot of the database. (Further explanation of append-only operation is available at Martin's web site[22].)
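
Finding the current snapshot in the append-only layout means walking backward from the last page. A minimal Python sketch, assuming the file is modeled as a list of hypothetical `Page` records tagged "meta" or "node"; a real implementation would also have to cope with a partially written tail, which is ignored here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    kind: str       # "meta" or "node" (hypothetical page types)
    root: int = -1  # meta pages only: page number of the current Btree root


def find_latest_meta(pages: List[Page]) -> int:
    """Scan backward from the end of the file; return the newest root pointer."""
    for pgno in range(len(pages) - 1, -1, -1):
        if pages[pgno].kind == "meta":
            return pages[pgno].root
    raise ValueError("no meta page found")


file_pages = [
    Page("node"), Page("meta", root=0),                 # commit 1
    Page("node"), Page("node"), Page("meta", root=3),  # commit 2
]
print(find_latest_meta(file_pages))  # 3
```

Every process that opens the file pays for this scan; MDB's fixed meta pages at page 0 and page 1 remove the search entirely.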

In MDB there are two meta pages occupying page 0 and page 1 of the file. They are used alternately by transactions. Each meta page points to the root node of two Btrees - one for the free list and one for the application data. New data first re-uses any available pages from the free list, then writes sequentially at the end of the file if no free pages are available. Then the older meta page is written on transaction commit. This is nothing more than standard double-buffering - any application opening the database uses the newer meta page, while a committer overwrites the older one. No locks are needed to protect readers from writers; readers are guaranteed to always see a valid root node.
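
The double-buffering scheme can be modeled in a few lines. In this Python sketch the class and field names are hypothetical, not MDB's: each meta page records the ID of the transaction that wrote it plus the two roots, readers take the page with the higher ID, and a commit overwrites the other one. Choosing the slot as `txnid % 2` is a choice made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Meta:
    txnid: int      # transaction that last wrote this meta page
    main_root: int  # root page of the application-data Btree
    free_root: int  # root page of the free-list Btree


class DoubleBufferedMeta:
    """Hypothetical model of two meta pages (page 0 and page 1) used alternately."""

    def __init__(self) -> None:
        self.meta = [Meta(0, -1, -1), Meta(0, -1, -1)]

    def newest(self) -> Meta:
        # Readers use whichever meta page carries the higher transaction ID.
        return max(self.meta, key=lambda m: m.txnid)

    def commit(self, main_root: int, free_root: int) -> int:
        txnid = self.newest().txnid + 1
        # txnid % 2 always selects the older page, so the page readers are
        # currently using is never overwritten by the committer.
        self.meta[txnid % 2] = Meta(txnid, main_root, free_root)
        return txnid


db = DoubleBufferedMeta()
db.commit(main_root=5, free_root=7)  # txn 1 overwrites page 1
db.commit(main_root=9, free_root=7)  # txn 2 overwrites page 0
print(db.newest())  # Meta(txnid=2, main_root=9, free_root=7)
```

Because a commit never touches the page readers are using, readers need no lock. What the sketch leaves out is durability ordering: on disk, the new data pages must reach stable storage before the older meta page is overwritten, otherwise a crash could leave a meta page pointing at pages that were never written.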

The original code only supported a single Btree in a given database file. For MDB we wanted to support multiple trees in a single database file. The back-mdb indexing code uses individual databases for each attribute index, and it would be a non-starter to require a sysadmin to configure multiple mmap regions for a single back-mdb instance. Additionally, the indexing code uses BDB's sorted duplicate feature, which allows multiple data items with the same key to be stored in a Btree, and this feature needed to be added to MDB as well. These features were both added using a subdatabase mechanism, which allows a data item in a Btree to be treated as the root node of another Btree.
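
The subdatabase idea, where a value stored in one tree is itself the root of another tree, can be imitated with nested sorted maps. The Python sketch below uses a hypothetical `Tree` class (a sorted key list plus a dict, not a real Btree) to show both uses the paragraph describes: one named subtree per attribute index, and sorted duplicates stored as the keys of a nested tree.

```python
from __future__ import annotations

import bisect
from typing import Union


class Tree:
    """Stand-in for a Btree: keys kept in sorted order, values in a dict.

    Hypothetical model only. A value may itself be a Tree, which plays the
    role of "a data item treated as the root node of another Btree".
    """

    def __init__(self) -> None:
        self._keys: list[str] = []                 # sorted, like leaf order
        self._vals: dict[str, Union[str, Tree]] = {}

    def put(self, key: str, value: Union[str, Tree]) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def keys(self) -> list[str]:
        return list(self._keys)

    def open_sub(self, name: str) -> Tree:
        """Return the nested tree stored under `name`, creating it if needed."""
        sub = self._vals.get(name)
        if sub is None:
            sub = Tree()
            self.put(name, sub)
        elif not isinstance(sub, Tree):
            raise TypeError(f"{name!r} holds plain data, not a subtree")
        return sub

    def put_dup(self, key: str, value: str) -> None:
        """Sorted duplicates: the data items for `key` become the keys of a
        nested tree, so they stay sorted and each distinct item is stored once."""
        self.open_sub(key).put(value, "")

    def get_dups(self, key: str) -> list[str]:
        sub = self._vals.get(key)
        return sub.keys() if isinstance(sub, Tree) else []


env = Tree()                       # the main database in one file
cn_index = env.open_sub("cn")      # one named subdatabase per attribute index
cn_index.put_dup("alice", "id:17")
cn_index.put_dup("alice", "id:03")
cn_index.put_dup("bob", "id:08")
print(env.keys(), cn_index.get_dups("alice"))  # ['cn'] ['id:03', 'id:17']
```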

4.2 Locking

For simplicity the MDB library allows only one writer at a time. Creating a write transaction acquires a lock on a writer mutex; the mutex normally resides in a shared memory region so that it can be shared between multiple processes. This shared memory is separate from the region occupied by the main database. The lock region also contains a table with one slot for every active reader in the database. The slots record the reader's process and thread ID, as well as the ID of the transaction snapshot the reader is using. (The process and thread ID are recorded to allow detection of stale entries in the table, e.g. threads that exited without releasing their reader slot.) The table is constructed in processor cache-aligned memory such that False Sharing[23] of cache lines is avoided.
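
The reader table layout can be sketched with Python's `struct` module. Everything concrete here is an assumption for illustration: a 64-byte cache line, a slot body of three 64-bit integers (transaction ID, process ID, thread ID), a `bytearray` standing in for the shared lock region, and a caller-supplied `is_alive` check used to clear stale slots.

```python
import os
import struct
from typing import Callable, Tuple

CACHE_LINE = 64  # assumed cache-line size in bytes

# Hypothetical slot body: transaction ID, process ID, thread ID (8 bytes each).
SLOT_BODY = struct.Struct("<QqQ")


def padded_slot_size(body_size: int, line: int = CACHE_LINE) -> int:
    """Round a slot up to whole cache lines so two slots never share a line."""
    return (body_size + line - 1) // line * line


SLOT_SIZE = padded_slot_size(SLOT_BODY.size)  # 24 bytes of data -> 64-byte slot


class ReaderTable:
    """Fixed-size reader table packed into one buffer.

    The bytearray stands in for the shared-memory lock region; a real
    implementation would map that region into every participating process.
    """

    def __init__(self, nslots: int) -> None:
        self.nslots = nslots
        self.buf = bytearray(nslots * SLOT_SIZE)

    def write(self, i: int, txnid: int, pid: int, tid: int) -> None:
        SLOT_BODY.pack_into(self.buf, i * SLOT_SIZE, txnid, pid, tid)

    def read(self, i: int) -> Tuple[int, int, int]:
        return SLOT_BODY.unpack_from(self.buf, i * SLOT_SIZE)

    def clear_stale(self, is_alive: Callable[[int], bool]) -> int:
        """Zero every in-use slot whose owning process is gone; return the count."""
        freed = 0
        for i in range(self.nslots):
            _txnid, pid, _tid = self.read(i)
            if pid != 0 and not is_alive(pid):
                self.write(i, 0, 0, 0)
                freed += 1
        return freed


table = ReaderTable(4)
table.write(0, txnid=7, pid=os.getpid(), tid=1)
table.write(1, txnid=5, pid=999_999_999, tid=2)  # pretend this process exited
freed = table.clear_stale(lambda pid: pid == os.getpid())
print(SLOT_SIZE, freed, table.read(1))  # 64 1 (0, 0, 0)
```

Padding each 24-byte slot to a full 64-byte line costs some memory, but a reader updating its own transaction ID then never invalidates the cache line holding another reader's slot.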

Readers acquire a slot the first time a thread opens a read transaction. Acquiring an empty slot in the table requires locking a mutex on the table. The slot address is saved in thread-local storage and re-used the next time the thread opens a read transaction, so the thread never needs to touch the table mutex ever again. The reader stores its transaction ID in the slot at the start of the read transaction and zeroes the ID in the slot at the end of the transaction. In normal operation, there is nothing that can block the operation of readers.
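
The per-thread slot caching can be illustrated with `threading.local`. This Python sketch is hypothetical (class, method, and counter names are invented); it only shows that the table lock is taken once per thread, after which each read transaction just writes and then zeroes its snapshot ID.

```python
import threading
from typing import List, Optional


class Readers:
    """Hypothetical reader table: slots claimed once per thread, then cached."""

    def __init__(self, nslots: int) -> None:
        self.txnids: List[int] = [0] * nslots          # 0 means "no read txn"
        self.owners: List[Optional[int]] = [None] * nslots
        self.table_lock = threading.Lock()  # only needed to claim a new slot
        self.local = threading.local()      # per-thread cached slot index
        self.lock_acquisitions = 0          # demo counter, not part of the design

    def _my_slot(self) -> int:
        slot = getattr(self.local, "slot", None)
        if slot is None:
            # First read transaction on this thread: claim an empty slot.
            with self.table_lock:
                self.lock_acquisitions += 1
                slot = self.owners.index(None)  # ValueError if the table is full
                self.owners[slot] = threading.get_ident()
            self.local.slot = slot
        return slot

    def begin_read(self, snapshot_txnid: int) -> int:
        slot = self._my_slot()
        self.txnids[slot] = snapshot_txnid  # publish the snapshot in use
        return slot

    def end_read(self, slot: int) -> None:
        self.txnids[slot] = 0               # zero it: snapshot no longer pinned


readers = Readers(8)
for txnid in (10, 11, 12):                  # three read txns on one thread
    readers.end_read(readers.begin_read(txnid))


def worker() -> None:
    readers.end_read(readers.begin_read(13))


t = threading.Thread(target=worker)
t.start()
t.join()
print(readers.lock_acquisitions)  # 2: once per thread, not once per transaction
```

In Python the slot writes are plain list assignments; a C implementation would have to think about the atomicity and memory ordering of those stores, which this sketch does not model.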

The reader table is used when a writer wants to allocate a page, and knows that the free list is not empty. Writes are performed using copy-on-write semantics; whenever a page is to be written, a copy is made and the copy is modified instead of the original. Once copied, the original page's ID is added to an in-memory free list. When a transaction is committed, the in-memory free list is saved as a single record in the free list DB along with the ID of the transaction for this commit. When a writer wants to pull a page from the free list DB, it compares the transaction ID of the oldest record in the free list DB with the transaction IDs of all of the active readers. If the record in the free list DB is older than all of the readers, then all of the pages in that record may be safely re-used because nothing else in the DB points to them any more.
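
The reuse rule reduces to a comparison against the oldest active reader. A minimal Python sketch, assuming the free list DB is a plain dict from committing transaction ID to the page numbers that transaction freed, and that an idle reader slot holds 0; the function and parameter names are hypothetical.

```python
from typing import Dict, List


def reusable_pages(freelist: Dict[int, List[int]],
                   reader_txnids: List[int],
                   last_committed: int) -> List[int]:
    """Return the pages from free-list records older than every active reader.

    `freelist` maps the ID of a committed transaction to the pages it freed.
    `reader_txnids` holds one snapshot ID per reader slot (0 = idle slot).
    """
    active = [t for t in reader_txnids if t != 0]
    # Conservative choice for this sketch: also treat the newest committed
    # snapshot as pinned, so records it freed are kept even with no readers.
    oldest = min(active + [last_committed])
    pages: List[int] = []
    for txnid in sorted(freelist):
        if txnid >= oldest:
            break  # this record, and every newer one, may still be visible
        pages.extend(freelist[txnid])
    return pages


freelist = {3: [17, 18], 5: [21], 8: [30, 31]}
print(reusable_pages(freelist, [0, 6, 0, 9], last_committed=9))  # [17, 18, 21]
```

A real writer would also delete each consumed record from the free list DB; the sketch only reports which pages qualify. Treating the newest committed snapshot as pinned is an extra safety margin chosen for the sketch, not something the paragraph specifies.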
