The
storage wars: Shadow Paging, Log Structured Merge and Write Ahead Logging

I’ve been doing a lot of research lately on storage. And in general, it seems that the most popular ways of writing to disk today are divide into the following categories.

  • Write Ahead Logging (WAL)–  Many databases use some sort of variant on that.  PostgreSQL,SQLiteMongoDBSQL
    Server
    , etc. Oracle has Redo
    Log
    , which seems similar, but I didn’t check too deeply.
  • Log Structured Merge  (LSM)– a lot of NoSQL databases use this method. Cassandra, Riak, LevelDB, SQLite 4, etc.
  • Shadow Paging – was quite popular a long time ago (80s), but still somewhat in use. LMDB, Tokyo Cabinet, CoucbDB (sort of).

WAL came into being for a very simple reason, it is drastically faster to write sequentially than it is to do random writes. Let us
assume that you store the data on disk using some sort of a tree, when you need to insert / update something in that tree, the record can be anywhere. That means that you would need to do random writes, and have to suffer the perf issues associated with that.
Instead, you can write to the log and have some sort of a background process that would update the on disk data.

It also means that you really only have to update in memory data, flush the log and you are safe. The recovery procedure is going to be pretty complex, but it gives you some nice performance. Note that you write everything at least twice, once for the log,
and once for the read data file. The log writes are sequential, the data writes are random.

LSM also take advantage of sequential write speeds, but it takes it even further, instead of updating the actual data, you will wait until the log gets to a certain size, at which point you are going to merge it with the current data file(s). That means that
you you will usually write things multiple times, in LevelDB, for example, a lot of the effort has actually gone into eradicating this
cost. The cost of compacting your data. Because what ended up happening is that you have user writes competing with the compaction writes.

Shadow Paging is not actually trying to optimize sequential writes. Well, that is not really fair. Shadow Paging & sequential writes are just not related. The reason I said CouchDB is sort of using shadow paging is that it is using the exact same mechanics
as other shadow paging system, but it always write at the end of the file. That means that is has excellent write speed, but it also means that it needs some way
to reduce space. And that means it uses compaction, which brings you right back to the competing write story.

For our purposes, we will ignore the way CouchDB work and focus on systems that works like LMDB. In those sort of systems, instead of modifying the data directly, we create a shadow page (copy on write) and modify that. Because the shadow page is only wired
up to the rest of the pages on commit, this is absolutely atomic. It also means that modifying a page is going to use one page, and leave another free (the old page). And that, in turn, means that you need to have some way of scavenging for free space. CouchDB
does that by creating a whole new file.

LMDB does that by recording the free space and reusing that in the next transaction. That means that writes to LMDB can happen anywhere. We can apply policies on top of that to mitigate that, but that is beside the point.

Let us go back to another important aspect that we have to deal with in databases. Backups. As it turn out, it is actually really simple for most LSM / WAL systems to implement that, because you can just use the logs. For LMDB, you can create a backup really
easily (in fact, since we are using shadow paging, you pretty much get it for free). However, one feature that I don’t think would be possible with LMDB would be incremental backups. WAL/LSM make it easy, just take the logs since a given point. But with LMDB
style dbs, I don’t think that this would be possible.

The storage wars: Shadow Paging, Log Structured Merge and Write Ahead Logging的更多相关文章

  1. Log Structured Merge Trees (LSM)

    1      概念 LSM = Log Structured Merge Trees 来源于google的bigtable论文. 2      解决问题 传统的数据库如MySql采用B+树存放数据,B ...

  2. Log Structured Merge Trees(LSM) 算法

    十年前,谷歌发表了 “BigTable” 的论文,论文中很多很酷的方面之一就是它所使用的文件组织方式,这个方法更一般的名字叫 Log Structured-Merge Tree. LSM是当前被用在许 ...

  3. LSM(Log Structured Merge Trees ) 笔记

    目录 一.大幅度制约存储介质吞吐量的原因 二.传统数据库的实现机制 三.LSM Tree的历史由来 四.提高写吞吐量的思路 4.1 一种方式是数据来后,直接顺序落盘 4.2 另一种方式,是保证落盘的数 ...

  4. Log Structured Merge Trees(LSM) 原理

    http://www.open-open.com/lib/view/open1424916275249.html

  5. SSTable and Log Structured Storage: LevelDB

    If Protocol Buffers is the lingua franca of individual data record at Google, then the Sorted String ...

  6. InfluxDB存储引擎Time Structured Merge Tree——本质上和LSM无异,只是结合了列存储压缩,其中引入fb的float压缩,字串字典压缩等

    The New InfluxDB Storage Engine: Time Structured Merge Tree by Paul Dix | Oct 7, 2015 | InfluxDB | 0 ...

  7. Pull后产生多余的log(Merge branch 'master' of ...)

    第一步: git reset --hard 73d0d18425ae55195068d39b3304303ac43b521a 第二步: git push -f origin feature/PAC_1 ...

  8. 一些开源搜索引擎实现——倒排使用原始文件,列存储Hbase,KV store如levelDB、mongoDB、redis,以及SQL的,如sqlite或者xxSQL

    本文说明:除开ES,Solr,sphinx系列的其他开源搜索引擎汇总于此.   A search engine based on Node.js and LevelDB A persistent, n ...

  9. 如何基于LSM-tree架构实现一写多读

    一  前言 PolarDB是阿里巴巴自研的新一代云原生关系型数据库,在存储计算分离架构下,利用了软硬件结合的优势,为用户提供具备极致弹性.海量存储.高性能.低成本的数据库服务.X-Engine是阿里巴 ...

随机推荐

  1. IPC

    IPC,全名Inter Process Communication即进程间通讯,在同一台机器上的两个进程就用IPC,不能跨物理机器。IPC包括共享内存、队列、信号量等几种方式,由于IPC通讯效率之高, ...

  2. ECMAScript 6教程 (三) Class和Module(类和模块)

    本文版权归作者和博客园共有,欢迎转载,但未经作者同意必须保留此段声明,且在文章页面明显位置给出 原文连接,博客地址为 http://www.cnblogs.com/jasonnode/ .该系列课程是 ...

  3. tcp粘包,udp丢包

    TCP是面向流的, 流, 要说明就像河水一样, 只要有水, 就会一直流向低处, 不会间断. TCP为了提高传输效率, 发送数据的时候, 并不是直接发送数据到网路, 而是先暂存到系统缓冲, 超过时间或者 ...

  4. 页面缩放对css的影响

    昨天发现一个上线的项目css样式明显不对,但是查看别人的电脑上的页面样式都是没问题的,于是找了半天原因,原来是我的浏览器对这个页面缩放了,导致样式问题. 发现了页面缩放会作用在同一个域名下的所有页面, ...

  5. [CCF] Z字形扫描

    CCF Z字形扫描 感觉和LeetCode中的ZigZag还是有一些不一样的. 题目描述 在图像编码的算法中,需要将一个给定的方形矩阵进行Z字形扫描(Zigzag Scan).给定一个n×n的矩阵,Z ...

  6. Android自定义View自定义属性

    1.引言 对于自定义属性,大家肯定都不陌生,遵循以下几步,就可以实现: 自定义一个CustomView(extends View )类 编写values/attrs.xml,在其中编写styleabl ...

  7. URL地址传参乱码

    1.页面使用javascript的方法encodeURIComponent对需要转码的字符进行两次转码,如:encodeURIComponent(encodeURIComponent("** ...

  8. java 字符串split有很多坑,使用时请小心!!

    System.out.println(":ab:cd:ef::".split(":").length);//末尾分隔符全部忽略 System.out.print ...

  9. new一个数组,delete释放内存

    int *a = new int[4]; for(int i=0;i<4;i++) { a[i] = i; printf("a[%d]=%d\n", i, i); } del ...

  10. windows下MySQL 5.7+ 解压缩版安装配置方法

    方法来自伟大的互联网. 1.去官网下载.zip格式的MySQL Server的压缩包,根据需要选择x86或x64版.注意:下载是需要注册账户并登录的. 2.解压缩至你想要的位置. 3.复制解压目录下m ...