MongoDB 3.0 WiredTiger Compression and Performance

One of the most exciting developments over the lifetime of MongoDB must be the inclusion of the WiredTiger storage engine in MongoDB 3.0. Its very design and core architecture are legions ahead of the current MMAPv1 engine and comparable to most modern day storage engines for various relational and non-relational stores. One of the most compelling features of the WiredTiger storage engine is compression. Let's talk a bit more about performance and compression.

Configuration

MongoDB 3.0 allows the user to configure different storage engines through the storage engine API. For the first time ever we have an amazing array of options for setting up MongoDB to match our workloads and use-cases. To run WiredTiger the version must be 3.0 or higher and the configuration file must call for WiredTiger. For example:

storage:
dbPath: "/data/mongodb"
journal:
enabled: true
engine: "wiredTiger"
wiredTiger:
engineConfig:
cacheSizeGB: 99
journalCompressor: none
directoryForIndexes: "/indexes/mongodb/"
collectionConfig:
blockCompressor: snappy
indexConfig:
prefixCompression: true
systemLog:
destination: file
path: "/tmp/mongodb.log"
logAppend: true
processManagement:
fork: true
net:
port: 9005
unixDomainSocket:
enabled : true

There are a lot of new configuration options in 3.0 so let's take the notable options one by one.

  • storage.engine. The setting ensures we are using the WiredTiger storage engine. Should be set to "wiredTiger" to use the WiredTiger engine. It can also be set to "mmapv1". MMAPv1 is the default in 3.0, but in MongoDB 3.1 (potentially) this will change to wiredTiger.

  • storage.wiredTiger.engineConfig.cacheSizeGB. This sets up a page cache for WiredTiger to cache frequently used data and index blocks in GB. If this is not specified, MongoDB will automatically assign memory up to about 50% of total addressable memory.

  • storage.wiredTiger.engineConfig.directoryForIndexes. Yes! We can now store indexes on a separate block device. This should help DBAs size, capacity plan, and augment performance as needed.

  • storage.wiredTiger.collectionConfig.blockCompressor. This can be set to 'snappy' or 'zlib'. Snappy having higher performance and lower compression than zlib. Some more detail later on compression algorithms.

  • storage.wiredTiger.indexConfig.prefixCompression. This setting enables prefix compression for indexes. Valid options are true|false and the default is true.

Let's talk performance

WiredTiger is going to be much faster than MMAPv1 for almost all workloads. Its real sweet spot is highly concurrent and/or workloads with lots of updates. This may surprise some folks because traditionally compression is a trade off. Add compression, lose performance. That is normally true, but a couple of things need to be considered here. One, we are comparing the MMAPv1 engine with database level locking to WiredTiger with document level locking. Any reasonable concurrent workload is almost always bound by locking and seldom by pure system level resources. Two, WiredTiger does page level compression. More on this later.

There are a few things that make WiredTiger faster other than its locking scope. WiredTiger also has a streamlined process for free space lookups and management and it has a proper cache with its own I/O components.

Because WiredTiger allows for compression, a common worry is the potential for overall performance impact. But as you can see, in a practical sense this worry is mostly unfounded.

A couple graphs for relative performance difference for sysbench-mongodb. It should be noted that WiredTiger is using defaults in this configuration, including snappy compression and index prefix compression.

Let's break it down a bit more:

The relative CPU usage for each:

Let's talk more about compression

Compressing data inside a database is tricky. WiredTiger does a great job at handling compression because of its sophisticated management approach:

The cache generally stores uncompressed changes (the exception is for very large documents). The default snappy compression is fairly straightforward: it gathers data up to a maximum of 32KB, compresses it, and if compression is successful, writes the block rounded up to the nearest 4KB.

The alternative zlib compression works a little differently: it will gather more data and compress enough to fill a 32KB block on disk. This is more CPU intensive but generally results in better compression ratios (independent of the inherent differences between snappy and zlib).

—Michael Cahill

This approach is great for performance. But compression still has overhead and can vary in effectiveness. What this means for users is two-fold:

  • Not all data sets compress equally, it depends on the data format itself.
  • Data compression is temporal. One day being better than another depending on the specific workload.

One approach is to take a mongodump of the dataset in question then mongorestore that data to a compressed WiredTiger database and measure the difference. This gives a rough measurement of what one can expect the compression ratio to be. That said, as soon as the new compressed database starts taking load, that compression ratio may vary. Probably not by a massive margin however.

It should be noted there are some tricky bits to consider when running a database using compression. Because WiredTiger compresses each page before it hits the disk the memory region is uncompressed. This means that highly compressed data will have a large ratio between its footprint on disk and the cache that serves it. Poorly compressed data the opposite. The effect may be the database becomes slow. It will be hard to know that the problem is the caching pattern has changed because the compression properties of the underlaying data have changed. Keeping good time series data on the cache utilization, and periodically checking the compression of the data by hand may help the DBA understand these patterns better.

For instance, note the different compression ratios of various datasets:

Take Aways

  • MongoDB 3.0 has a new storage engine API, and is delivered with the optional WiredTiger engine.
  • MongoDB 3.0 with WiredTiger is much faster than MMAPv1 mostly because of increased concurrency.
  • MongoDB 3.0 with WiredTiger is much faster than MMAPv1 even when compressing the data.

Lastly, remember, MongoDB 3.0 is a new piece of software. Test before moving production workloads to it. TEST TEST TEST.

If you would like to test MongoDB 3.0 with WiredTiger, ObjectRocket has it as generally available and it's simple and quick to setup. As with anything ObjectRocket, there are a team of DBAs and Developers to help you with your projects. Don't be shy hitting them up at support@objectrocket.com with questions or email me directly.

Note: test configuration and details documented here.

mongodb压缩——snappy、zlib块压缩,btree索引前缀压缩的更多相关文章

  1. MyISAM的前缀压缩索引在索引块中的组织方式

    纯粹自己的理解,哪位大佬看到了还请指正. 首先贴一张<高性能MySQL>中的一段话: 这句话的意思是说,MyISAM使用b+树组织索引.也就是说无论索引压缩与否,组织方式一定是B+树. 下 ...

  2. myisam压缩(前缀压缩)索引

    myisam使用前缀压缩来减少索引的大小,从而让更多的索引可以放入内存中,默认只压缩字符串,但通过参数配置也可以对整数做压缩,myisam压缩每个索引块的方法是,先完全保存索引块中的第一个值,然后将其 ...

  3. mysql索引之八:myisam压缩(前缀压缩)索引

    myisam使用前缀压缩来减少索引的大小,从而让更多的索引可以放入内存中,默认只压缩字符串,但通过参数配置也可以对整数做压缩,myisam压缩每个索引块的方法是,先完全保存索引块中的第一个值,然后将其 ...

  4. Node基础:资源压缩之zlib

    概览 做过web性能优化的同学,对性能优化大杀器gzip应该不陌生.浏览器向服务器发起资源请求,比如下载一个js文件,服务器先对资源进行压缩,再返回给浏览器,以此节省流量,加快访问速度. 浏览器通过H ...

  5. 【Bitmap Index】B-Tree索引与Bitmap位图索引的锁代价比较研究

    通过以下实验,来验证Bitmap位图索引较之普通的B-Tree索引锁的“高昂代价”.位图索引会带来“位图段级锁”,实际使用过程一定要充分了解不同索引带来的锁代价情况. 1.为比较区别,创建两种索引类型 ...

  6. 腾讯Hermes设计概要——数据分析用的是列存储,词典文件前缀压缩,倒排文件递增id、变长压缩、依然是跳表-本质是lucene啊

    转自:http://data.qq.com/article?id=817 三.Hermes设计概要 架构描述 系统核心进程均采用分散化设计,根据业务发展需求,可随意扩缩容机器; 周期性数据直接通过td ...

  7. 0103MySQL中的B-tree索引 USINGWHERE和USING INDEX同时出现

    转自博客http://www.amogoo.com/article/4 前提1,为了与时俱进,文中数据库环境为MySQL5.6版本2,为了通用,更为了避免造数据的痛苦,文中所涉及表.数据,均来自于My ...

  8. Zip文件压缩(加密||非加密||压缩指定目录||压缩目录下的单个文件||根据路径压缩||根据流压缩)

    1.写入Excel,并加密压缩.不保存文件 String dcxh = String.format("%03d", keyValue); String folderFileName ...

  9. oracle中的B-TREE索引

    在字段值情况不同的条件下测试B-TREE索引效率 清空共享池和数据缓冲区alter system flush shared_pool;alter system flush buffer_cache; ...

随机推荐

  1. 邁向IT專家成功之路的三十則鐵律 鐵律四:IT人快速成長之道-複製

    相信您一定看到過現今有許多各行各業的成功人士,他們最初都是從複製別人的成功經驗開始的,就算是一位知名的歌手,有許多都是在未成名以前,先行模仿知名歌手的唱腔.舞蹈.服裝等等開始的,然後在慢慢經過自我努力 ...

  2. Android应用开发之所有动画使用详解

    题外话:有段时间没有更新博客了,这篇文章也是之前写了一半一直放在草稿箱,今天抽空把剩余的补上的.消失的这段时间真的好忙,节奏一下子有些适应不过来,早晨七点四十就得醒来,晚上九点四十才准备下班,好像最近 ...

  3. 使用sqlalchemy查询并删除数据表的唯一性索引

    简单描述表结构,字段类型 desc tabl_name 删除索引:alter table `db`.`table_name` drop index `index_name`  注意里面的特殊符号: ` ...

  4. Lazarus安装使用

    Lazarus安装使用 最后还是安装了Lazarus: 安装之后,新建了项目,还引入了Unit,就可以跑了: 学习:http://tieba.baidu.com/p/3164001113 progra ...

  5. JavaScript-4.7-friendly_table---ShinePans

    <html> <head> <meta http-equiv="content-type" content="text/html;chars ...

  6. 赵雅智_Android案例_刮刮乐

    实现效果 主要代码 <FrameLayout xmlns:android="http://schemas.android.com/apk/res/android" xmlns ...

  7. python--简易员工信息系统编写

    补充内容:eval 将字符串变成变量名locals   看输入的是否是字典中的一个keyfunc.__name____怎么看变量名的数据类型斐波那契数列 li=[1,1] while li[-1]&l ...

  8. JSP 随记

    jstl <c:forEach> 遍历,多个<option>时显示"全部".单个 option时,默认选中! 引入:<%@ taglib prefix ...

  9. 隐私问题成O2O绊脚石,加强行业监管迫在眉睫

        这年头,O2O的发展越来越给力了.因为O2O能充分结合互联网经济的线上优势和传统经济的线下优势,因此,传统商户纷纷借助O2O来开展业务,取得了不俗的成绩.只是,在移动互联网越来越"开 ...

  10. iOS开发者必备:四款后端服务工具

    本文转载至 http://mobile.51cto.com/iphone-411917.htm 对于开发者来说,连接后端数据或许是一件特别痛苦的事情.但后端服务却能够帮助开发人员以更快的速度构建移动应 ...