MongoDB3.0中的压缩选项

在MongoDB 3.0中,WiredTiger为集合提供三个压缩选项:

  1. 无压缩
  2. Snappy(默认启用) – 很不错的压缩,有效利用资源
  3. zlib(类似gzip) – 出色的压缩,但需要占用更多资源

有索引的两个压缩选项:

  1. 无压缩
  2. 前缀(默认启用) – 良好的压缩,资源的有效利用

请记住哪些适用于MongoDB的3.0所有压缩选项:

  1. 随机数据不能压缩
  2. 二进制数据不能压缩(它可能已经被压缩)
  3. 文本压缩效果特别好
  4. 对于文件中的字段名压缩效果特别好(尤其对短字段名来说)

官方说法:

Compression

With WiredTiger, MongoDB supports compression for all collections and indexes. Compression minimizes storage use at the expense of additional CPU.

By default, WiredTiger uses block compression with the snappy compression library for all collections andprefix compression for all indexes.

For collections, block compression with zlib is also available. To specify an alternate compression algorithm or no compression, use the storage.wiredTiger.collectionConfig.blockCompressor setting.

For indexes, to disable prefix compression, use thestorage.wiredTiger.indexConfig.prefixCompression setting.

Compression settings are also configurable on a per-collection and per-index basis during collection and index creation. See Specify Storage Engine Options and db.collection.createIndex() storageEngine option.

For most workloads, the default compression settings balance storage efficiency and processing requirements.

The WiredTiger journal is also compressed by default. For information on journal compression, see Journal.

MongoDB 3.0 WiredTiger Compression and Performance

  • 转自:http://objectrocket.com/blog/company/mongodb-wiredtiger

One of the most exciting developments over the lifetime of MongoDB must be the inclusion of the WiredTiger storage engine in MongoDB 3.0. Its very design and core architecture are legions ahead of the current MMAPv1 engine and comparable to most modern day storage engines for various relational and non-relational stores. One of the most compelling features of the WiredTiger storage engine is compression. Let's talk a bit more about performance and compression.

Configuration

MongoDB 3.0 allows the user to configure different storage engines through the storage engine API. For the first time ever we have an amazing array of options for setting up MongoDB to match our workloads and use-cases. To run WiredTiger the version must be 3.0 or higher and the configuration file must call for WiredTiger. For example:

storage:
dbPath: "/data/mongodb"
journal:
enabled: true
engine: "wiredTiger"
wiredTiger:
engineConfig:
cacheSizeGB: 99
journalCompressor: none
directoryForIndexes: "/indexes/mongodb/"
collectionConfig:
blockCompressor: snappy
indexConfig:
prefixCompression: true
systemLog:
destination: file
path: "/tmp/mongodb.log"
logAppend: true
processManagement:
fork: true
net:
port: 9005
unixDomainSocket:
enabled : true

There are a lot of new configuration options in 3.0 so let's take the notable options one by one.

  • storage.engine. The setting ensures we are using the WiredTiger storage engine. Should be set to "wiredTiger" to use the WiredTiger engine. It can also be set to "mmapv1". MMAPv1 is the default in 3.0, but in MongoDB 3.1 (potentially) this will change to wiredTiger.

  • storage.wiredTiger.engineConfig.cacheSizeGB. This sets up a page cache for WiredTiger to cache frequently used data and index blocks in GB. If this is not specified, MongoDB will automatically assign memory up to about 50% of total addressable memory.

  • storage.wiredTiger.engineConfig.directoryForIndexes. Yes! We can now store indexes on a separate block device. This should help DBAs size, capacity plan, and augment performance as needed.

  • storage.wiredTiger.collectionConfig.blockCompressor. This can be set to 'snappy' or 'zlib'. Snappy having higher performance and lower compression than zlib. Some more detail later on compression algorithms.

  • storage.wiredTiger.indexConfig.prefixCompression. This setting enables prefix compression for indexes. Valid options are true|false and the default is true.

Let's talk performance

WiredTiger is going to be much faster than MMAPv1 for almost all workloads. Its real sweet spot is highly concurrent and/or workloads with lots of updates. This may surprise some folks because traditionally compression is a trade off. Add compression, lose performance. That is normally true, but a couple of things need to be considered here. One, we are comparing the MMAPv1 engine with database level locking to WiredTiger with document level locking. Any reasonable concurrent workload is almost always bound by locking and seldom by pure system level resources. Two, WiredTiger does page level compression. More on this later.

There are a few things that make WiredTiger faster other than its locking scope. WiredTiger also has a streamlined process for free space lookups and management and it has a proper cache with its own I/O components.

Because WiredTiger allows for compression, a common worry is the potential for overall performance impact. But as you can see, in a practical sense this worry is mostly unfounded.

A couple graphs for relative performance difference for sysbench-mongodb. It should be noted that WiredTiger is using defaults in this configuration, including snappy compression and index prefix compression.

Let's break it down a bit more:

The relative CPU usage for each:

Let's talk more about compression

Compressing data inside a database is tricky. WiredTiger does a great job at handling compression because of its sophisticated management approach:

The cache generally stores uncompressed changes (the exception is for very large documents). The default snappy compression is fairly straightforward: it gathers data up to a maximum of 32KB, compresses it, and if compression is successful, writes the block rounded up to the nearest 4KB.

The alternative zlib compression works a little differently: it will gather more data and compress enough to fill a 32KB block on disk. This is more CPU intensive but generally results in better compression ratios (independent of the inherent differences between snappy and zlib).

—Michael Cahill

This approach is great for performance. But compression still has overhead and can vary in effectiveness. What this means for users is two-fold:

  • Not all data sets compress equally, it depends on the data format itself.
  • Data compression is temporal. One day being better than another depending on the specific workload.

One approach is to take a mongodump of the dataset in question then mongorestore that data to a compressed WiredTiger database and measure the difference. This gives a rough measurement of what one can expect the compression ratio to be. That said, as soon as the new compressed database starts taking load, that compression ratio may vary. Probably not by a massive margin however.

It should be noted there are some tricky bits to consider when running a database using compression. Because WiredTiger compresses each page before it hits the disk the memory region is uncompressed. This means that highly compressed data will have a large ratio between its footprint on disk and the cache that serves it. Poorly compressed data the opposite. The effect may be the database becomes slow. It will be hard to know that the problem is the caching pattern has changed because the compression properties of the underlaying data have changed. Keeping good time series data on the cache utilization, and periodically checking the compression of the data by hand may help the DBA understand these patterns better.

For instance, note the different compression ratios of various datasets:

Take Aways

  • MongoDB 3.0 has a new storage engine API, and is delivered with the optional WiredTiger engine.
  • MongoDB 3.0 with WiredTiger is much faster than MMAPv1 mostly because of increased concurrency.
  • MongoDB 3.0 with WiredTiger is much faster than MMAPv1 even when compressing the data.

Lastly, remember, MongoDB 3.0 is a new piece of software. Test before moving production workloads to it. TEST TEST TEST.

If you would like to test MongoDB 3.0 with WiredTiger, ObjectRocket has it as generally available and it's simple and quick to setup. As with anything ObjectRocket, there are a team of DBAs and Developers to help you with your projects. Don't be shy hitting them up at support@objectrocket.com with questions or email me directly.

Note: test configuration and details documented here.

MongoDB 3.0 WiredTiger Compression and Performance的更多相关文章

  1. 新年新技术:MongoDB 3.0

    前一篇介绍了HTTP/2,这一篇简单介绍下3月3号发布的MongoDB 3.0. What’s new in MongoDB 3.0? 新的存储引擎WiredTiger MongoDB 3.0的存储引 ...

  2. CentOS7 安装MongoDB 3.0服务器

    1,下载&安装 MongoDB 3.0 正式版本发布!这标志着 MongoDB 数据库进入了一个全新的发展阶段,提供强大.灵活而且易于管理的数据库管理系统.MongoDB宣称,3.0新版本不只 ...

  3. MongoDB 3.0 新特性【转】

    本文来自:http://www.open-open.com/lib/view/open1427078982824.html#_label3 更多信息见官网: http://docs.mongodb.o ...

  4. MongoDB 3.0(1):CentOS7 安装MongoDB 3.0服务

    目录(?)[-] 1下载安装 2MongoDB CRUD 1创建数据 2更新数据 3删除 4查询 5更多方法 3MongoDB可视化工具 4总结   本文原文连接: http://blog.csdn. ...

  5. CentOS7 安装MongoDB 3.0服务

    1,下载&安装 MongoDB 3.0 正式版本发布!这标志着 MongoDB 数据库进入了一个全新的发展阶段,提供强大.灵活而且易于管理的数据库管理系统.MongoDB宣称,3.0新版本不只 ...

  6. [译]MongoDB 3.0发布说明

    原文来自:http://docs.mongodb.org/manual/release-notes/3.0/ 2015年3月3日 MongoDB 3.0现已可供使用.关键新特性包括支持WiredTig ...

  7. (转)MongoDB 3.0 WT引擎参考配置文件

    mongodb 3.0 改变很多,从2.6版本升级到3.0要关注的细节很多,如权限等等.3.0在数据存储引擎上更换成了wiredTiger,在数据压缩方面很有效,解决大数据量问题的情况下,磁盘不够用的 ...

  8. (转)重磅出击:MongoDB 3.0正式版即将发布

    MongoDB 今天宣布 3.0 正式版本即将发布.这标志着 MongoDB 数据库进入了一个全新的发展阶段,提供强大.灵活而且易于管理的数据库管理系统. MongoDB 3.0 在性能和伸缩性方面都 ...

  9. MongoDB 3.0新增特性一览

    转自:http://blog.sina.com.cn/s/blog_48c95a190102vedr.html 引言 在历经版本号修改(2.8版本直接跳到3.0版本)和11个rc版本之后,MongoD ...

随机推荐

  1. UVA 10759 Dice Throwing

    题意为抛n个骰子凑成的点数和大于或等于x的概率,刚开始用暴力枚举,虽然AC了,但时间为2.227s,然后百度了下别人的做法,交了一遍,靠,0.000s,然后看了下思路,原来是dp,在暴力的基础上记忆化 ...

  2. 初识Selenium(四)

    用Selenium实现页面自动化测试 引言 要不要做页面测试自动化的争议由来已久,不做或少做的主要原因是其成本太高,其中一个成本就是自动化脚本的编写和维护,那么有没有办法降低这种成本呢?童战同学在其博 ...

  3. 一个完整的项目中,需要的基本gulp

    一个完整的项目需要使用gulp的多种功能,包括—— (1)加载各种需要的插件 var concat=require('gulp'); var clean=require(''gulp); 等等.需要的 ...

  4. Java 多态,重载,重写

    1.多态(polymorphism): 多态是指程序中定义的引用变量所指向的具体类型和通过该引用变量发出的方法调用在编程时并不确定,而是在程序运行期间才确定,即一个引用变量倒底会指向哪个类的实例对象, ...

  5. Java 8新特性探究(九)跟OOM:Permgen说再见吧

    PermGen space简单介绍 元空间(MetaSpace)一种新的内存空间诞生 PermGen 空间的状况 Metaspace 内存分配模型 Metaspace 容量 Metaspace 垃圾回 ...

  6. 求余区间的求和类问题 离线+线段树 HDU4228

    题目大意:给一个数组a,他的顺序是严格的单调增,然后有如下三个操作 ①加入一个val到a数组里面去,加入的位置就是a[i-1]<val<a[i+1] ②删除一个a[i]=val的值 ③查询 ...

  7. Ubuntu14.04下SP_Flash_Tool_exe_Linux无法烧录

    1,用命令lsusb查看usb信息. 2,vim 20-mm-blacklist-mtk.rules 输入下面内容: ATTRS{idVendor}=="0e8d",ENV{ID_ ...

  8. tcp 的6个控制位

    原文:http://blog.chinaunix.net/uid-26413668-id-3376762.html TCP(Transmission Control Protocol) 传输控制协议 ...

  9. Why attitude is more important than IQ

    原文:http://www.businessinsider.com/why-attitude-is-more-important-than-iq-2015-9?IR=T& LinkedIn I ...

  10. myeclipse中打开java文件中文乱码

    中文乱码肯定是编码与解码不一样导致. 1.如果是平时写代码都没有问题,但是打开其他项目时出现这种问题: window->preferences->General->Content T ...