The official description, from https://www.elastic.co/guide/en/elasticsearch/reference/2.2/index-modules.html#_static_index_settings:

index.codec: The default value compresses stored data with LZ4 compression, but this can be set to best_compression, which uses DEFLATE for a higher compression ratio, at the expense of slower stored fields performance.

Note: in versions up to and including 2.1 this is an experimental feature. It is only stable in 2.2+!

Now you can also enable better compression on the cold nodes by setting index.codec: best_compression in their config/elasticsearch.yml file in order to be able to archive more data with the same amount of disk space.

Excerpted from: https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch
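
Besides the node-level config/elasticsearch.yml entry quoted above, the same static setting can be supplied per index when the index is created. The following is only a minimal sketch of what such a request body might look like. The helper name `best_compression_settings` is made up for illustration, and the sketch assumes the only accepted codec values are `default` and `best_compression`. It builds the JSON body and does not talk to a cluster.

```python
import json

# Assumption: these are the codec values accepted by index.codec.
ALLOWED_CODECS = ("default", "best_compression")


def best_compression_settings(codec: str = "best_compression") -> dict:
    """Build an index-creation body that sets the static index.codec setting.

    Hypothetical helper for illustration; it only constructs a dict.
    """
    if codec not in ALLOWED_CODECS:
        raise ValueError(f"unsupported codec: {codec!r}")
    return {"settings": {"index": {"codec": codec}}}


body = best_compression_settings()
# This JSON would be sent as the body of the request that creates the index.
print(json.dumps(body, indent=2))
```

Because index.codec is listed among the static index settings, it has to be set at index creation time or on a closed index, so it is worth deciding on the codec before indexing begins.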

The data below is excerpted from: https://www.elastic.co/blog/elasticsearch-storage-the-true-story-2.0

The test methodology hasn’t changed so you can check out the old blog post or the README in the GitHub repo for the details.

Structured data file. Original file size: 67644119

| Test | String fields | _all | Index size w/ LZ4 (bytes) | Index size w/ DEFLATE (bytes) | Expansion ratio w/ LZ4 | Expansion ratio w/ DEFLATE | Impact of DEFLATE |
|---|---|---|---|---|---|---|---|
| 1 | analyzed and not_analyzed | enabled | 63047579 | 53131592 | 0.932 | 0.785 | -0.157 |
| 2 | analyzed and not_analyzed | disabled | 48271433 | 38327106 | 0.713 | 0.566 | -0.206 |
| 3 | not_analyzed | disabled | 38920800 | 29014796 | 0.575 | 0.428 | -0.254 |
| 3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 65382872 | 49532858 | 0.966 | 0.732 | -0.242 |
| 4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 43083702 | 32063602 | 0.636 | 0.474 | -0.255 |

Semi-structured data file. Original file size: 75037027

| Test | String fields | _all | Index size w/ LZ4 (bytes) | Index size w/ DEFLATE (bytes) | Expansion ratio w/ LZ4 | Expansion ratio w/ DEFLATE | Impact of DEFLATE |
|---|---|---|---|---|---|---|---|
| 1 | analyzed and not_analyzed | enabled | 100478376 | 82132782 | 1.339 | 1.094 | -0.182 |
| 2 | analyzed and not_analyzed | disabled | 75238480 | 56911638 | 1.002 | 0.758 | -0.243 |
| 3 | not_analyzed | disabled | 71866672 | 53553561 | 0.957 | 0.713 | -0.254 |
| 3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 104638750 | 83824398 | 1.394 | 1.117 | -0.198 |
| 4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 72925624 | 54603882 | 0.971 | 0.727 | -0.251 |

With the standard LZ4-based compression, the indexed data size to raw data size ratio ranged from 0.575 to 1.394. After enabling DEFLATE-based compression using the best_compression index.codec option, the indexed data size to raw data size ratio range came down to 0.429 to 1.117. Enabling the best_compression option resulted in a 15.7% to 25.6% reduction in indexed data size depending on the test parameters.

As you can see, the ratio of index size to raw data size can vary greatly based on your mapping configuration, what fields you decide to create/retain, and the characteristics of the data set itself. We encourage you to run similar tests yourself to determine what the data compression/expansion factor is for your data set and application requirements.
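
To make the columns in the quoted data concrete, here is a small sketch that recomputes the expansion ratio and the impact of DEFLATE from the byte counts of the structured data file. The function names are made up for illustration, and the sketch assumes the ratio is index size divided by original file size and the impact is the relative change from the LZ4 size to the DEFLATE size. The published figures appear to differ from these in the last digit in places, possibly because of truncation rather than rounding.

```python
# Original size of the structured data file, as quoted above.
RAW_STRUCTURED = 67_644_119

# (test id, index size with LZ4, index size with DEFLATE), from the quoted data.
STRUCTURED_TESTS = [
    ("1", 63_047_579, 53_131_592),
    ("2", 48_271_433, 38_327_106),
    ("3", 38_920_800, 29_014_796),
    ("3b", 65_382_872, 49_532_858),
    ("4", 43_083_702, 32_063_602),
]


def expansion_ratio(index_bytes: int, raw_bytes: int) -> float:
    """Index size relative to the raw input size (assumed definition)."""
    return index_bytes / raw_bytes


def deflate_impact(lz4_bytes: int, deflate_bytes: int) -> float:
    """Relative size change when switching from LZ4 to DEFLATE (assumed definition)."""
    return deflate_bytes / lz4_bytes - 1.0


for test_id, lz4, deflate in STRUCTURED_TESTS:
    print(
        f"test {test_id}: "
        f"LZ4 ratio {expansion_ratio(lz4, RAW_STRUCTURED):.3f}, "
        f"DEFLATE ratio {expansion_ratio(deflate, RAW_STRUCTURED):.3f}, "
        f"impact {deflate_impact(lz4, deflate):+.3f}"
    )
```

Running the same two functions on your own index sizes, taken with and without best_compression, gives numbers that are directly comparable to the ones quoted here.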

Conclusion

There were many amazing features added to Elasticsearch 2.0 worth considering. As we’ve discussed, two of these new features in particular can reduce the hardware footprint required for an Elasticsearch cluster by 15-25% or more: 1) the addition of a best_compression option and 2) enabling doc_values by default. This allows us to get to compression ratios between 0.429 and 1.117.
