elasticsearch 2.2+ index.codec: best_compression启用压缩
官方说法,来自https://www.elastic.co/guide/en/elasticsearch/reference/2.2/index-modules.html#_static_index_settings:
index.codecThe default value compresses stored data with LZ4 compression, but this can be set tobest_compression which uses DEFLATE for a higher compression ratio, at the expense of slower stored fields performance.
注意:2.1以下都是实验特性!2.2+才稳定!
Now you can also enable better compression on the cold nodes by setting index.codec: best_compression in theirconfig/elasticsearch.yml file in order to be able to archive more data with the same amount of disk space.
摘自:https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch
下面的数据摘自:https://www.elastic.co/blog/elasticsearch-storage-the-true-story-2.0
The test methodology hasn’t changed so you can check out the old blog post or the README in the Github repo for the details.
| Test | String fields | _all | index size /w LZ4 | index size /w DEFLATE | expansion ratio /w LZ4 | expansion ratio /w DEFLATE | Impact of DEFLATE |
| Structured data file. Original file size: 67644119 | |||||||
| 1 | analyzed and not_analyzed | enabled | 63047579 | 53131592 | 0.932 | 0.785 | -0.157 |
| 2 | analyzed and not_analyzed | disabled | 48271433 | 38327106 | 0.713 | 0.566 | -0.206 |
| 3 | not_analyzed | disabled | 38920800 | 29014796 | 0.575 | 0.428 | -0.254 |
| 3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 65382872 | 49532858 | 0.966 | 0.732 | -0.242 |
| 4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 43083702 | 32063602 | 0.636 | 0.474 | -0.255 |
| Semi-structured data file. Original file size: 75037027 |
|||||||
| 1 | analyzed and not_analyzed | enabled | 100478376 | 82132782 | 1.339 | 1.094 | -0.182 |
| 2 | analyzed and not_analyzed | disabled | 75238480 | 56911638 | 1.002 | 0.758 | -0.243 |
| 3 | not_analyzed | disabled | 71866672 | 53553561 | 0.957 | 0.713 | -0.254 |
| 3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 104638750 | 83824398 | 1.394 | 1.117 | -0.198 |
| 4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 72925624 | 54603882 | 0.971 | 0.727 | -0.251 |
With the standard LZ4-based compression, the indexed data size to raw data size ratio ranged from 0.575 to 1.394. After enabling DEFLATE-based compression using the best_compression index.codec option, the indexed data size to raw data size ratio range came down to 0.429 to 1.117. Enabling the best_compression option resulted in a 15.7% to 25.6% reduction in indexed data size depending on the test parameters.
As you can see, the ratio of index size to raw data size can vary greatly based on your mapping configuration, what fields you decide to create/retain, and the characteristics of the data set itself. We encourage you to run similar tests yourself to determine what the data compression/expansion factor is for your data set and application requirements.
Conclusion
There were many amazing features added to Elasticsearch 2.0 worth considering. As we’ve discussed, two of these new features in particular can reduce the hardware footprint required for an Elasticsearch cluster by 15-25% or more: 1) the addition of a best_compression option and 2) enabling doc_values by default. This allows us to get to compression ratios between 0.429 and 1.117.
elasticsearch 2.2+ index.codec: best_compression启用压缩的更多相关文章
- Oracle 数据库备份启用压缩以及remap
1. Oracle数据库进行备份恢复 客户测试环境, 有时候需要从现场copy到公司, 压缩虽然能够减少部分空间大小,但是copy到虚拟机里面也时浪费很大量的磁盘,所以能够在备份恢复的过程中执行压缩最 ...
- tomcat启用压缩的方式
<Connector port="7070" protocol="HTTP/1.1"connectionTimeout="20000" ...
- 使sqoop能够启用压缩的一些配置
在使用sqoop 将数据库表中数据导入至hdfs时 配置启用压缩 hadoop 的命令 检查本地库支持哪些 bin/hadoop checknative 需要配置native 要编译版本 ...
- HBase启用压缩
1. 压缩算法的比较 算法 压缩比 压缩 解压 GZIP 13.4% 21MB/s 118MB/s LZO 20.5% 135MB/s 410MB/s Snappy/Zippy 22.2% 172MB ...
- Elasticsearch:inverted index,doc_values及source
以后会用到的相关知识:索引中某些字段禁止搜索,排序等操作 当我们学习Elasticsearch时,经常会遇到如下的几个概念: Reverted index doc_values source? 这个几 ...
- logstash 输出到elasticsearch 自动建立index
由于es 单index 所能承受的数据量有限,之前情况是到400w数据300G左右的时候,整个数据的插入会变得特别慢(索引重建)甚至会导致集群之间的通信断开,于是我们采用每天一个index的方法来缓解 ...
- hive启用压缩
<property> <name>hive.exec.compress.intermediate</name> <value>true</valu ...
- kibana无法显示elasticsearch中的index
我是用的logstash将kafka中的数据同步到elasticsearch.logstash和kafka在同一台服务器,elasticsearch在另外的服务器上. 经过排查,是因为我的logsta ...
- ElasticSearch(十一)Elasticsearch清空指定Index/Type数据
POST /index_name/type_name/_delete_by_query?conflicts=proceed { "query": { "match_all ...
随机推荐
- js基础练习--控制多组图片切换
js基础练习题,一个按钮控制两组图片的切换,做这题的时候我忽然想到了将num1.mun2……都存放在一个数组中,根据索引值匹配到对应相应组的图片,这样不管有多少组图片都简单的搞定切换,可惜js基础都没 ...
- Django相关介绍
先认识一下MVC框架 MVC的框架模式,即模型M,视图V和控制器C.他们之间以一种插件似的,松耦合的方式连接在一起. Model(模型)是应用程序中用于处理应用程序数据逻辑的部分. 通常模型对象负责在 ...
- LINQ不包含列表
var query=lista.Where(p=>!listb.Any(g=>p.id==g.id && p.no==g.no))
- C#实现对图片文件的压缩、裁剪操作实例
本文实例讲述了C#对图片文件的压缩.裁剪操作方法,在C#项目开发中非常有实用价值.分享给大家供大家参考.具体如下: 一般在做项目时,对图片的处理,以前都采用在上传时,限制其大小的方式,这样带来诸多不便 ...
- SVN如何切换用户对代码进行操作
在使用svn更新或提交数据时需要输入用户名和密码,在输入框中可以选择是否记录,以便下次操作无需再次输入用户名和密码: 要切换其他用户名时,需要删除已记录用户的数据,在电脑桌面上右击,依次点击菜单项To ...
- sublime Text emmet插件使用手册
转自:http://www.w3cplus.com/tools/emmet-cheat-sheet.html 介绍 Emmet (前身为 Zen Coding) 是一个能大幅度提高前端开发效率的一个工 ...
- 网络管理常用命令(6/14) -netstat命令详解
Netstat 命令用于显示各种网络相关信息,如网络连接,路由表,接口状态 (Interface Statistics),masquerade 连接,多播成员 (Multicast Membershi ...
- 较常用的Math方法及ES6中的扩展
记录下与Math有关的常用方法,如:求最大值.最小值等,或者是保留几位数啥的 1.数据 let floatA = 2.325232; let floatB = 2.3456; let temporar ...
- java类执行顺序
1. 静态初始化块 > 初始化块 > 构造器 2. 父类 > 子类 综合下来顺序就是: 父类静态初始化块和静态成员变量 子类静态初始化块和静态成员变量 父类初始化块和普通成员变量 父 ...
- 《网络攻防》Web安全基础实践
20145224陈颢文 <网络攻防>Web安全基础实践 基础问题回答 SQL注入攻击原理,如何防御: 部分程序员在编写代码的时候,没有对用户输入数据的合法性进行判断,黑客利用这个bug在数 ...