Elasticsearch 2.2+: enabling compression with index.codec: best_compression
The official description, from https://www.elastic.co/guide/en/elasticsearch/reference/2.2/index-modules.html#_static_index_settings:
index.codec: The default value compresses stored data with LZ4 compression, but this can be set to best_compression, which uses DEFLATE for a higher compression ratio, at the expense of slower stored fields performance.
Note: in 2.1 and earlier this is an experimental feature. It is only stable from 2.2 onward.
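Because `index.codec` is listed among the static index settings, it is normally supplied when the index is created. The following is a minimal sketch of that request using only the Python standard library. It assumes a 2.2+ cluster reachable at a hypothetical `http://localhost:9200` and a hypothetical index name `logs-archive`. Neither name comes from the source.

```python
import json
import urllib.request

# Hypothetical endpoint and index name, used only for illustration.
ES_URL = "http://localhost:9200"
INDEX = "logs-archive"


def create_index_request(index: str, codec: str = "best_compression"):
    """Return (method, path, body) for creating an index with the given codec.

    "default" keeps LZ4; "best_compression" switches stored fields to DEFLATE.
    """
    body = {"settings": {"index": {"codec": codec}}}
    return "PUT", "/" + index, body


def send(base_url: str, method: str, path: str, body=None) -> dict:
    """Send one JSON request to Elasticsearch and return the decoded response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a live cluster this would be called as `send(ES_URL, *create_index_request(INDEX))`. `GET /logs-archive/_settings` should then list `"codec": "best_compression"` under the index settings.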
Now you can also enable better compression on the cold nodes by setting index.codec: best_compression in their config/elasticsearch.yml file in order to be able to archive more data with the same amount of disk space.
Source: https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch
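The quote above sets the codec in the node configuration file. You may instead want to switch a single index that already exists. The following is one possible sequence, sketched under the assumption that the index (a hypothetical `logs-2016.01.01`) can be taken offline briefly. Close the index, since `index.codec` is a static setting and can only be changed on a closed index. Update the setting, then reopen the index. Finally, force-merge it (`_forcemerge`, available from 2.1). Segments written before the change keep their old codec until they are merged, so the merge is what actually rewrites the stored fields with DEFLATE.

```python
import json
import urllib.request


def codec_change_plan(index: str, codec: str = "best_compression"):
    """Return the ordered (method, path, body) requests for switching an
    existing index to another codec. index.codec is a static setting, so the
    index is closed while the setting is updated."""
    return [
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", {"index": {"codec": codec}}),
        ("POST", f"/{index}/_open", None),
        # Merging down to one segment rewrites all stored fields with the new codec.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
    ]


def apply_plan(base_url: str, plan) -> list:
    """Send each request in order; urlopen raises on the first HTTP error,
    which stops the sequence."""
    results = []
    for method, path, body in plan:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        req = urllib.request.Request(
            base_url + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            results.append(json.loads(resp.read().decode("utf-8")))
    return results
```

Force-merging is I/O-heavy. It is usually reserved for indices that no longer receive writes, which matches the archival use case described above.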
The data below comes from: https://www.elastic.co/blog/elasticsearch-storage-the-true-story-2.0
The test methodology hasn't changed, so you can check out the old blog post or the README in the GitHub repo for the details.
| Test | String fields | _all | Index size w/ LZ4 (bytes) | Index size w/ DEFLATE (bytes) | Expansion ratio w/ LZ4 | Expansion ratio w/ DEFLATE | Impact of DEFLATE |
|---|---|---|---|---|---|---|---|
| Structured data file. Original file size: 67644119 bytes | | | | | | | |
| 1 | analyzed and not_analyzed | enabled | 63047579 | 53131592 | 0.932 | 0.785 | -0.157 |
| 2 | analyzed and not_analyzed | disabled | 48271433 | 38327106 | 0.713 | 0.566 | -0.206 |
| 3 | not_analyzed | disabled | 38920800 | 29014796 | 0.575 | 0.428 | -0.254 |
| 3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 65382872 | 49532858 | 0.966 | 0.732 | -0.242 |
| 4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 43083702 | 32063602 | 0.636 | 0.474 | -0.255 |
| Semi-structured data file. Original file size: 75037027 bytes | | | | | | | |
| 1 | analyzed and not_analyzed | enabled | 100478376 | 82132782 | 1.339 | 1.094 | -0.182 |
| 2 | analyzed and not_analyzed | disabled | 75238480 | 56911638 | 1.002 | 0.758 | -0.243 |
| 3 | not_analyzed | disabled | 71866672 | 53553561 | 0.957 | 0.713 | -0.254 |
| 3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 104638750 | 83824398 | 1.394 | 1.117 | -0.198 |
| 4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 72925624 | 54603882 | 0.971 | 0.727 | -0.251 |
With the standard LZ4-based compression, the ratio of indexed data size to raw data size ranged from 0.575 to 1.394. After enabling DEFLATE-based compression with the best_compression index.codec option, that range came down to 0.429 to 1.117. Depending on the test parameters, enabling the best_compression option reduced the indexed data size by 15.7% to 25.6%.
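The two derived columns in the table are simple ratios. The expansion ratio is the index size divided by the original file size. The DEFLATE impact is the relative change from the LZ4 index size to the DEFLATE index size. The sketch below recomputes both from the table's own byte counts. The published table appears to truncate rather than round to three decimals. That would explain why structured test 3 shows 0.428 in the table but 0.429 in the paragraph above.

```python
# Byte counts copied from the table above:
# (test, original file size, index size with LZ4, index size with DEFLATE)
STRUCTURED = 67644119
SEMI = 75037027
ROWS = [
    ("structured 1", STRUCTURED, 63047579, 53131592),
    ("structured 2", STRUCTURED, 48271433, 38327106),
    ("structured 3", STRUCTURED, 38920800, 29014796),
    ("structured 3b", STRUCTURED, 65382872, 49532858),
    ("structured 4", STRUCTURED, 43083702, 32063602),
    ("semi-structured 1", SEMI, 100478376, 82132782),
    ("semi-structured 2", SEMI, 75238480, 56911638),
    ("semi-structured 3", SEMI, 71866672, 53553561),
    ("semi-structured 3b", SEMI, 104638750, 83824398),
    ("semi-structured 4", SEMI, 72925624, 54603882),
]


def expansion_ratio(index_size: int, original_size: int) -> float:
    """Index size relative to the raw input; below 1.0 means the index is smaller."""
    return index_size / original_size


def deflate_impact(lz4_size: int, deflate_size: int) -> float:
    """Relative change when switching LZ4 -> DEFLATE; negative means smaller."""
    return (deflate_size - lz4_size) / lz4_size


def summarize(rows):
    """Return (min, max) of LZ4 ratios, DEFLATE ratios and size reductions."""
    lz4 = [expansion_ratio(lz4_size, orig) for _, orig, lz4_size, _ in rows]
    deflate = [expansion_ratio(df_size, orig) for _, orig, _, df_size in rows]
    reductions = [-deflate_impact(lz4_size, df_size) for _, _, lz4_size, df_size in rows]
    return (min(lz4), max(lz4)), (min(deflate), max(deflate)), (min(reductions), max(reductions))


for name, original, lz4_size, deflate_size in ROWS:
    print(f"{name:20s} lz4={expansion_ratio(lz4_size, original):.3f} "
          f"deflate={expansion_ratio(deflate_size, original):.3f} "
          f"impact={deflate_impact(lz4_size, deflate_size):+.3f}")
```

On these numbers the reduction spans roughly 15.7% (structured test 1) to 25.6% (structured test 4), consistent with the figures quoted above.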
As you can see, the ratio of index size to raw data size can vary greatly based on your mapping configuration, what fields you decide to create/retain, and the characteristics of the data set itself. We encourage you to run similar tests yourself to determine what the data compression/expansion factor is for your data set and application requirements.
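Here is one way to run such a test on your own data, sketched under assumptions. Index the same documents into two indices that differ only in `index.codec`, hypothetically named `test-lz4` and `test-deflate`. Then read each index's primary store size from the index stats API (`GET /<index>/_stats/store`). Using `primaries` rather than `total` keeps replica copies out of the number. Store size also depends on the current segment layout, so force-merging both indices before measuring should make the comparison fairer.

```python
import json
import urllib.request


def primary_store_bytes(stats: dict) -> int:
    """Extract the primary-shard store size from a GET /<index>/_stats/store response."""
    return stats["_all"]["primaries"]["store"]["size_in_bytes"]


def fetch_store_stats(base_url: str, index: str) -> dict:
    """Fetch store stats for one index (requires a reachable cluster)."""
    with urllib.request.urlopen(f"{base_url}/{index}/_stats/store") as resp:
        return json.loads(resp.read().decode("utf-8"))


def compare(raw_bytes: int, lz4_stats: dict, deflate_stats: dict) -> dict:
    """Expansion ratios for both codecs plus the relative impact of DEFLATE."""
    lz4 = primary_store_bytes(lz4_stats)
    deflate = primary_store_bytes(deflate_stats)
    return {
        "ratio_lz4": lz4 / raw_bytes,
        "ratio_deflate": deflate / raw_bytes,
        "deflate_impact": (deflate - lz4) / lz4,
    }
```

A typical call would be `compare(raw_size, fetch_store_stats(url, "test-lz4"), fetch_store_stats(url, "test-deflate"))`. Here `raw_size` is the byte size of the input file, for example from `os.path.getsize`.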
Conclusion
There were many amazing features added to Elasticsearch 2.0 worth considering. As we’ve discussed, two of these new features in particular can reduce the hardware footprint required for an Elasticsearch cluster by 15-25% or more: 1) the addition of a best_compression option and 2) enabling doc_values by default. This allows us to get to compression ratios between 0.429 and 1.117.