Elasticsearch 2.2+: enabling compression with index.codec: best_compression
The official description, from https://www.elastic.co/guide/en/elasticsearch/reference/2.2/index-modules.html#_static_index_settings:
index.codec: The default value compresses stored data with LZ4 compression, but this can be set to best_compression, which uses DEFLATE for a higher compression ratio, at the expense of slower stored fields performance.
Note: in 2.1 and earlier this is an experimental feature! It is only stable in 2.2+!
Now you can also enable better compression on the cold nodes by setting index.codec: best_compression in their config/elasticsearch.yml file in order to be able to archive more data with the same amount of disk space.
Source: https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch
The data below is taken from: https://www.elastic.co/blog/elasticsearch-storage-the-true-story-2.0
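Because the reference documentation quoted above lists index.codec among the static index settings, it can also be supplied per index at creation time rather than node-wide. The following is a minimal sketch of that idea, not an official client: the helper name `build_create_index_request`, the host `http://localhost:9200`, and the index name `logs-archive` are all assumptions made for illustration. It only builds the HTTP request and does not send it.

```python
import json
import urllib.request


def build_create_index_request(base_url: str, index_name: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates an index
    whose stored fields use the best_compression (DEFLATE) codec."""
    # index.codec is a static setting, so it is supplied when the index is created.
    body = {"settings": {"index": {"codec": "best_compression"}}}
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical host and index name, for illustration only.
    req = build_create_index_request("http://localhost:9200", "logs-archive")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually create the index against a running cluster, uncomment:
    # urllib.request.urlopen(req)
```

Keeping the request construction separate from sending it makes the body easy to inspect before it touches a cluster.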
The test methodology hasn’t changed so you can check out the old blog post or the README in the Github repo for the details.
| Test | String fields | _all | Index size with LZ4 | Index size with DEFLATE | Expansion ratio with LZ4 | Expansion ratio with DEFLATE | Impact of DEFLATE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Structured data file. Original file size: 67644119 | | | | | | | |
| 1 | analyzed and not_analyzed | enabled | 63047579 | 53131592 | 0.932 | 0.785 | -0.157 |
| 2 | analyzed and not_analyzed | disabled | 48271433 | 38327106 | 0.713 | 0.566 | -0.206 |
| 3 | not_analyzed | disabled | 38920800 | 29014796 | 0.575 | 0.428 | -0.254 |
| 3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 65382872 | 49532858 | 0.966 | 0.732 | -0.242 |
| 4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 43083702 | 32063602 | 0.636 | 0.474 | -0.255 |
| Semi-structured data file. Original file size: 75037027 | | | | | | | |
| 1 | analyzed and not_analyzed | enabled | 100478376 | 82132782 | 1.339 | 1.094 | -0.182 |
| 2 | analyzed and not_analyzed | disabled | 75238480 | 56911638 | 1.002 | 0.758 | -0.243 |
| 3 | not_analyzed | disabled | 71866672 | 53553561 | 0.957 | 0.713 | -0.254 |
| 3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 104638750 | 83824398 | 1.394 | 1.117 | -0.198 |
| 4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 72925624 | 54603882 | 0.971 | 0.727 | -0.251 |
With the standard LZ4-based compression, the indexed data size to raw data size ratio ranged from 0.575 to 1.394. After enabling DEFLATE-based compression using the best_compression index.codec option, the indexed data size to raw data size ratio range came down to 0.429 to 1.117. Enabling the best_compression option resulted in a 15.7% to 25.6% reduction in indexed data size depending on the test parameters.
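The two derived columns of the table can be recomputed from the raw sizes. The sketch below assumes that the expansion ratio is index size divided by original file size, and that "Impact of DEFLATE" is the relative change of the DEFLATE index size against the LZ4 index size; the function names are hypothetical, and only three rows of the structured data set are included.

```python
def expansion_ratio(index_size: int, raw_size: int) -> float:
    """Indexed data size relative to the raw data size."""
    return index_size / raw_size


def deflate_impact(lz4_size: int, deflate_size: int) -> float:
    """Relative change in index size when moving from LZ4 to DEFLATE
    (negative means the index got smaller)."""
    return deflate_size / lz4_size - 1.0


# Original size of the structured data file, from the table above.
STRUCTURED_RAW_SIZE = 67644119

# Test id -> (index size with LZ4, index size with DEFLATE), from the table above.
STRUCTURED_ROWS = {
    "1": (63047579, 53131592),
    "3": (38920800, 29014796),
    "4": (43083702, 32063602),
}

if __name__ == "__main__":
    for test_id, (lz4, deflate) in STRUCTURED_ROWS.items():
        print(
            f"test {test_id}: "
            f"LZ4 ratio {expansion_ratio(lz4, STRUCTURED_RAW_SIZE):.3f}, "
            f"DEFLATE ratio {expansion_ratio(deflate, STRUCTURED_RAW_SIZE):.3f}, "
            f"impact {deflate_impact(lz4, deflate):+.3f}"
        )
```

Rounded to three decimals, test 3 gives a DEFLATE ratio of 0.429 and test 4 an impact of -0.256, which matches the 0.429 and 25.6% quoted in the text; the table's 0.428 and -0.255 differ only in the last digit, presumably because the table truncates rather than rounds.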
As you can see, the ratio of index size to raw data size can vary greatly based on your mapping configuration, what fields you decide to create/retain, and the characteristics of the data set itself. We encourage you to run similar tests yourself to determine what the data compression/expansion factor is for your data set and application requirements.
Conclusion
There were many amazing features added to Elasticsearch 2.0 worth considering. As we’ve discussed, two of these new features in particular can reduce the hardware footprint required for an Elasticsearch cluster by 15-25% or more: 1) the addition of a best_compression option and 2) enabling doc_values by default. This allows us to get to compression ratios between 0.429 and 1.117.