Lucene dvd dvm文件便是docvalues文件——就是针对field value的列存储
public final class Lucene54DocValuesFormat
extends DocValuesFormat
Encodes the five per-document value types (Numeric,Binary,Sorted,SortedSet,SortedNumeric) with these strategies:
- Delta-compressed: per-document integers written as deltas from the minimum value, compressed with bitpacking. For more information, see
DirectWriter. - Table-compressed: when the number of unique values is very small (< 256), and when there are unused "gaps" in the range of values used (such as
SmallFloat), a lookup table is written instead. Each per-document entry is instead the ordinal to this table, and those ordinals are compressed with bitpacking (DirectWriter). - GCD-compressed: when all numbers share a common divisor, such as dates, the greatest common denominator (GCD) is computed, and quotients are stored using Delta-compressed Numerics.
- Monotonic-compressed: when all numbers are monotonically increasing offsets, they are written as blocks of bitpacked integers, encoding the deviation from the expected delta.
- Const-compressed: when there is only one possible non-missing value, only the missing bitset is encoded.
- Sparse-compressed: only documents with a value are stored, and lookups are performed using binary search.
- Fixed-width Binary: one large concatenated byte[] is written, along with the fixed length. Each document's value can be addressed directly with multiplication (
docID * length). - Variable-width Binary: one large concatenated byte[] is written, along with end addresses for each document. The addresses are written as Monotonic-compressed numerics.
- Prefix-compressed Binary: values are written in chunks of 16, with the first value written completely and other values sharing prefixes. chunk addresses are written as Monotonic-compressed numerics. A reverse lookup index is written from a portion of every 1024th term.
- Sorted: a mapping of ordinals to deduplicated terms is written as Binary, along with the per-document ordinals written using one of the numeric strategies above.
- Single: if all documents have 0 or 1 value, then data are written like SORTED.
- SortedSet table: when there are few unique sets of values (< 256) then each set is assigned an id, a lookup table is written and the mapping from document to set id is written using the numeric strategies above.
- SortedSet: a mapping of ordinals to deduplicated terms is written as Binary, an ordinal list and per-document index into this list are written using the numeric strategies above.
- Single: if all documents have 0 or 1 value, then data are written like NUMERIC.
- SortedSet table: when there are few unique sets of values (< 256) then each set is assigned an id, a lookup table is written and the mapping from document to set id is written using the numeric strategies above.
- SortedNumeric: a value list and per-document index into this list are written using the numeric strategies above.
Files:
- .dvd: DocValues data
- .dvm: DocValues metadata
转自:http://lucene.apache.org/core/6_4_2/core/org/apache/lucene/codecs/lucene54/Lucene54DocValuesFormat.html
可以看到占用空间非常小!!!
du -sm elasticsearch/nodes/0/indices/hec_test2/0/index/*
299 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fdt
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fdx
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.fnm
148 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.doc
130 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.tim
5 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene50_0.tip
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene54_0.dvd
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e_Lucene54_0.dvm
1 elasticsearch/nodes/0/indices/hec_test2/0/index/_e.si
1 elasticsearch/nodes/0/indices/hec_test2/0/index/segments_7
0 elasticsearch/nodes/0/indices/hec_test2/0/index/write.lock
Lucene dvd dvm文件便是docvalues文件——就是针对field value的列存储的更多相关文章
- 腾讯Hermes设计概要——数据分析用的是列存储,词典文件前缀压缩,倒排文件递增id、变长压缩、依然是跳表-本质是lucene啊
转自:http://data.qq.com/article?id=817 三.Hermes设计概要 架构描述 系统核心进程均采用分散化设计,根据业务发展需求,可随意扩缩容机器; 周期性数据直接通过td ...
- FileShare文件读写锁解决“文件XXX正由另一进程使用,因此该进程无法访问此文件”(转)
开发过程中,我们往往需要大量与文件交互,读文件,写文件已成家常便饭,本地运行完美,但一上到投产环境,往往会出现很多令人措手不及的意外,或开发中的烦恼,因此,我对普通的C#文件操作做了一次总结,问题大部 ...
- .c和.h文件的区别(头文件与之实现文件的的关系~ )
.c和.h文件的区别 一个简单的问题:.c和.h文件的区别 学了几个月的C语言,反而觉得越来越不懂了.同样是子程序,可以定义在.c文件中,也可以定义在.h文件中,那这两个文件到底在用法上有什么区别呢 ...
- [转载]webarchive文件转换成htm文件
原文地址:webarchive文件转换成htm文件作者:xhbaxf Mac OS X系统带有文件转换功能,可以把webarchive文件变成html文件.方法是: Step 1: 建立一个文件夹 ...
- 怎样将word文件转化为Latex文件:word-to-latex-2.56具体解释
首先推荐大家读一读这篇博文:http://blog.csdn.net/ibingow/article/details/8613556 --------------------------------- ...
- PHP上传文件参考配置大文件上传
PHP用超级全局变量数组$_FILES来记录文件上传相关信息的. 1.file_uploads=on/off 是否允许通过http方式上传文件 2.max_execution_time=30 允许脚本 ...
- R8—批量生成文件夹,批量读取文件夹名称+R文件管理系统操作函数
一. 批量生成文件夹,批量读取文件夹名称 今日,工作中遇到这样一个问题:boss给我们提供了200多家公司的ID代码(如6007.7920等),需要根据这些ID号去搜索下载新闻,从而将下载到的新闻存到 ...
- Python:文件操作总结1——文件基本操作
一.文件的操作流程 1.打开文件,得到文件句柄并赋值给一个变量 2.通过句柄对文件进行操作 3.关闭文件 二.文件的打开与关闭 A.文件的打开——open函数 语法:open(file[,mode[, ...
- linux下压缩成zip文件解压zip文件
linux zip命令的基本用法是: zip [参数] [打包后的文件名] [打包的目录路径] linux zip命令参数列表: -a 将文件转成ASCII模式 -F 尝试修复损坏 ...
随机推荐
- CSS选择器定位的使用
CSS 可以比较灵活选择控件的任意属性,一般情况下定位速度要比XPath 快,但对于初学者来说比较难以学习使用,下面我们就详细的介绍CSS 的语法与使用.一.CSS 选择器的常见语法: 例如下面一段代 ...
- 大数据学习——java操作hdfs环境搭建以及环境测试
1 新建一个maven项目 打印根目录下的文件的名字 添加pom依赖 pom.xml <?xml version="1.0" encoding="UTF-8&quo ...
- 性能测试工具 - ab 简单应用
之前知道一般网站性能可以通过 LoadRunner, JMeter, QTP 等相应的软件进行测试, 印象中本科学习 “软件测试” 这门课程时安装并使用过, LoadRunner等不是一个小软件, 安 ...
- 虚拟机(Virtual Machine)和容器(Container)的对比
目前云计算平台常用的虚拟化技术有虚拟机(Virtual Machine)和容器(Container)两种.虚拟机已经是比较成熟的技术,容器技术作为下一代虚拟化技术,国内的各厂商应用还不广,但似乎其代表 ...
- BZOJ4553 - [TJOI2016]序列
Portal Description 给出一个\(n(n\leq10^5)\)个数的数列\(\{a_n\}\)和\(m(m\leq10^5)\)个形如\((x,y)\)的变化,表示\(a_x\)可以变 ...
- 亚瑟王(bzoj 4008)
Description 小 K 不慎被 LL 邪教洗脑了,洗脑程度深到他甚至想要从亚瑟王邪教中脱坑. 他决定,在脱坑之前,最后再来打一盘亚瑟王.既然是最后一战,就一定要打得漂 亮.众所周知,亚瑟王是一 ...
- 《TCP/IP详解卷1:协议》——第5章 RARP:逆地址解析协议(转载)
1.引言 具有本地磁盘的系统引导时,一般是从磁盘上的配置文件中读取IP地址.但是无盘机,如X终端或无盘工作站,则需要采用其他方法来获得IP地址. 网络上的每个系统都具有唯一的硬件地址,它是由网络接口生 ...
- “亚信科技杯”南邮第七届大学生程序设计竞赛之网络预赛 A noj 2073 FFF [ 二分图最大权匹配 || 最大费用最大流 ]
传送门 FFF 时间限制(普通/Java) : 1000 MS/ 3000 MS 运行内存限制 : 65536 KByte总提交 : 145 测试通过 : 13 ...
- poj3648,2-sat求解
关键是题意的理解,英语,有时候明明每个字都认识,但是还是理解错误!哎!!悲剧啊!题意啊! 这是关键!开始误理解为n对新娘郞,非也!是只有一对,其他是夫妇,理解后就好做了,建立图 是关键,怎么转化关系, ...
- iOS url带中文下载时 报错解决方法
问题描述:下载文件时, 请求带中文的URL的资源时,比如:http://s237.sznews.com/pic/2010/11/23/e4fa5794926548ac953a8a525a23b6f2/ ...