Lucene Index, Store, and TermVector options explained

In Lucene 3.0, the latest release at the time of writing, the Field options are as follows:

Field options for indexing
Index.ANALYZED – use the analyzer to break the Field’s value into a stream of separate tokens and make each token searchable.
Index.NOT_ANALYZED – do index the field, but do not analyze the String. Instead, treat the Field’s entire value as a single token and make that token searchable.
Index.ANALYZED_NO_NORMS – an advanced variant of Index.ANALYZED which does not store norms information in the index.
Index.NOT_ANALYZED_NO_NORMS – just like Index.NOT_ANALYZED, but also do not store norms.
Index.NO – don’t make this field’s value available for searching at all.
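
To make the difference between ANALYZED and NOT_ANALYZED concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class `IndexOptionSketch` and the method `searchableTokens` are hypothetical names, and lowercasing plus whitespace splitting only stands in for whatever a real analyzer would do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: this does not use or reproduce Lucene's analyzers.
class IndexOptionSketch {

    // Returns the tokens that would become searchable for a field value.
    // analyzed = true  -> lowercase and split on whitespace (stand-in for an analyzer)
    // analyzed = false -> the whole value is kept as one token, unchanged
    static List<String> searchableTokens(String value, boolean analyzed) {
        if (!analyzed) {
            return Collections.singletonList(value);
        }
        List<String> tokens = new ArrayList<>();
        for (String piece : value.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Analyzed: three separate searchable tokens.
        System.out.println(searchableTokens("Lucene in Action", true));
        // Not analyzed: one token that only matches the exact original value.
        System.out.println(searchableTokens("Lucene in Action", false));
    }
}
```

In this model a search for "lucene" would match the analyzed field but not the non-analyzed one, which is why NOT_ANALYZED suits exact-match values such as identifiers.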

Field options for storing fields
Store.YES – store the value. When the value is stored, the original String in its entirety is recorded in the index and may be retrieved by an IndexReader.
Store.NO – do not store the value. This is often used along with Index.ANALYZED to index a large text field that doesn’t need to be retrieved in its original form.

Field options for term vectors
TermVector.YES – record the unique terms that occurred, and their counts, in each document, but do not store any positions or offsets information.
TermVector.WITH_POSITIONS – record the unique terms and their counts, and also the positions of each occurrence of every term, but no offsets.
TermVector.WITH_OFFSETS – record the unique terms and their counts, with the offsets (start & end character position) of each occurrence of every term, but no positions.
TermVector.WITH_POSITIONS_OFFSETS – store unique terms and their counts, along with positions and offsets.
TermVector.NO – do not store any term vector information.
If Index.NO is specified for a field, then you must also specify TermVector.NO.
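
The kind of per-document data these term vector options describe can be sketched in plain Java. This is a toy model rather than Lucene's implementation or storage format: `TermVectorSketch`, `TermInfo`, and `build` are hypothetical names, tokens are simply lowercased whitespace-separated words, and end offsets are assumed to be exclusive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: it mimics the kind of data a term vector holds,
// not Lucene's actual analysis or on-disk format.
class TermVectorSketch {

    // Everything recorded for one unique term of one document.
    static class TermInfo {
        int count;                                          // what TermVector.YES keeps
        final List<Integer> positions = new ArrayList<>();  // added by WITH_POSITIONS
        final List<int[]> offsets = new ArrayList<>();      // added by WITH_OFFSETS: {start, end}
    }

    // Scans the text once and collects count, positions, and offsets per term.
    static Map<String, TermInfo> build(String text) {
        Map<String, TermInfo> vector = new LinkedHashMap<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            if (Character.isWhitespace(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            String term = text.substring(start, i).toLowerCase(Locale.ROOT);
            TermInfo info = vector.computeIfAbsent(term, k -> new TermInfo());
            info.count++;
            info.positions.add(position++);          // token index within the document
            info.offsets.add(new int[] {start, i});  // character span, end exclusive
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, TermInfo> vector = build("to be or not to be");
        for (Map.Entry<String, TermInfo> e : vector.entrySet()) {
            System.out.println(e.getKey() + " count=" + e.getValue().count
                    + " positions=" + e.getValue().positions);
        }
    }
}
```

Read against the options above: TermVector.YES corresponds to keeping only `count`, WITH_POSITIONS adds `positions`, WITH_OFFSETS adds `offsets`, and WITH_POSITIONS_OFFSETS keeps all three.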

Some examples of how these options are typically combined:
Index          Store   TermVector                Example usage
NOT_ANALYZED   YES     NO                        Identifiers (file names, primary keys), telephone and Social Security numbers, URLs, personal names, dates
ANALYZED       YES     WITH_POSITIONS_OFFSETS    Document title, document abstract
ANALYZED       NO      WITH_POSITIONS_OFFSETS    Document body
NO             YES     NO                        Document type, database primary key
NOT_ANALYZED   NO      NO                        Hidden keywords

When Lucene builds the inverted index, by default it stores all necessary information to implement the Vector Space model. This model requires the count of every term that occurred in the document, as well as the positions of each occurrence (needed for phrase searches).
You can tell Lucene to skip indexing the term frequency and positions for a field by calling the following method on that Field instance:
Field.setOmitTermFreqAndPositions(true)

Source: http://www.cnblogs.com/fxjwind/archive/2011/07/04/2097705.html


