Using Boost to Influence the Ranking of Lucene Query Results

Reposted from: http://catastiger.iteye.com/blog/803796

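Premise: no sort is applied to the results.
In search, not all Documents and Fields are equal. Some use cases call for changing the weight of a Document or a Field, which defaults to 1.0F. Such requirements are met by changing the boost factor of the Document (or of a Field or a query, as the later sections show). The examples below use Lucene 3.0 and IKAnalyzer.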
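To make the idea concrete, here is a minimal plain-Java sketch that does not use Lucene at all. It assumes a deliberately simplified model in which a hit's final score is its base relevance multiplied by a boost that defaults to 1.0f. The class BoostRankingSketch and its Hit type are hypothetical names made up for this illustration, and Lucene's real scoring formula contains more factors than this.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: this is not Lucene code.
final class BoostRankingSketch {

    /** A search hit with a base relevance and a boost multiplier. */
    static final class Hit {
        final String name;
        final float baseScore;
        final float boost;

        Hit(String name, float baseScore) {
            this(name, baseScore, 1.0f); // same default value as a Lucene boost
        }

        Hit(String name, float baseScore, float boost) {
            this.name = name;
            this.baseScore = baseScore;
            this.boost = boost;
        }

        /** Simplified model: the boost is a plain multiplier on the base score. */
        float score() {
            return baseScore * boost;
        }
    }

    /** Returns hit names ordered by descending score; ties keep their input order. */
    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score()).reversed());
        List<String> names = new ArrayList<>();
        for (Hit h : sorted) {
            names.add(h.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("you are my friend", 1.0f));
        hits.add(new Hit("I love you", 1.0f, 2.0f)); // boosted hit
        System.out.println(rank(hits));
    }
}
```

With equal base scores, the boosted hit moves to the front; with all boosts left at the default, the order is decided by the base relevance alone.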
1. Changing the ranking by setting a doc boost

/**
 * Set a document boost to influence the ranking of query results.
 * @throws Exception
 */
public void testBoost1() throws Exception {
    System.out.println("Setting a doc boost to influence the ranking of query results");
    RAMDirectory ramDir = new RAMDirectory();
    Analyzer analyzer = new IKAnalyzer();
    IndexWriter iw = new IndexWriter(ramDir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
    String[] nameList = { "you are my friend", "a are my wife", "I love you" };
    String[] addList = { "b", "you are my wife", "c" };
    String[] fileList = { "1", "2", "3" };
    for (int i = 0; i < nameList.length; i++) {
        Document doc = new Document();
        doc.add(new Field("name", nameList[i], Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("file", fileList[i], Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("address", addList[i], Field.Store.YES, Field.Index.ANALYZED));
        // Give the third document the highest priority so that it comes first in the search results.
        if (i == 2) {
            doc.setBoost(2.0f);
        }
        iw.addDocument(doc);
    }
    iw.close();
    IndexSearcher _searcher = new IndexSearcher(ramDir);
    String[] fields = new String[] { "name", "address" };
    Query query = IKQueryParser.parseMultiField(fields, "you");
    // Ask for up to maxDoc() hits, i.e. every matching document, ordered by score.
    TopDocs topDocs = _searcher.search(query, _searcher.maxDoc());
    ScoreDoc[] hits = topDocs.scoreDocs;
    for (int i = 0; i < hits.length; i++) {
        Document doc = _searcher.doc(hits[i].doc);
        System.out.println("name:" + doc.get("name"));
        System.out.println("file:" + doc.get("file"));
    }
    _searcher.close();
}

With if (i == 2) { doc.setBoost(2.0f); } in place, the document "I love you" is printed first.
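The reason the boosted document wins can be sketched with the index-time norm. In the Lucene 3.0 scoring documentation the norm of a field is the document boost times the field boost times a length normalization, and the default length normalization is 1 / sqrt(number of terms). The plain-Java sketch below reproduces only that product. NormSketch is a hypothetical name, the term counts assume simple whitespace tokenization (IKAnalyzer may tokenize differently), and real Lucene additionally stores the norm in a single byte, so the stored values are only approximations of the ones computed here.

```java
// Hypothetical illustration only: this is not Lucene code.
final class NormSketch {

    /** Length normalization in the style of the default similarity: 1 / sqrt(numTerms). */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Index-time norm: document boost * field boost * length normalization. */
    static float norm(float docBoost, float fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Term counts assume whitespace tokenization, which is an assumption.
        float friend = norm(1.0f, 1.0f, 4); // "you are my friend", default boost
        float love = norm(2.0f, 1.0f, 3);   // "I love you", doc.setBoost(2.0f)
        System.out.println("you are my friend: " + friend);
        System.out.println("I love you: " + love);
    }
}
```

Under these assumptions the norm of "I love you" is more than twice that of "you are my friend", which is consistent with it being listed first.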
2. Influencing the ranking by setting a query boost

/**
 * Set query boosts to influence the ranking of results.
 * If a Sort is supplied, the results are ordered entirely by that sort.
 * @throws Exception
 */
public void testBoost2() throws Exception {
    System.out.println("Setting a query boost to influence the ranking of query results");
    RAMDirectory ramDir = new RAMDirectory();
    Analyzer analyzer = new IKAnalyzer();
    IndexWriter iw = new IndexWriter(ramDir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
    String[] nameList = { "you are my friend", "a are my wife", "I love you" };
    String[] addList = { "b", "you are my wife", "c" };
    String[] fileList = { "1", "2", "3" };
    for (int i = 0; i < nameList.length; i++) {
        Document doc = new Document();
        doc.add(new Field("name", nameList[i], Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("file", fileList[i], Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("address", addList[i], Field.Store.YES, Field.Index.ANALYZED));
        iw.addDocument(doc);
    }
    iw.close();
    IndexSearcher _searcher = new IndexSearcher(ramDir);
    BooleanQuery bq = new BooleanQuery();
    // Query on the name field, boosted to 2.
    QueryParser _parser = new QueryParser(Version.LUCENE_30, "name", analyzer);
    Query _query = _parser.parse("you");
    _query.setBoost(2f);
    // Query on the address field, left at the default boost of 1.
    QueryParser _parser1 = new QueryParser(Version.LUCENE_30, "address", analyzer);
    Query _query1 = _parser1.parse("you");
    _query1.setBoost(1f);
    bq.add(_query, BooleanClause.Occur.SHOULD);
    bq.add(_query1, BooleanClause.Occur.SHOULD);
    // Disabled alternative: build the clauses with MultiFieldQueryParser.
    // for(int i=0;i<2;i++){
    //     QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30,new String[] {"name", "address" }, analyzer);
    //     Query q1 = parser.parse("you");
    //     bq.add(q1, BooleanClause.Occur.MUST);
    // }
    // Disabled alternative: sort by the file field instead of by score.
    // SortField[] sortFields = new SortField[1];
    // SortField sortField = new SortField("file", SortField.INT, true); // false = ascending, true = descending
    // sortFields[0] = sortField;
    // Sort sort = new Sort(sortFields);
    // TopDocs topDocs = _searcher.search(bq,null,_searcher.maxDoc(),sort);
    TopDocs topDocs = _searcher.search(bq, _searcher.maxDoc());
    ScoreDoc[] hits = topDocs.scoreDocs;
    for (int i = 0; i < hits.length; i++) {
        Document doc = _searcher.doc(hits[i].doc);
        System.out.println("name:" + doc.get("name"));
        System.out.println("file:" + doc.get("file"));
    }
    _searcher.close();
}

The results are as follows (the query on name has the highest boost, so documents that match on name rank ahead of the one that matches on address):
Setting a query boost to influence the ranking of query results
name:you are my friend
file:1
name:I love you
file:3
name:a are my wife
file:2
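The ordering above can be mimicked with a small plain-Java sketch. It assumes a heavily simplified model of a two-clause SHOULD query in which every matching clause contributes exactly its boost to the document score; real Lucene scores also involve term frequency, inverse document frequency, norms and a coordination factor. QueryBoostSketch, Doc and sampleDocs are hypothetical names introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is not Lucene code.
final class QueryBoostSketch {

    /** A document reduced to whether "you" occurs in its name and address fields. */
    static final class Doc {
        final String file;
        final boolean nameMatches;
        final boolean addressMatches;

        Doc(String file, boolean nameMatches, boolean addressMatches) {
            this.file = file;
            this.nameMatches = nameMatches;
            this.addressMatches = addressMatches;
        }
    }

    /** Simplified score: each matching SHOULD clause adds its own boost. */
    static float score(Doc d, float nameBoost, float addressBoost) {
        float s = 0f;
        if (d.nameMatches) {
            s += nameBoost;
        }
        if (d.addressMatches) {
            s += addressBoost;
        }
        return s;
    }

    /** Returns file ids ordered by descending score; ties keep index order. */
    static List<String> rank(List<Doc> docs, float nameBoost, float addressBoost) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort((a, b) -> Float.compare(
                score(b, nameBoost, addressBoost), score(a, nameBoost, addressBoost)));
        List<String> files = new ArrayList<>();
        for (Doc d : sorted) {
            files.add(d.file);
        }
        return files;
    }

    /** The three sample documents from the listing, in index order. */
    static List<Doc> sampleDocs() {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("1", true, false));  // name: "you are my friend"
        docs.add(new Doc("2", false, true));  // address: "you are my wife"
        docs.add(new Doc("3", true, false));  // name: "I love you"
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(rank(sampleDocs(), 2f, 1f)); // name clause boosted
        System.out.println(rank(sampleDocs(), 1f, 2f)); // address clause boosted
    }
}
```

Swapping the two boosts in this model moves the address match to the front, which is the lever the query boost provides at search time without re-indexing.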

3. Influencing the ranking by setting field boosts

/**
 * Set field boosts to influence the ranking of query results.
 * If a Sort is supplied, the results are ordered by that sort instead.
 * @throws Exception
 */
// Author's note: the order is 2 1 3 without field boosts and 1 3 2 with them (the run listed below printed 2 1 3).
public void testBoost3() throws Exception {
    System.out.println("Setting field boosts to influence the ranking of query results");
    RAMDirectory ramDir = new RAMDirectory();
    Analyzer analyzer = new IKAnalyzer();
    IndexWriter iw = new IndexWriter(ramDir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
    String[] nameList = { "you are my friend", "a are my wife", "I love you" };
    String[] addList = { "b", "you are my wife", "c" };
    String[] fileList = { "1", "2", "3" };
    for (int i = 0; i < nameList.length; i++) {
        Document doc = new Document();
        // Boost the name field to 20.
        Field nameField = new Field("name", nameList[i], Field.Store.YES, Field.Index.ANALYZED);
        nameField.setBoost(20f);
        doc.add(nameField);
        doc.add(new Field("file", fileList[i], Field.Store.YES, Field.Index.ANALYZED));
        // Boost the address field to 30, higher than name.
        Field f = new Field("address", addList[i], Field.Store.YES, Field.Index.ANALYZED);
        f.setBoost(30f);
        doc.add(f);
        iw.addDocument(doc);
    }
    iw.close();
    IndexSearcher _searcher = new IndexSearcher(ramDir);
    String[] fields = new String[] { "name", "file", "address" };
    Query query = IKQueryParser.parseMultiField(fields, "you");
    // Disabled alternative: sort by the file field instead of by score.
    // SortField[] sortFields = new SortField[1];
    // SortField sortField = new SortField("file", SortField.INT, true); // false = ascending, true = descending
    // sortFields[0] = sortField;
    // Sort sort = new Sort(sortFields);
    // TopDocs topDocs = _searcher.search(query,null,_searcher.maxDoc(),sort);
    TopDocs topDocs = _searcher.search(query, _searcher.maxDoc());
    ScoreDoc[] hits = topDocs.scoreDocs;
    for (int i = 0; i < hits.length; i++) {
        Document doc = _searcher.doc(hits[i].doc);
        System.out.println("name:" + doc.get("name"));
        System.out.println("file:" + doc.get("file"));
    }
    _searcher.close();
}

The results are as follows (address has the highest boost, so the document that matches on address ranks first):
Setting field boosts to influence the ranking of query results
name:a are my wife
file:2
name:you are my friend
file:1
name:I love you
file:3
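All three listings depend on the premise that no sort is applied: once an explicit sort is requested (the disabled Sort on the file field in the listings), the order follows the sort key and the boosted scores no longer decide it. The plain-Java sketch below illustrates that difference. SortOverrideSketch and its Hit type are hypothetical names, and the scores are made-up values chosen only so that the score order is 2, 1, 3 as in the output above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: this is not Lucene code.
final class SortOverrideSketch {

    /** A hit with a numeric file id and a relevance score (scores are made up). */
    static final class Hit {
        final int file;
        final float score;

        Hit(int file, float score) {
            this.file = file;
            this.score = score;
        }
    }

    /** Default behaviour: order by descending score, so boosts matter. */
    static List<Integer> byScore(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return files(sorted);
    }

    /** Explicit sort: order by file id descending, ignoring the score entirely. */
    static List<Integer> byFileDescending(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.file).reversed());
        return files(sorted);
    }

    private static List<Integer> files(List<Hit> hits) {
        List<Integer> out = new ArrayList<>();
        for (Hit h : hits) {
            out.add(h.file);
        }
        return out;
    }

    /** Three hits in index order with illustrative scores. */
    static List<Hit> sampleHits() {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(1, 2.0f));
        hits.add(new Hit(2, 3.0f));
        hits.add(new Hit(3, 2.0f));
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("by score: " + byScore(sampleHits()));
        System.out.println("by file, descending: " + byFileDescending(sampleHits()));
    }
}
```

The second ordering stays the same no matter how the scores are changed, which is why boosts only help when results are ranked by relevance.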
