Lucene multi-field search

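While learning Lucene recently I ran into the problem of searching across multiple fields and sorting the results. There is not much material on this online, so I have collected what I found here for anyone who needs it. Everything below is pasted from other sites, so it is somewhat disorganized; thanks to the original authors!
The MultiFieldQueryParser class is all you need.

Example code: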

package com.lucene.search;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Searcher {

    public static void main(String[] args) throws Exception {
        File indexDir = new File("C:\\target\\index\\book");
        if (!indexDir.exists() || !indexDir.isDirectory()) {
            throw new IOException(indexDir + " does not exist or is not a directory");
        }
        search(indexDir);
    }

    public static void search(File indexDir) throws Exception {
        Directory fsDir = FSDirectory.getDirectory(indexDir);
        IndexSearcher searcher = new IndexSearcher(fsDir);
        // queries[i] is parsed against fields[i] and combined using clauses[i]
        String[] queries = { "中文版", "8*" };
        String[] fields = { "name", "isbn" };
        BooleanClause.Occur[] clauses = { BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD };
        Query query = MultiFieldQueryParser.parse(queries, fields, clauses, new StandardAnalyzer());
        Hits hits = searcher.search(query);
        System.out.println("Documents in index: " + searcher.maxDoc() + ", hits: " + hits.length());
        for (int i = 0; i < hits.length(); i++) {
            int docId = hits.id(i);
            String docName = hits.doc(i).get("name");
            String docIsbn = hits.doc(i).get("isbn");
            String docPblDt = hits.doc(i).get("pbl_dt");
            System.out.println(docId + ":" + docName + " ISBN:" + docIsbn + " PBLDT:" + docPblDt);
        }
        searcher.close();
    }
}

Note the BooleanClause.Occur[] array, which specifies how the conditions are combined:

BooleanClause.Occur.MUST means AND,

BooleanClause.Occur.MUST_NOT means NOT,

BooleanClause.Occur.SHOULD means OR.

---------------------------------------------------------------------------------------------------------
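By default the keywords are combined with OR, which is why querying directly through the multi-field parser returns results like those above.
For more flexible control, build the BooleanQuery yourself: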

BooleanQuery booleanQuery = new BooleanQuery();
// analyzer: any Analyzer instance, e.g. new StandardAnalyzer()
QueryParser titleParser = new QueryParser("title", analyzer);
Query titleQuery = titleParser.parse("中国人民共和国");
booleanQuery.add(titleQuery, BooleanClause.Occur.SHOULD);
QueryParser contentParser = new QueryParser("content", analyzer);
Query contentQuery = contentParser.parse("中国人民共和国");
booleanQuery.add(contentQuery, BooleanClause.Occur.SHOULD);

--------------------------------------------------------------------------------------------------

package com.lucene.search;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;

public class Multisearcher {
    private static String INDEX_STORE_PATH1 = "C:\\multi\\1";
    private static String INDEX_STORE_PATH2 = "C:\\multi\\2";

    public static void main(String[] args) throws Exception {
        Multisearcher.multisearcher();
    }

    // Builds a document with a tokenized bookname and an untokenized price.
    private static Document newBook(String bookname, String price) {
        Document doc = new Document();
        doc.add(new Field("bookname", bookname, Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("price", price, Field.Store.YES, Field.Index.UN_TOKENIZED));
        return doc;
    }

    public static void multisearcher() throws Exception {
        // Create the first index.
        IndexWriter writer = new IndexWriter(INDEX_STORE_PATH1, new StandardAnalyzer(), true);
        writer.setUseCompoundFile(false);
        writer.addDocument(newBook("钢铁是怎样炼成的", "20.5"));
        writer.addDocument(newBook("钢铁战士", "18.4"));
        writer.addDocument(newBook("钢和铁是两种不同的元素", "7.6"));
        writer.close();

        // Create the second index.
        IndexWriter writer2 = new IndexWriter(INDEX_STORE_PATH2, new StandardAnalyzer(), true);
        writer2.setUseCompoundFile(false);
        writer2.addDocument(newBook("钢要比铁有更多的元素", "22.5"));
        writer2.addDocument(newBook("钢和铁是两种重要的金属", "15.9"));
        writer2.addDocument(newBook("钢铁是两种重要的金属", "19.00"));
        writer2.close();

        String query1 = "钢";
        String query2 = "[10 TO 20]"; // note the format: square brackets, and the keyword TO in upper case
        String[] queries = { query1, query2 };
        // the two fields to search
        String field1 = "bookname";
        String field2 = "price";
        String[] fields = { field1, field2 };
        // how the query clauses are combined
        BooleanClause.Occur[] clauses = { BooleanClause.Occur.MUST, BooleanClause.Occur.MUST };
        // build the multi-field query
        Query q = MultiFieldQueryParser.parse(queries, fields, clauses, new StandardAnalyzer());
        // print the query
        System.out.println(q.toString());
        // create two IndexSearchers so that both index directories are searched
        IndexSearcher searcher1 = new IndexSearcher(INDEX_STORE_PATH1);
        IndexSearcher searcher2 = new IndexSearcher(INDEX_STORE_PATH2);
        IndexSearcher[] searchers = { searcher1, searcher2 };
        // use MultiSearcher to search across both indexes
        MultiSearcher searcher = new MultiSearcher(searchers);
        Hits hits = searcher.search(q);
        for (int i = 0; i < hits.length(); i++) {
            System.out.println(hits.doc(i));
        }
    }
}
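
One caveat about the price range in the example above: the prices are indexed as untokenized strings, so a range such as [10 TO 20] is evaluated in term (string) order, not numeric order. The sketch below is plain Java rather than Lucene code; it only imitates that string comparison with String.compareTo. The class StringRangeModel and the helper pad are hypothetical names made up for illustration, and zero-padding is one common workaround, not something the original example does.

```java
import java.util.Locale;

/** Plain-Java model of a range test on string terms; this is not Lucene's API. */
final class StringRangeModel {

    /** Inclusive range test in string order, the way untokenized terms are compared. */
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    /** Hypothetical helper: zero-pads a non-negative price to a fixed width, e.g. 7.6 becomes "007.60". */
    static String pad(double price) {
        return String.format(Locale.ROOT, "%06.2f", price);
    }

    public static void main(String[] args) {
        // Unpadded: "100.0" and "2.5" fall inside ["10", "20"] in string order.
        System.out.println(inRange("100.0", "10", "20"));
        System.out.println(inRange("2.5", "10", "20"));
        // Padded: string order now agrees with numeric order.
        System.out.println(inRange(pad(100.0), pad(10), pad(20)));
        System.out.println(inRange(pad(15.9), pad(10), pad(20)));
    }
}
```

The six sample prices happen to give the numerically expected answer, but a price such as 100.0 or 2.5 would not, which is why fixed-width padding (applied both at index time and in the query bounds) is worth considering.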

------------------------------------------------------------------------------------------------------------------------------------------
By default, the search method of IndexSearcher returns results sorted by document score. To sort the results differently, use the overloaded search method:

IndexSearcher.search(Query, Sort);

new Sort(), Sort.RELEVANCE and null all give the default ordering. To sort on a field, pass the field name to the Sort constructor:

Sort sort = new Sort(String field);

You can also sort on several fields:

Sort sort = new Sort(String[] fields);

Example:

Sort sort = new Sort(new SortField[]{new SortField("title"), new SortField("name")});
Hits hits = searcher.search(query, sort);
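
To make the ordering of a two-field sort concrete, here is a small plain-Java sketch. It does not use Lucene; it only imitates the order that sorting on "title" and then "name" asks for, using a Comparator. The class MultiFieldSortModel and its Book type are hypothetical names for illustration, and plain ascending string order is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of sorting on two fields; this is not Lucene's API. */
final class MultiFieldSortModel {

    /** Hypothetical stand-in for a hit with two stored fields. */
    static final class Book {
        final String title;
        final String name;

        Book(String title, String name) {
            this.title = title;
            this.name = name;
        }

        @Override
        public String toString() {
            return title + "/" + name;
        }
    }

    /** Sorts by title first; hits with equal titles are ordered by name. */
    static List<Book> sortByTitleThenName(List<Book> hits) {
        List<Book> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing((Book b) -> b.title).thenComparing(b -> b.name));
        return sorted;
    }

    public static void main(String[] args) {
        List<Book> hits = Arrays.asList(new Book("b", "x"), new Book("a", "z"), new Book("a", "y"));
        System.out.println(sortByTitleThenName(hits));
    }
}
```

The second field only matters when the first one ties, which is the point of listing the sort fields in priority order.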

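Multi-field search with MultiFieldQueryParser

Search for a term in several fields without caring which field it appears in:

Query query = MultiFieldQueryParser.parse("word", new String[]{"title", "content"}, analyzer);

// looks for word in title and content

By default the fields are combined with OR. To change that, use the following form (the int[] flags belong to older Lucene releases; the BooleanClause.Occur[] form used earlier in this post is the later counterpart):

Query query = MultiFieldQueryParser.parse("word", new String[]{"title", "content"}, new int[]{MultiFieldQueryParser.REQUIRED_FIELD, MultiFieldQueryParser.PROHIBITED_FIELD}, analyzer);

where:

REQUIRED_FIELD means the field must match

PROHIBITED_FIELD means the field must not match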

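Searching multiple indexes with MultiSearcher

1) Build several indexes: use a different index directory and a separate IndexWriter instance for each.

2) Build the multi-index searcher:

Searcher[] searchers = new Searcher[2];

searchers[0] = new IndexSearcher(dir1); // searches index directory 1

searchers[1] = new IndexSearcher(dir2); // searches index directory 2

Searcher searcher = new MultiSearcher(searchers);

3) Run the query: Hits hits = searcher.search(query);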
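
As a rough mental model of what searching several indexes involves, the plain-Java sketch below merges per-index hit lists into one list ordered by score and shifts each document ID by the size of the indexes that come before it. It does not call Lucene. The class MultiIndexMergeModel, its Hit type and the maxDocs parameter are hypothetical names for illustration, and finer points of real scoring across indexes are not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of merging hits from several indexes; this is not Lucene's API. */
final class MultiIndexMergeModel {

    /** Hypothetical hit: a document ID local to one index, plus its score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Merges the hits of each index into one list sorted by descending score.
     * Document IDs from index i are shifted by the total size of indexes 0..i-1,
     * so that every merged ID is unique.
     */
    static List<Hit> merge(List<List<Hit>> perIndexHits, int[] maxDocs) {
        List<Hit> merged = new ArrayList<>();
        int offset = 0;
        for (int i = 0; i < perIndexHits.size(); i++) {
            for (Hit h : perIndexHits.get(i)) {
                merged.add(new Hit(h.doc + offset, h.score));
            }
            offset += maxDocs[i];
        }
        merged.sort((a, b) -> Float.compare(b.score, a.score));
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> first = Arrays.asList(new Hit(0, 0.9f), new Hit(2, 0.4f));
        List<Hit> second = Arrays.asList(new Hit(1, 0.7f));
        for (Hit h : merge(Arrays.asList(first, second), new int[] {3, 3})) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```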
---------------------------------------------------------------------------------------------------------------------------------------

// analyzer: any concrete Analyzer (Analyzer itself is abstract), e.g. new StandardAnalyzer()
BooleanQuery typeNegativeSearch = new BooleanQuery();
QueryParser parser = new QueryParser("contents", analyzer);
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query query = parser.parse(queryString);
QueryParser parser2 = new QueryParser("adISELL", analyzer);
Query query2 = parser2.parse("\"2\"");
QueryParser parser3 = new QueryParser("adISELL", analyzer);
Query query3 = parser3.parse("\"2\"");
QueryParser parser4 = new QueryParser("adISELL", analyzer);
Query query4 = parser4.parse("\"2\"");
// ... further parsers and queries, up to parserN / queryN, built the same way
typeNegativeSearch.add(query, BooleanClause.Occur.MUST);
typeNegativeSearch.add(query2, BooleanClause.Occur.MUST);
typeNegativeSearch.add(query3, BooleanClause.Occur.MUST);
typeNegativeSearch.add(query4, BooleanClause.Occur.MUST);
// ... add the remaining queries, up to queryN, in the same way
Hits hits = searcher.search(typeNegativeSearch);

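1. Kinds of span query

SpanTermQuery: retrieves exactly what TermQuery does, but also records position information internally for the other SpanQuery APIs to use. It is the building block for the other SpanQuery types.
SpanFirstQuery: searches for the given term within a fixed width, measured from the start of the field's content.
SpanNearQuery: similar to PhraseQuery, but what it matches need not be a phrase; it can also treat the result of another SpanQuery as a single unit, which allows nested queries.
SpanOrQuery: combines the results of all its SpanQuery clauses into one result.
SpanNotQuery: takes the results of the first SpanQuery and removes those that also match the second SpanQuery.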
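
The position-based idea behind SpanFirstQuery and SpanNearQuery can be sketched in plain Java over a list of tokens. This is a simplified model under stated assumptions, not Lucene code: the class SpanModel and both method names are hypothetical, each term occupies exactly one position, and only the in-order, two-term case of "near" is covered.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of position-based matching; this is not Lucene's API. */
final class SpanModel {

    /** SpanFirst-like check: does term occur within the first `end` token positions? */
    static boolean spanFirst(List<String> tokens, String term, int end) {
        for (int pos = 0; pos < tokens.size() && pos < end; pos++) {
            if (tokens.get(pos).equals(term)) {
                return true;
            }
        }
        return false;
    }

    /** SpanNear-like check, in order: `first` precedes `second` with at most `slop` tokens between them. */
    static boolean spanNearInOrder(List<String> tokens, String first, String second, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            // j - i - 1 is the number of tokens strictly between the two terms
            for (int j = i + 1; j < tokens.size() && j - i - 1 <= slop; j++) {
                if (tokens.get(j).equals(second)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "brown", "fox", "jumps");
        System.out.println(spanFirst(tokens, "quick", 2));
        System.out.println(spanNearInOrder(tokens, "quick", "fox", 1));
    }
}
```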

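2. Combining multiple conditions

BooleanClause is the class that describes how the clauses of a boolean query relate to each other. Its values are BooleanClause.Occur.MUST, BooleanClause.Occur.MUST_NOT and BooleanClause.Occur.SHOULD, which give the following six combinations:
1. MUST with MUST: the intersection of the two clauses' results.
2. MUST with MUST_NOT: the results of the MUST clause, excluding every document matched by the MUST_NOT clause.
3. MUST_NOT with MUST_NOT: meaningless; the search returns no results.
4. SHOULD with MUST: the SHOULD clause has no effect on which documents match; the result is that of the MUST clause.
5. SHOULD with MUST_NOT: the SHOULD clause behaves like MUST.
6. SHOULD with SHOULD: an OR relationship; the result is the union of all clauses' results.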
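
The combinations listed above can be checked with a small set-based model. The sketch below is plain Java, not Lucene: each clause is represented simply by the set of document IDs its sub-query would match, and scoring is ignored. The class BooleanClauseModel and its helpers are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Plain-Java model of how boolean clauses combine; this is not Lucene's API. */
final class BooleanClauseModel {

    enum Occur { MUST, SHOULD, MUST_NOT }

    /** One clause: the document IDs its sub-query matches, and how it is combined. */
    static final class Clause {
        final Set<Integer> docs;
        final Occur occur;

        Clause(Set<Integer> docs, Occur occur) {
            this.docs = docs;
            this.occur = occur;
        }
    }

    /** Convenience helper that builds a set of document IDs. */
    static Set<Integer> docs(Integer... ids) {
        return new TreeSet<>(Arrays.asList(ids));
    }

    /** Returns the document IDs matched by the combined clauses (scores are ignored). */
    static Set<Integer> match(List<Clause> clauses) {
        Set<Integer> required = null;             // intersection of all MUST clauses
        Set<Integer> optional = new TreeSet<>();   // union of all SHOULD clauses
        Set<Integer> prohibited = new TreeSet<>(); // union of all MUST_NOT clauses
        for (Clause c : clauses) {
            switch (c.occur) {
                case MUST:
                    if (required == null) {
                        required = new TreeSet<>(c.docs);
                    } else {
                        required.retainAll(c.docs);
                    }
                    break;
                case SHOULD:
                    optional.addAll(c.docs);
                    break;
                default:
                    prohibited.addAll(c.docs);
            }
        }
        // With a MUST clause present, SHOULD clauses do not change the matched set.
        // With only MUST_NOT clauses, there is nothing to subtract from, so nothing matches.
        Set<Integer> result = new TreeSet<>(required != null ? required : optional);
        result.removeAll(prohibited);
        return result;
    }

    /** Combines exactly two clauses. */
    static Set<Integer> combine(Set<Integer> a, Occur oa, Set<Integer> b, Occur ob) {
        return match(Arrays.asList(new Clause(a, oa), new Clause(b, ob)));
    }

    public static void main(String[] args) {
        Set<Integer> a = docs(1, 2, 3);
        Set<Integer> b = docs(2, 3, 4);
        System.out.println(combine(a, Occur.MUST, b, Occur.MUST));
        System.out.println(combine(a, Occur.SHOULD, b, Occur.MUST_NOT));
        System.out.println(combine(a, Occur.SHOULD, b, Occur.SHOULD));
    }
}
```

In a real search a SHOULD clause next to a MUST clause can still raise the score of documents it matches; the model only tracks which documents match.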
