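Background:

With my work tasks finished, I am using the spare time to brush up on something new!

Lucene is a pure-Java full-text search toolkit built on the inverted-index principle. The code in this post targets Lucene 6.1.

Full-text search: an indexing program scans every word in a document, builds an index entry for each word, and records how many times and at which positions that word occurs in the document.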
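To make the idea concrete, here is a minimal sketch of an inverted index in plain Java. It is not how Lucene implements it internally; the class name InvertedIndexSketch and the method build are hypothetical names made up for this illustration. It also assumes simple English text: tokens are split on ASCII non-word characters, so it would not segment Chinese the way a real analyzer does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration class; not part of Lucene. */
public class InvertedIndexSketch {

    /** Maps each lower-cased word to the token positions at which it occurs. */
    public static Map<String, List<Integer>> build(String text) {
        Map<String, List<Integer>> index = new TreeMap<>();
        // Assumption: ASCII word characters only; punctuation and spaces separate tokens
        String[] tokens = text.toLowerCase().split("\\W+");
        int position = 0;
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() yields a leading empty token if the text starts with a separator
            }
            index.computeIfAbsent(token, k -> new ArrayList<>()).add(position);
            position++;
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = build("Lucene indexes text; Lucene searches text");
        // The occurrence count of a word is simply the size of its position list
        index.forEach((word, positions) ->
                System.out.println(word + " count=" + positions.size() + " positions=" + positions));
    }
}
```

Looking up a word is then a single map access, which is why searching the index is much faster than rescanning every document.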

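Index types (in the database sense) include: 1. unique index, 2. primary key index, 3. clustered index. An index is a way to speed up the retrieval of data from a table.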

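Search:
I. By the type of resource being searched:
1. Resources that can be converted to text
2. Multimedia resources
II. By search method:
1. No semantic processing; simply find every text in which the specified terms appear (that is, term matching).
Basic concepts:
1. Workflow: build the index (the index library) first, then search it.
2. Use Lucene's data structures: Document and Field.
Building the index:
1. Define an analyzer (tokenizer)
2. Decide where the index is stored
3. Create an IndexWriter to write the index
4. Extract the content and write it to the index files
5. Close the IndexWriter
Searching the index library:
1. Open the storage location
2. Create the searcher
3. Run a query, much like SQL
4. Process the results
5. Close the DirectoryReader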

-----------------------------------------------------------------------------------------------------------------

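/**
 * @Project: lucene
 * @Class: Article
 * @Description: article entity class
 * @Author: YangChao
 * @Created: 2016-08-30 15:11:38
 * @version 1.0.0
 */
public class Article {
    private Integer id;
    private String title;
    private String content;

    // Getters and setters, used by DocumentUtils
    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }
}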

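import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

/**
 * @Project: lucene
 * @Class: DocumentUtils
 * @Description: converts between the Article entity and a Lucene Document
 * @Author: YangChao
 * @Created: 2016-08-31 10:15:22
 * @version 1.0.0
 */
public class DocumentUtils {
    public static Document article2Document(Article article) {
        Document doc = new Document();
        // The id is indexed as one un-analyzed term so that Term("id", ...) lookups match it exactly
        doc.add(new StringField("id", article.getId().toString(), Field.Store.YES));
        // title and content are analyzed (tokenized) and stored
        doc.add(new Field("title", article.getTitle(), TextField.TYPE_STORED));
        doc.add(new Field("content", article.getContent(), TextField.TYPE_STORED));
        return doc;
    }

    public static Article document2Ariticle(Document doc) {
        Article article = new Article();
        article.setId(Integer.parseInt(doc.get("id")));
        article.setTitle(doc.get("title"));
        article.setContent(doc.get("content"));
        return article;
    }
}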

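import java.nio.file.Paths;

import org.apache.log4j.Logger;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 * @Project: lucene
 * @Class: LuceneUtils
 * @Description: provides the analyzer and the index location
 * @Author: YangChao
 * @Created: 2016-08-31 09:48:06
 * @version 1.0.0
 */
public class LuceneUtils {
    private static Logger logger = Logger.getLogger(LuceneUtils.class);
    private static Directory directory;
    private static Analyzer analyzer;

    static {
        try {
            // Index files are kept on disk under ./tmp/testindex
            directory = FSDirectory.open(Paths.get("./tmp/testindex"));
            // Alternative for non-Chinese text:
            // analyzer = new StandardAnalyzer();
            analyzer = new SmartChineseAnalyzer();
        } catch (Exception e) {
            logger.error("LuceneUtils error!", e);
        }
    }

    public static Directory getDirectory() {
        return directory;
    }

    public static Analyzer getAnalyzer() {
        return analyzer;
    }

    /** Closes the writer quietly; closing also commits pending changes. */
    public static void closeIndexWriter(IndexWriter indexWriter) {
        if (indexWriter != null) {
            try {
                indexWriter.close();
            } catch (Exception e2) {
                logger.error("indexWriter.close error", e2);
            }
        }
    }
}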

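import java.util.List;

/**
 * @Project: lucene
 * @Class: QueryResult
 * @Description: result set
 * @Author: YangChao
 * @Created: 2016-08-31 16:56:24
 * @version 1.0.0
 */
public class QueryResult {
    private int count;          // total number of matching records
    private List<Article> list; // records on the requested page

    public QueryResult() {
        super();
    }

    public QueryResult(int count, List<Article> list) {
        super();
        this.count = count;
        this.list = list;
    }

    public int getCount() { return count; }
    public List<Article> getList() { return list; }
}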

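import java.util.ArrayList;
import java.util.List;

import org.apache.log4j.Logger;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.highlight.Formatter;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.Scorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;

/**
 * @Project: lucene
 * @Class: IndexDao
 * @Description: saves, deletes, updates and searches articles in the index
 * @Author: YangChao
 * @Created: 2016-08-31 10:12:05
 * @version 1.0.0
 */
public class IndexDao {
    private static Logger logger = Logger.getLogger(IndexDao.class);

    public void save(Article article) {
        Document doc = DocumentUtils.article2Document(article);
        IndexWriter indexWriter = null;
        try {
            IndexWriterConfig config = new IndexWriterConfig(LuceneUtils.getAnalyzer());
            indexWriter = new IndexWriter(LuceneUtils.getDirectory(), config);
            indexWriter.addDocument(doc);
        } catch (Exception e) {
            logger.error("IndexDao.save error", e);
        } finally {
            LuceneUtils.closeIndexWriter(indexWriter);
        }
    }

    public void delete(String id) {
        IndexWriter indexWriter = null;
        try {
            Term term = new Term("id", id);
            IndexWriterConfig config = new IndexWriterConfig(LuceneUtils.getAnalyzer());
            indexWriter = new IndexWriter(LuceneUtils.getDirectory(), config);
            indexWriter.deleteDocuments(term); // deletes every document containing the given term
        } catch (Exception e) {
            logger.error("IndexDao.delete error", e);
        } finally {
            LuceneUtils.closeIndexWriter(indexWriter);
        }
    }

    public void update(Article article) {
        Document doc = DocumentUtils.article2Document(article);
        IndexWriter indexWriter = null;
        try {
            Term term = new Term("id", article.getId().toString());
            IndexWriterConfig config = new IndexWriterConfig(LuceneUtils.getAnalyzer());
            indexWriter = new IndexWriter(LuceneUtils.getDirectory(), config);
            indexWriter.updateDocument(term, doc); // deletes the old document first, then adds the new one
        } catch (Exception e) {
            logger.error("IndexDao.update error", e);
        } finally {
            LuceneUtils.closeIndexWriter(indexWriter);
        }
    }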
    public QueryResult search(String queryString, int firstResult, int maxResult) {
        List<Article> list = new ArrayList<Article>();
        // Step 1: open the storage location; try-with-resources closes the reader even on failure
        try (DirectoryReader ireader = DirectoryReader.open(LuceneUtils.getDirectory())) {
            // Step 2: create the searcher
            IndexSearcher isearcher = new IndexSearcher(ireader);
            // Step 3: run a keyword query, much like SQL, against both fields
            String[] fields = { "title", "content" };
            QueryParser parser = new MultiFieldQueryParser(fields, LuceneUtils.getAnalyzer());
            Query query = parser.parse(queryString);
            // The second argument caps the result at the top n hits
            TopDocs topDocs = isearcher.search(query, firstResult + maxResult);
            int count = topDocs.totalHits; // total number of matching records
            System.out.println("Total hits: " + count);
            ScoreDoc[] hits = topDocs.scoreDocs;
            // Highlighting
            Formatter formatter = new SimpleHTMLFormatter("<font color='red'>", "</font>");
            Scorer source = new QueryScorer(query);
            Highlighter highlighter = new Highlighter(formatter, source);
            // Summary (fragment size), disabled here
            // Fragmenter fragmenter = new SimpleFragmenter(5);
            // highlighter.setTextFragmenter(fragmenter);
            // Step 4: process the results of the requested page
            int endIndex = Math.min(firstResult + maxResult, hits.length);
            for (int i = firstResult; i < endIndex; i++) {
                Document hitDoc = isearcher.doc(hits[i].doc);
                Article article = DocumentUtils.document2Ariticle(hitDoc);
                // getBestFragment returns null when the query terms do not occur in content
                String text = highlighter.getBestFragment(LuceneUtils.getAnalyzer(), "content", hitDoc.get("content"));
                if (text != null) {
                    article.setContent(text);
                }
                list.add(article);
            }
            // Step 5: the DirectoryReader is closed automatically when the try block exits
            return new QueryResult(count, list);
        } catch (Exception e) {
            logger.error("IndexDao.search error", e);
        }
        return null;
    }
}
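
The search method pages its results by asking for the top firstResult + maxResult hits and then keeping only the entries from firstResult onward. Here is a small sketch of just that windowing step in plain Java, with no Lucene involved. PagingSketch and page are hypothetical names for this illustration, and it assumes firstResult is not negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration class; mirrors the index arithmetic used for paging. */
public class PagingSketch {

    /** Returns at most maxResult items of topHits, starting at index firstResult. */
    public static <T> List<T> page(List<T> topHits, int firstResult, int maxResult) {
        // Never read past the hits that were actually returned
        int endIndex = Math.min(firstResult + maxResult, topHits.size());
        List<T> page = new ArrayList<>();
        for (int i = firstResult; i < endIndex; i++) {
            page.add(topHits.get(i));
        }
        return page; // empty when firstResult is beyond the last hit
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(page(hits, 2, 2)); // prints [c, d]
        System.out.println(page(hits, 4, 2)); // prints [e]
    }
}
```

One consequence of this approach is that deeper pages cost more, because every page request still collects all the hits that come before it.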
Detailed Lucene tutorial: http://www.cnblogs.com/zhuxiaojie/p/5277219.html