Lucene 3.6.1 Classic Getting-Started Tutorial (Including Reading Content from Files)

Reposted from http://liqita.iteye.com/blog/1676664

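Step 1: Download the Lucene core packages

lucene-core-3.6.1-javadoc.jar (3.5 MB)

lucene-core-3.6.1.jar (1.5 MB)

Copy them into the project's lib folder. Only lucene-core-3.6.1.jar is needed to compile and run the example; the javadoc jar is optional documentation.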
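
A quick way to confirm that the jar copied into the lib folder is actually visible to the project is to try loading one Lucene class by name. The sketch below is not part of the original post: the class name ClasspathCheck and the method isOnClasspath are hypothetical, and it relies only on Class.forName from the JDK.

```java
// Hypothetical helper (not from the original post): checks whether a class
// can be located, which tells you whether its jar is on the classpath.
class ClasspathCheck {

    /** Returns true if the named class can be found by this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // IndexWriter lives in lucene-core, so it is a reasonable class to probe for
        String luceneClass = "org.apache.lucene.index.IndexWriter";
        System.out.println(luceneClass + " on classpath: " + isOnClasspath(luceneClass));
    }
}
```

If this prints false when run inside the project, the jar is most likely sitting in the lib folder without having been added to the build path.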

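Step 2: Prepare the source and index folders

Create a source folder on drive C (C:\source).

The source folder holds the files to be indexed. For example, create two files named test1.txt and test2.txt. Save them in GBK encoding (what Notepad calls "ANSI" on Simplified Chinese Windows), because the indexer in step 3 reads them as GBK.

Contents of test1.txt: 欢迎来到绝对秋香的博客。 ("Welcome to 绝对秋香's blog.")

Contents of test2.txt: 绝对秋香引领你走向潮流。 ("绝对秋香 leads you to the latest trends.")

Then create an index folder on drive C (C:\index) to hold the index files.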
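
The two sample files can also be generated from code, which removes any doubt about the encoding they are saved in. This is a minimal sketch, not part of the original post: the class SampleFiles and its methods are hypothetical names. The two sentences are written as Unicode escapes so that the .java file's own encoding cannot affect the result; they spell out exactly the contents of test1.txt and test2.txt given above.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Hypothetical helper (not from the original post): writes the two sample files.
class SampleFiles {

    // Contents of test1.txt and test2.txt, as Unicode escapes
    static final String TEXT1 =
            "\u6b22\u8fce\u6765\u5230\u7edd\u5bf9\u79cb\u9999\u7684\u535a\u5ba2\u3002";
    static final String TEXT2 =
            "\u7edd\u5bf9\u79cb\u9999\u5f15\u9886\u4f60\u8d70\u5411\u6f6e\u6d41\u3002";

    /** Writes text to file using the given charset, replacing any existing file. */
    static void writeFile(File file, String text, String charset) throws IOException {
        Writer writer = new OutputStreamWriter(new FileOutputStream(file), charset);
        try {
            writer.write(text);
        } finally {
            writer.close();
        }
    }

    /** Creates sourceDir if needed and writes test1.txt and test2.txt into it. */
    static void createSamples(File sourceDir, String charset) throws IOException {
        if (!sourceDir.isDirectory() && !sourceDir.mkdirs()) {
            throw new IOException("Cannot create directory: " + sourceDir);
        }
        writeFile(new File(sourceDir, "test1.txt"), TEXT1, charset);
        writeFile(new File(sourceDir, "test2.txt"), TEXT2, charset);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the folder used in this tutorial; pass another path to override
        File sourceDir = new File(args.length > 0 ? args[0] : "C:\\source");
        createSamples(sourceDir, "GBK");
        System.out.println("Sample files written to " + sourceDir.getPath());
    }
}
```

The charset passed to createSamples should be the same one the indexer later uses to read the files.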

Step 3: Create the indexing class TextFileIndexer and run its main method

package com.newtouchone.lucene;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class TextFileIndexer {

    public static void main(String[] args) throws Exception {
        // Folder containing the files to index: the source folder on drive C
        File fileDir = new File("C:\\source");
        // Folder where the index files are written
        File indexDir = new File("C:\\index");

        File[] textFiles = fileDir.listFiles();
        if (textFiles == null) {
            throw new IOException("Not a readable directory: " + fileDir);
        }

        Directory dir = FSDirectory.open(indexDir);
        Analyzer luceneAnalyzer = new StandardAnalyzer(Version.LUCENE_36);
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_36, luceneAnalyzer);
        // CREATE overwrites any existing index in C:\index on every run
        iwc.setOpenMode(OpenMode.CREATE);
        IndexWriter indexWriter = new IndexWriter(dir, iwc);

        long startTime = System.currentTimeMillis();
        try {
            // Add one document per .txt file
            for (File textFile : textFiles) {
                if (textFile.isFile() && textFile.getName().endsWith(".txt")) {
                    System.out.println("File " + textFile.getCanonicalPath()
                            + " is being indexed....");
                    // The charset must match the encoding the files were saved in
                    String content = readAll(textFile.getCanonicalPath(), "GBK");
                    System.out.println(content);

                    Document document = new Document();
                    // path: stored so it can be read back from a hit, but not searchable
                    Field pathField = new Field("path", textFile.getPath(),
                            Field.Store.YES, Field.Index.NO);
                    // body: stored, analyzed for full-text search, with term vectors
                    Field bodyField = new Field("body", content, Field.Store.YES,
                            Field.Index.ANALYZED,
                            Field.TermVector.WITH_POSITIONS_OFFSETS);
                    document.add(pathField);
                    document.add(bodyField);
                    indexWriter.addDocument(document);
                }
            }
        } finally {
            // Closing commits the added documents and releases the index lock
            indexWriter.close();
        }

        // Report how long indexing took
        long endTime = System.currentTimeMillis();
        System.out.println("It took " + (endTime - startTime)
                + " ms to add the documents to the index! " + fileDir.getPath());
    }

    /** Reads the whole file into a string using the given charset. */
    public static String readAll(String fileName, String charset) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream(fileName), charset));
        try {
            StringBuilder text = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                if (text.length() > 0) {
                    // Keep line breaks so words on adjacent lines do not run together
                    text.append('\n');
                }
                text.append(line);
            }
            return text.toString();
        } finally {
            reader.close();
        }
    }
}

Output:

File C:\source\test1.txt is being indexed....
欢迎来到绝对秋香的博客。
File C:\source\test2.txt is being indexed....
绝对秋香引领你走向潮流。
It took 641 ms to add the documents to the index! C:\source
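
The indexer reads every file as GBK, so the body text is only correct when the files were really saved in that encoding. The following sketch, which is not part of the original post, shows what goes wrong otherwise using JDK classes only; CharsetMismatchDemo and roundTrips are hypothetical names, and the escaped string is the search phrase 绝对秋香 used in this tutorial.

```java
import java.nio.charset.Charset;

// Hypothetical demo (not from the original post): text survives a write/read
// cycle only when the same charset is used on both sides.
class CharsetMismatchDemo {

    /** Encodes text with writeCharset, decodes it with readCharset, and compares. */
    static boolean roundTrips(String text, String writeCharset, String readCharset) {
        byte[] bytes = text.getBytes(Charset.forName(writeCharset));
        String decoded = new String(bytes, Charset.forName(readCharset));
        return text.equals(decoded);
    }

    public static void main(String[] args) {
        String phrase = "\u7edd\u5bf9\u79cb\u9999"; // the search phrase, as Unicode escapes

        // Saved as GBK and read as GBK: the text is intact
        System.out.println("GBK file read as GBK intact:   "
                + roundTrips(phrase, "GBK", "GBK"));
        // Saved as UTF-8 but read as GBK: the text comes back garbled
        System.out.println("UTF-8 file read as GBK intact: "
                + roundTrips(phrase, "UTF-8", "GBK"));
    }
}
```

Garbled body text would be indexed as it is, so a later search for the original phrase would probably find nothing. If the source files are UTF-8, pass "UTF-8" to the file-reading helper instead of "GBK".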

Step 4: Create the test class TestQuery, run its main method, and check the output

package com.newtouchone.lucene;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class TestQuery {

    public static void main(String[] args) throws IOException, ParseException {
        String index = "C:\\index";      // path of the index to search
        String queryString = "绝对秋香";  // search keywords

        IndexReader reader = IndexReader.open(FSDirectory.open(new File(index)));
        IndexSearcher searcher = new IndexSearcher(reader);
        try {
            // Use the same analyzer as at indexing time so query terms match indexed terms
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
            QueryParser qp = new QueryParser(Version.LUCENE_36, "body", analyzer);
            // parse() throws ParseException on a malformed query; main() lets it
            // propagate rather than searching with a null query
            Query query = qp.parse(queryString);

            TopDocs results = searcher.search(query, 10); // return at most 10 hits
            ScoreDoc[] hits = results.scoreDocs;
            if (hits.length > 0) {
                System.out.println("Found " + hits.length + " result(s)!");
            } else {
                System.out.println("No results found.");
            }
        } finally {
            searcher.close();
            // The searcher does not close a reader that was passed to its constructor
            reader.close();
        }
    }
}

Test output:

Found 2 result(s)!
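
Both files match because both contain the search phrase. As far as I know, StandardAnalyzer at this version also splits Chinese text into single-character tokens, and the query parser by default combines the resulting tokens with OR, so a document sharing only one character with the query would be returned as well. The sketch below is a plain-Java approximation of that behavior under those assumptions; it does not call Lucene and is not how Lucene is implemented. UnigramMatchSketch, unigrams, and countMatches are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical approximation (not Lucene code): one token per character,
// and a document matches if it shares at least one token with the query.
class UnigramMatchSketch {

    /** Returns one token per letter or digit code point; punctuation is dropped. */
    static Set<String> unigrams(String text) {
        Set<String> tokens = new LinkedHashSet<String>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (Character.isLetterOrDigit(codePoint)) {
                tokens.add(new String(Character.toChars(codePoint)));
            }
            i += Character.charCount(codePoint);
        }
        return tokens;
    }

    /** Counts the documents that share at least one token with the query (OR semantics). */
    static int countMatches(String query, List<String> docs) {
        Set<String> queryTokens = unigrams(query);
        int count = 0;
        for (String doc : docs) {
            Set<String> shared = unigrams(doc);
            shared.retainAll(queryTokens); // keep only tokens that also occur in the query
            if (!shared.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Contents of test1.txt and test2.txt, as Unicode escapes
        List<String> docs = Arrays.asList(
                "\u6b22\u8fce\u6765\u5230\u7edd\u5bf9\u79cb\u9999\u7684\u535a\u5ba2\u3002",
                "\u7edd\u5bf9\u79cb\u9999\u5f15\u9886\u4f60\u8d70\u5411\u6f6e\u6d41\u3002");
        String query = "\u7edd\u5bf9\u79cb\u9999"; // the search phrase from TestQuery
        System.out.println("Matching documents: " + countMatches(query, docs));
    }
}
```

For Chinese search beyond a demo, an analyzer that does word segmentation usually gives more precise results than single-character matching.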

The attachment homework.rar contains the project files; extract and deploy it to run this Lucene example.
