Lucene(01)

My cnblogs blog: http://www.cnblogs.com/tenglongwentian/

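The latest Lucene release is 6.2.1, which requires JDK 1.8 (Java 8).
I'm using the last JDK 7 release here, so I use Lucene 5.5.3 instead, since the 5.x line still runs on Java 7.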

Create a Maven project. If you don't know how, see my earlier posts, which cover creating a Maven project step by step.
Set <packaging>jar</packaging> and add the following dependencies:

<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-core -->
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>5.5.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-queryparser -->
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-queryparser</artifactId>
        <version>5.5.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-analyzers-common -->
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-common</artifactId>
        <version>5.5.3</version>
    </dependency>
</dependencies>

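Because I'm on JDK 7 and don't want to readjust the project's JDK version by hand every time I update the Maven project, I pin the compiler level in the POM like this:

<!-- source directories, plugin management and other build settings -->
<build>
    <finalName>Lucene</finalName>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.3</version>
            <configuration>
                <!-- set the source and target versions -->
                <!-- source: the Java language level the source code is compiled as -->
                <source>1.7</source>
                <!-- target: the JVM version the generated class files must be compatible with -->
                <target>1.7</target>
            </configuration>
        </plugin>
    </plugins>
</build>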
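To double-check that these compiler settings really produce Java 7 bytecode, you can read the class-file header: bytes 6-7 hold the major version, and 51 means Java 7 (50 is Java 6, 52 is Java 8). The helper below is a small sketch of my own, not part of Maven or Lucene; the path in the usage comment assumes Maven's default target/classes output directory and classes in the default package.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reports the class-file major version (51 = Java 7).
final class ClassFileVersion {

    /** Returns the major version stored in the first 8 bytes of a class file. */
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8
                || (classBytes[0] & 0xFF) != 0xCA || (classBytes[1] & 0xFF) != 0xFE
                || (classBytes[2] & 0xFF) != 0xBA || (classBytes[3] & 0xFF) != 0xBE) {
            throw new IllegalArgumentException("not a class file (missing 0xCAFEBABE magic)");
        }
        // bytes 4-5: minor version, bytes 6-7: major version, both big-endian
        return ((classBytes[6] & 0xFF) << 8) | (classBytes[7] & 0xFF);
    }

    public static void main(String[] args) throws IOException {
        // e.g. java ClassFileVersion target/classes/Indexer.class
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        int major = majorVersion(bytes);
        System.out.println("major version " + major + (major == 51 ? " = Java 7 bytecode" : ""));
    }
}
```

With <target>1.7</target> in place, running it against one of the compiled classes should report 51.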

Create two classes. The first one is Indexer:

import java.io.File;
import java.io.FileReader;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Indexer {

    private IndexWriter writer; // index writer instance

    /**
     * Constructor: opens the index directory and creates the IndexWriter
     *
     * @param indexDir directory the index files are written to
     * @throws Exception
     */
    public Indexer(String indexDir) throws Exception {
        Directory dir = FSDirectory.open(Paths.get(indexDir));
        Analyzer analyzer = new StandardAnalyzer(); // standard analyzer
        IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
        writer = new IndexWriter(dir, iwc);
    }

    /**
     * Close the index writer (with the default config this also commits pending changes)
     *
     * @throws Exception
     */
    public void close() throws Exception {
        writer.close();
    }

    /**
     * Index every file in the given directory
     *
     * @param dataDir directory containing the files to index
     * @return number of documents in the index
     * @throws Exception
     */
    public int index(String dataDir) throws Exception {
        File[] files = new File(dataDir).listFiles();
        if (files == null) { // dataDir does not exist or is not a directory
            throw new IllegalArgumentException("Not a directory: " + dataDir);
        }
        for (File f : files) {
            if (f.isFile()) { // skip subdirectories, FileReader cannot read them
                indexFile(f);
            }
        }
        return writer.numDocs();
    }

    /**
     * Index a single file
     *
     * @param f file to index
     */
    private void indexFile(File f) throws Exception {
        System.out.println("Indexing file: " + f.getCanonicalFile());
        Document doc = getDocument(f);
        writer.addDocument(doc);
    }

    /**
     * Build a Document for the file and set each of its fields
     *
     * @param f source file
     * @return the document to add to the index
     * @throws Exception
     */
    private Document getDocument(File f) throws Exception {
        Document doc = new Document();
        // file contents: tokenized and indexed, but not stored (a Reader-based field cannot be stored)
        doc.add(new TextField("contents", new FileReader(f)));
        // file name and full path: tokenized, indexed and stored, so they can be read back from hits
        doc.add(new TextField("fileName", f.getName(), Field.Store.YES));
        doc.add(new TextField("fullPath", f.getCanonicalPath(), Field.Store.YES));
        return doc;
    }

    public static void main(String[] args) {
        String indexDir = "E:\\lucene";
        String dataDir = "E:\\lucene\\data";
        Indexer indexer = null;
        int numIndexed = 0;
        long start = System.currentTimeMillis();
        try {
            indexer = new Indexer(indexDir);
            numIndexed = indexer.index(dataDir);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (indexer != null) { // the constructor may have failed, leaving indexer null
                try {
                    indexer.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
        long end = System.currentTimeMillis();
        System.out.println("Indexed " + numIndexed + " files in " + (end - start) + " ms");
    }
}

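String indexDir="E:\\lucene";
String dataDir="E:\\lucene\\data";
Don't be puzzled by these two lines: the drive letter doesn't matter. Create a folder in the root of any drive, preferably with an English name and no spaces (I haven't tested Chinese names), then copy a few .txt files into the data folder; we'll need them for testing shortly.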
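If you don't have suitable .txt files at hand, a throwaway helper like the one below can create a few. The class name, file names and contents are all hypothetical; it writes plain ASCII so the files read correctly with the platform default charset that FileReader uses. Pass E:\lucene\data as the argument to match the paths above; without an argument it writes to a temporary directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: writes a few small ASCII .txt files to index.
final class SampleData {

    // Hypothetical sample contents: {file name, text}
    private static final String[][] SAMPLES = {
        {"lucene.txt", "Apache Lucene is a full-text search library written in Java."},
        {"names.txt", "Zygmunt Saloni"},
        {"notes.txt", "Copy any plain text files here to try the Indexer and Searcher."}
    };

    /** Creates dataDir if needed, writes the sample files and returns their paths. */
    static List<Path> writeSamples(Path dataDir) throws IOException {
        Files.createDirectories(dataDir);
        List<Path> written = new ArrayList<>();
        for (String[] sample : SAMPLES) {
            Path file = dataDir.resolve(sample[0]);
            // ASCII-only text, so any ASCII-compatible default charset reads it back correctly
            Files.write(file, sample[1].getBytes(StandardCharsets.US_ASCII));
            written.add(file);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path dataDir = args.length > 0
                ? Paths.get(args[0])
                : Files.createTempDirectory("lucene").resolve("data");
        for (Path p : writeSamples(dataDir)) {
            System.out.println("wrote " + p);
        }
    }
}
```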
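Then run this class. The console prints one line per indexed file, followed by a summary with the number of indexed files and the elapsed time in milliseconds.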
After that, the lucene folder contains a few odd-looking files. What they are will be covered later, so bear with me.
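Until those files are explained, it may help to have a mental model of what IndexWriter builds for the contents field: an inverted index, i.e. a map from each term to the documents that contain it, plus stored fields such as fullPath that can be read back from a hit. The toy class below is only that model, under simplifying assumptions (lowercasing and splitting on non-alphanumeric characters instead of StandardAnalyzer's rules; no positions, scores or on-disk segments). It is not how Lucene lays out the files on disk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of an inverted index; not Lucene's API or on-disk format.
final class TinyInvertedIndex {

    // term -> ids of the documents containing it (the "postings")
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    // one stored value per document, analogous to fullPath with Field.Store.YES
    private final List<String> storedPaths = new ArrayList<>();

    /** Simplified analysis: lowercase, split on anything that is not a letter or digit. */
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Indexes one document and returns its id (ids are assigned in insertion order). */
    int addDocument(String path, String contents) {
        int docId = storedPaths.size();
        storedPaths.add(path);
        for (String term : tokenize(contents)) {
            Set<Integer> docs = postings.get(term);
            if (docs == null) {
                docs = new TreeSet<>();
                postings.put(term, docs);
            }
            docs.add(docId);
        }
        return docId;
    }

    /** Ids of the documents containing a single term, analyzed the same way as the contents. */
    Set<Integer> docsFor(String term) {
        List<String> analyzed = tokenize(term);
        Set<Integer> docs = analyzed.isEmpty() ? null : postings.get(analyzed.get(0));
        return docs == null ? Collections.<Integer>emptySet() : Collections.unmodifiableSet(docs);
    }

    /** Reads back the stored value, like doc.get("fullPath") on a search hit. */
    String storedPath(int docId) {
        return storedPaths.get(docId);
    }

    public static void main(String[] args) {
        TinyInvertedIndex idx = new TinyInvertedIndex();
        idx.addDocument("data/a.txt", "Apache Lucene is a search library");
        idx.addDocument("data/b.txt", "Lucene builds an inverted index");
        for (int id : idx.docsFor("LUCENE")) {
            System.out.println(idx.storedPath(id)); // data/a.txt, then data/b.txt
        }
    }
}
```

Looking up a term costs one map access instead of a scan over every file, which is the whole point of building the index up front.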

Now create the second class, Searcher:

import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Searcher {

    public static void search(String indexDir, String q) throws Exception {
        Directory dir = FSDirectory.open(Paths.get(indexDir));
        IndexReader reader = DirectoryReader.open(dir);
        try {
            IndexSearcher is = new IndexSearcher(reader);
            Analyzer analyzer = new StandardAnalyzer(); // same analyzer as at index time
            QueryParser parser = new QueryParser("contents", analyzer); // default field: contents
            Query query = parser.parse(q);
            long start = System.currentTimeMillis();
            TopDocs hits = is.search(query, 10); // return at most the top 10 hits
            long end = System.currentTimeMillis();
            System.out.println("Query \"" + q + "\" took " + (end - start) + " ms and matched " + hits.totalHits + " documents");
            for (ScoreDoc scoreDoc : hits.scoreDocs) {
                Document doc = is.doc(scoreDoc.doc);
                System.out.println(doc.get("fullPath")); // stored field written by Indexer
            }
        } finally {
            reader.close(); // close the reader even if parsing or searching fails
        }
    }

    public static void main(String[] args) {
        String indexDir = "E:\\lucene";
        //String q = "LICENSE-2.0";
        String q = "Zygmunt Saloni";
        try {
            search(indexDir, q);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
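Note that is.search(query, 10) puts at most 10 entries in hits.scoreDocs, while hits.totalHits counts every matching document, so the printed count can be larger than the number of paths listed. The sketch below models that contract in plain Java with a bounded min-heap. It is a conceptual illustration, not Lucene's actual collector, and the scores array (with negative values meaning "no match") is a made-up input encoding.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Conceptual model of "search(query, n)": count every match, keep only the best n.
final class TopN {

    static final class Result {
        final int totalHits;           // like TopDocs.totalHits: all matching documents
        final List<Integer> topDocIds; // like TopDocs.scoreDocs: at most n ids, best first

        Result(int totalHits, List<Integer> topDocIds) {
            this.totalHits = totalHits;
            this.topDocIds = topDocIds;
        }
    }

    /** scores[i] is document i's score; a negative value means "no match" (assumed encoding). */
    static Result search(final float[] scores, int n) {
        // "weaker first" ordering: lower score first; on equal scores the higher doc id is weaker
        final Comparator<Integer> weakerFirst = new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                int byScore = Float.compare(scores[a], scores[b]);
                return byScore != 0 ? byScore : Integer.compare(b, a);
            }
        };
        // min-heap: the weakest of the current top n sits at the head
        PriorityQueue<Integer> heap = new PriorityQueue<>(Math.max(1, n), weakerFirst);
        int total = 0;
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] < 0) {
                continue; // not a match
            }
            total++;
            heap.offer(doc);
            if (heap.size() > n) {
                heap.poll(); // evict the weakest so at most n remain
            }
        }
        List<Integer> top = new ArrayList<>(heap);
        Collections.sort(top, Collections.reverseOrder(weakerFirst)); // best first
        return new Result(total, top);
    }

    public static void main(String[] args) {
        // hypothetical scores for 5 documents; -1 marks a non-matching document
        float[] scores = {0.5f, -1f, 2.0f, 1.0f, 2.0f};
        Result r = search(scores, 2);
        System.out.println("totalHits=" + r.totalHits + ", top=" + r.topDocIds); // totalHits=4, top=[2, 4]
    }
}
```

Keeping only n candidates in the heap means memory stays proportional to n rather than to the number of matches.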

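Run this class. It prints the query, the search time and the total number of matching documents, followed by the full path of each of the top hits (at most 10).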

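Don't delete the special files generated by the first class. If you delete them and then run the second class, it throws an exception (typically an IndexNotFoundException, because no segments file can be found).

Try it yourself if you're curious.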
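The exception comes from the fact that the searcher can only open a directory that holds a committed index, and Lucene records each commit in a segments_N file. A cheap stdlib-only precheck could look for such a file before calling DirectoryReader.open; the heuristic and the class name below are my own assumptions for illustration. Inside the Lucene project itself, DirectoryReader.indexExists(dir) is the library's own check.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical heuristic precheck; Lucene's own check is DirectoryReader.indexExists(Directory).
final class IndexPrecheck {

    /** True if dir contains at least one commit file named segments_N. */
    static boolean looksLikeIndex(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        // glob filter: only file names starting with "segments_"
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "segments_*")) {
            return files.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "E:\\lucene");
        if (looksLikeIndex(indexDir)) {
            System.out.println("segments_N found, the Searcher should be able to open " + indexDir);
        } else {
            System.out.println("no segments_N file in " + indexDir + ", run the Indexer first");
        }
    }
}
```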

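As for String q = "Zygmunt Saloni";, the space turns out to make no difference, because the query string is run through the analyzer and split into separate terms.

Adding a hyphen and running the second class again gives the same result; try it yourself.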
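To see why both query strings behave the same, here is a rough stand-in for the analysis step: lowercase the text and split on anything that is not a letter or digit, then treat the query as an OR of its terms (QueryParser's default operator is OR). This is an approximation under stated assumptions: StandardAnalyzer actually follows the Unicode word-break rules (UAX #29) and can differ on inputs such as numbers containing dots, and the class name is hypothetical. For "Zygmunt Saloni" and "Zygmunt-Saloni", however, both versions yield the terms zygmunt and saloni.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Rough stand-in for analysis + OR matching; NOT StandardAnalyzer's actual rules.
final class RoughAnalysis {

    /** Lowercase, then split on runs of characters that are neither letters nor digits. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) { // split() yields a leading empty string if text starts with a separator
                terms.add(t);
            }
        }
        return terms;
    }

    /** OR semantics: the document matches if it shares at least one term with the query. */
    static boolean matchesAny(String documentText, String query) {
        List<String> docTerms = tokenize(documentText);
        for (String term : tokenize(query)) {
            if (docTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Zygmunt Saloni"));  // [zygmunt, saloni]
        System.out.println(tokenize("Zygmunt-Saloni"));  // [zygmunt, saloni]
        System.out.println(matchesAny("text by Zygmunt Saloni", "zygmunt-saloni")); // true
    }
}
```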

Please credit the source when reposting. Thanks.
