Lucene Study Notes 1 (V7.1)

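Lucene is a search library; Solr, Nutch, and Elasticsearch are all built on it. In my view, it is worth understanding Lucene before learning the higher-level search engine applications.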

Development environment: IntelliJ IDEA, Maven, Spring Boot

Now for the code:

Maven configuration

 <parent>

        <groupId>org.springframework.boot</groupId>

        <artifactId>spring-boot-starter-parent</artifactId>

        <version>1.4..RELEASE</version>

    </parent>

    <properties>

        <java.version>1.8</java.version>

    </properties>

    <dependencies>

        <dependency>

            <groupId>org.springframework.boot</groupId>

            <artifactId>spring-boot-starter</artifactId>

        </dependency>

        <dependency>

            <groupId>org.springframework.boot</groupId>

            <artifactId>spring-boot-starter-thymeleaf</artifactId>

        </dependency>

        <!-- hot swapping, disable cache for template, enable live reload -->

        <dependency>

            <groupId>org.springframework.boot</groupId>

            <artifactId>spring-boot-devtools</artifactId>

            <optional>true</optional>

        </dependency>

            <!--Lucene-->

            <dependency>

                <groupId>org.apache.lucene</groupId>

                <artifactId>lucene-core</artifactId>

                <version>7.1.0</version>

            </dependency>

            <!-- Chinese analyzer; the general-purpose analyzers (common) are suited to English text -->

            <dependency>

                <groupId>org.apache.lucene</groupId>

                <artifactId>lucene-analyzers-smartcn</artifactId>

                <version>7.1.0</version>

            </dependency>

            <dependency>

                <groupId>org.apache.lucene</groupId>

                <artifactId>lucene-queryparser</artifactId>

                <version>7.1.0</version>

            </dependency>

            <!-- Highlighting of matched search keywords -->

            <dependency>

                <groupId>org.apache.lucene</groupId>

                <artifactId>lucene-highlighter</artifactId>

                <version>7.1.0</version>

            </dependency>

            <!--Lucene-->

            <dependency>

                <groupId>junit</groupId>

                <artifactId>junit</artifactId>

                <version>4.12</version>

            </dependency>

    </dependencies>

    <build>

        <plugins>

            <!-- Package as an executable jar/war -->

            <plugin>

                <groupId>org.springframework.boot</groupId>

                <artifactId>spring-boot-maven-plugin</artifactId>

            </plugin>

        </plugins>

    </build>

Helper class

public class LuceneConstants {

    public static final String CONTENTS="contents";

    public static final String FILE_NAME="filename";

    public static final String FILE_PATH="filepath";

    public static final int MAX_SEARCH = ;

    public  static final String IndexDir ="E:\\Lucene\\Index";

    public  static final String DataDir ="E:\\Lucene\\Data";

    public  static final String ArticleDir ="E:\\Lucene\\Files\\article.txt";

}
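
The Indexer class below uses an Article class that these notes do not show. Here is a minimal sketch of what it might look like. The field types are assumptions inferred from calls such as article.getId().toString(), which only compiles if the id is an object type like Integer; the real class may differ.

```java
// Hypothetical Article entity; not part of the original notes.
// Field types are inferred from how Indexer calls the getters and setters.
class Article {

    private Integer id;      // Integer rather than int, so getId().toString() compiles
    private String title;
    private String content;

    public Integer getId() {
        return id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }
}
```

The class is a plain JavaBean with no Lucene dependency; Indexer copies its values into Lucene fields one by one.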

Calling Lucene

public class Indexer {

    public void addEntity() throws IOException {

        Article article = new Article();

        //article.setId(1);

        //article.setTitle("Lucene全文检索");

        //article.setContent("Lucene是apache软件基金会4 jakarta项目组的一个子项目，是一个开放源代码的全文检索引擎工具包，但它不是一个完整的全文检索引擎，而是一个全文检索引擎的架构，提供了完整的查询引擎和索引引擎，部分文本分析引擎（英文与德文两种西方语言）。");

        article.setId();

        article.setTitle("Solr搜索引擎");

        article.setContent("Solr是基于Lucene框架的搜索引擎程序，是一个开放源代码的全文检索引擎。");

        final Path path = Paths.get(LuceneConstants.IndexDir);

        Directory directory = FSDirectory.open(path); // index directory, stored on disk

        //Directory RAMDirectory= new RAMDirectory(); // alternative: keep the index in memory

        Analyzer analyzer = new StandardAnalyzer();

        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);

        //indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

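        // APPEND expects the index to exist already; CREATE_OR_APPEND creates it when it is missing

        indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.APPEND);

        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig); // writer used to add documents to the index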

        Document document = new Document();

        document.add(new TextField("id", article.getId().toString(), Field.Store.YES));

        document.add(new TextField("title", article.getTitle(), Field.Store.YES));

        document.add(new TextField("content", article.getContent(), Field.Store.YES));

        indexWriter.addDocument(document);

        indexWriter.close();

    }

    public void addFile() throws IOException {

        final Path path = Paths.get(LuceneConstants.IndexDir);

        Directory directory = FSDirectory.open(path);

        Analyzer analyzer=new StandardAnalyzer();

        IndexWriterConfig indexWriterConfig=new IndexWriterConfig(analyzer);

        indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

        IndexWriter indexWriter=new IndexWriter(directory,indexWriterConfig);

        InputStreamReader isr = new InputStreamReader(new FileInputStream(LuceneConstants.ArticleDir), "GBK"); // .txt file; without an explicit encoding the text comes out garbled

        BufferedReader bufferedReader=new BufferedReader(isr);

        String content="";

        while ((content=bufferedReader.readLine())!=null){

            Document document=new Document();

            document.add(new TextField("content",content,Field.Store.YES) );

            indexWriter.addDocument(document);

        }

        bufferedReader.close();

        indexWriter.close();

    }

    public List<String> SearchFiles() throws IOException, ParseException {

        String queryString = "Solr";

        final Path path = Paths.get(LuceneConstants.IndexDir);

        Directory directory = FSDirectory.open(path); // where the index is stored

        Analyzer analyzer = new StandardAnalyzer(); // analyzer

        // Single condition

        // Parse the keywords

        //QueryParser queryParser=new QueryParser("content",analyzer);

        //Query query=queryParser.parse(queryString);

        // Multiple conditions: one query string per field, so both arrays must have the same length

        Query mQuery = MultiFieldQueryParser.parse(new String[]{queryString}, new String[]{"content"}, analyzer);

        IndexReader indexReader = DirectoryReader.open(directory); // index reader

        IndexSearcher indexSearcher = new IndexSearcher(indexReader); // runs the query

        //TopDocs topDocs=indexSearcher.search(query,3);

        TopDocs topDocs=indexSearcher.search(mQuery,);

        long count = topDocs.totalHits;

        ScoreDoc[] scoreDocs = topDocs.scoreDocs;

        List<String> list=new ArrayList<String>();

        list.add(String.valueOf(count));

        Integer cnt=;

        for (ScoreDoc scoreDoc : scoreDocs) {

            Document document = indexSearcher.doc(scoreDoc.doc);

            //list.add(cnt.toString()+"-"+"相关度："+scoreDoc.score+"-----time:"+document.get("time"));

            //list.add("|||");

            //list.add(cnt.toString()+"-"+document.get("content"));

            list.add(document.get("content"));

            cnt++;

        }

        indexReader.close(); // release the index files once the stored fields have been read

        return list;

    }

}
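
addFile() above hard-codes the "GBK" encoding and closes the reader by hand, so the reader stays open if an exception is thrown partway through. The file-reading step could also be written with try-with-resources and the charset passed in as a parameter. This is only a sketch: LineSource and readLines are hypothetical names introduced for illustration and are not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; reads a text file line by line using an explicit charset.
class LineSource {

    static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // one entry per line, matching one Document per line in addFile()
            }
        }
        return lines;
    }
}
```

For the file used in these notes, the call would look something like LineSource.readLines(Paths.get(LuceneConstants.ArticleDir), Charset.forName("GBK")), with each returned line then added to the index as its own Document.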

Viewing the results

@Controller

public class LuceneController {

    @RequestMapping("/add")

    public String welcomepage(Map<String, Object> model) {

        try {

            Indexer indexer = new Indexer();

            indexer.addEntity();

            model.put("message", "Success");

        } catch (IOException ex) {

            model.put("message", "Failure");

        }

        return "welcome";

    }

    @RequestMapping("/file")

    public String fileindex(Map<String, Object> model) {

        try {

            Indexer indexer = new Indexer();

            indexer.addFile();

            model.put("message", "SuccessF");

        } catch (IOException ex) {

            model.put("message", "FailureF");

        }

        return "welcome";

    }

    @RequestMapping("/search")

    public String searchindex(Map<String, Object> model) {

        try {

            Indexer indexer = new Indexer();

            List<String> rlts = indexer.SearchFiles();

            String message = "";

            for (String str : rlts) {

                message += str + " ";

            }

            model.put("message", message);

        } catch (Exception ex) {

            model.put("message", "FailureF");

        }

        return "welcome";

    }

}
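
SearchFiles() returns a list whose first element is the total hit count and whose remaining elements are the matched contents. The /search handler joins them all with spaces, so the count runs into the first hit. One way to keep the two apart is sketched below; SearchResultFormatter and format are hypothetical names, and the output layout is just an illustration.

```java
import java.util.List;

// Hypothetical helper; formats the list returned by Indexer.SearchFiles().
class SearchResultFormatter {

    static String format(List<String> results) {
        if (results == null || results.isEmpty()) {
            return "No results";
        }
        String total = results.get(0);                           // first element: total hit count
        List<String> hits = results.subList(1, results.size());  // remaining elements: matched contents
        if (hits.isEmpty()) {
            return "Total hits: " + total;
        }
        return "Total hits: " + total + " | " + String.join(" | ", hits);
    }
}
```

In the controller, model.put("message", SearchResultFormatter.format(rlts)) would replace the string-concatenation loop.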
