Lucene Study Notes: Basics

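Lucene is a full-text search engine toolkit from Apache. Collected data is stored in an index, and results are then retrieved from that index according to the query conditions. The index can be kept in memory or on disk.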

These notes mainly follow this blog post: https://blog.csdn.net/bskfnvjtlyzmv867/article/details/80914156

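The main workflow is: collect the data, convert it into index documents, and store those documents in the index, which can live in memory or on disk. At query time, the search runs against the index and returns the matching data.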
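To make the collect, index, query flow concrete, here is a minimal sketch of a toy inverted index in plain Java (Java 8 or later assumed). It is not how Lucene is implemented, and the class name `TinyInvertedIndex` and its methods are made up for illustration; it only shows why a lookup against an index avoids scanning every record.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (hypothetical, for illustration only): maps each
// lower-cased term to the ids of the documents that contain it.
class TinyInvertedIndex {

    private final Map<String, Set<Long>> postings = new HashMap<>();

    // "Indexing": split the text on anything that is not a letter or digit
    // and record the document id under every resulting term.
    void add(long docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // "Searching": look the term up directly instead of scanning all documents.
    List<Long> search(String term) {
        Set<Long> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<Long>emptyList() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1L, "Lucene full-text search");
        index.add(2L, "Search engines and Lucene");
        index.add(3L, "Relational databases");
        System.out.println(index.search("lucene"));    // ids of documents 1 and 2
        System.out.println(index.search("databases")); // id of document 3
    }
}
```

A real Lucene index adds analysis, scoring, and on-disk storage on top of this basic idea.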

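The example below stores data from the Product table in the index and then queries it through the index. For the jars the project depends on, see the original blog post; I used Lucene 4.7.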

Create the entity class Product:

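public class Product {

    private Long id;
    private String title;
    private String sellPoint;

    public Product(Long id, String title, String sellPoint) {
        this.id = id;
        this.title = title;
        this.sellPoint = sellPoint;
    }

    // Getters used by ProductRepository when building the Document.
    public Long getId() { return id; }
    public String getTitle() { return title; }
    public String getSellPoint() { return sellPoint; }
}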

Convert a Product entity into a Document and store it in the index. Product data can be loaded from the database and then converted with this method; the database query logic is omitted here.

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

import entity.Product;

public class ProductRepository {

    public void createIndex(Product product) {
        // StringField is indexed as a single token (not analyzed), which suits an id.
        Field id = new StringField("id", product.getId().toString(), Field.Store.YES);
        // TextField is passed through the analyzer so individual words are searchable.
        Field title = new TextField("title", product.getTitle(), Field.Store.YES);
        Field sellPoint = new TextField("sellPoint", product.getSellPoint(), Field.Store.YES);

        Document document = new Document();
        document.add(id);
        document.add(title);
        document.add(sellPoint);

        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47, analyzer);

        // Location of the index on disk.
        Path path = Paths.get("D:/develop/workspace/slem_compass/data");

        // try-with-resources closes the writer and the directory even if addDocument throws.
        try (Directory directory = FSDirectory.open(path.toFile());
             IndexWriter indexWriter = new IndexWriter(directory, config)) {
            indexWriter.addDocument(document);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

In the code above, path is the location of the index on disk; here it is a folder on my D: drive.
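Since the same hard-coded location is needed for both writing and reading the index, one option is to resolve it in a single place. The following is only a sketch under assumptions: the class `IndexLocation` and the system property name `lucene.index.dir` are hypothetical and are not defined by Lucene or by the original project.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: resolves the index directory from a JVM system
// property and falls back to a default when the property is not set.
class IndexLocation {

    // Assumed property name; Lucene itself does not read this property.
    static final String PROPERTY = "lucene.index.dir";

    static Path resolve(String fallback) {
        String configured = System.getProperty(PROPERTY);
        boolean missing = configured == null || configured.trim().isEmpty();
        String chosen = missing ? fallback : configured.trim();
        // Normalise to an absolute path so log output is unambiguous.
        return Paths.get(chosen).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        // Run with -Dlucene.index.dir=<some directory> to override the fallback.
        System.out.println(resolve("data/index"));
    }
}
```

The resulting Path could then be handed to FSDirectory.open(path.toFile()) in the same way the hard-coded path is used.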

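How do we query data from the index? I wrote a Servlet: the user submits a search keyword, the servlet reads it from the request, and then searches the index with that keyword. A main method or a test class would work just as well.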

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

@WebServlet("/search")
public class SearchServlet extends HttpServlet {

    private static final long serialVersionUID = 1L;

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        request.setCharacterEncoding("utf-8");
        response.setContentType("text/html;charset=utf-8");

        String title = request.getParameter("title");
        System.out.println("title: " + title);

        // Skip the search when no keyword was submitted.
        if (title == null || title.trim().isEmpty()) {
            response.getWriter().append("Missing parameter: title");
            return;
        }

        // Use the same analyzer as at indexing time; "title" is the default search field.
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        QueryParser parser = new QueryParser(Version.LUCENE_47, "title", analyzer);
        Path path = Paths.get("D:/develop/workspace/slem_compass/data");

        // try-with-resources closes the reader and the directory even if the search fails.
        try (Directory directory = FSDirectory.open(path.toFile());
             IndexReader reader = DirectoryReader.open(directory)) {
            Query query = parser.parse(title);
            IndexSearcher indexSearcher = new IndexSearcher(reader);

            // Fetch the 10 best-scoring hits and print their stored fields to the server console.
            TopDocs topDocs = indexSearcher.search(query, 10);
            for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                Document doc = indexSearcher.doc(scoreDoc.doc);
                System.out.println(doc.get("id") + " " + doc.get("title") + " " + doc.get("sellPoint"));
            }
        } catch (IOException | ParseException e) {
            e.printStackTrace();
        }

        response.getWriter().append("Served at: ").append(request.getContextPath());
    }

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        doGet(request, response);
    }
}

The query likewise reads the index from the D: drive and then searches it with the keyword.

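That completes storing data in the index and querying it. Index querying is a complex topic; the demo above is only a simple example that shows the general principle. I will add more on querying later.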
