Smart Keyword-Reply Q&A for WeChat Official Accounts with Spring Boot + Lucene

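I. Scenario Overview

I have recently been working on keyword-based smart Q&A replies for a WeChat official account, and found that user questions very rarely matched the keywords our operations team had configured. The cause was that we matched with a database LIKE query.

That kind of fuzzy matching is not very smart, and it gives no relevance ranking either. To fix this, I introduced an analyzer (tokenizer) plus Lucene to implement the smart Q&A.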
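To make the difference concrete, here is a small self-contained Java sketch. It is a toy and not Lucene: `likeMatch` assumes the database query had the form `keyword LIKE '%input%'`, and `tokens` splits on non-alphanumeric characters as a crude stand-in for a real analyzer (Chinese text would need something like SmartChineseAnalyzer). The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy comparison of LIKE-style matching and token-overlap ranking (not Lucene). */
class LikeVsTokenMatch {

    /** Mimics keyword LIKE '%input%': the whole input must appear verbatim in the keyword. */
    static boolean likeMatch(String configuredKeyword, String userInput) {
        return configuredKeyword.contains(userInput);
    }

    /** Crude stand-in for an analyzer: lowercase, then split on anything that is not a letter or digit. */
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    /** Number of distinct tokens shared by the configured keyword and the user input. */
    static int overlapScore(String configuredKeyword, String userInput) {
        Set<String> shared = tokens(configuredKeyword);
        shared.retainAll(tokens(userInput));
        return shared.size();
    }

    /** Returns the configured keyword with the highest overlap, or null if nothing overlaps. */
    static String bestMatch(List<String> configuredKeywords, String userInput) {
        String best = null;
        int bestScore = 0;
        for (String keyword : configuredKeywords) {
            int score = overlapScore(keyword, userInput);
            if (score > bestScore) {
                bestScore = score;
                best = keyword;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> configured = Arrays.asList("refund policy", "delivery time", "invoice request");
        String question = "what is your refund policy for damaged goods";
        for (String keyword : configured) {
            System.out.println(keyword + " -> LIKE: " + likeMatch(keyword, question)
                    + ", shared tokens: " + overlapScore(keyword, question));
        }
        System.out.println("best match: " + bestMatch(configured, question));
    }
}
```

The LIKE-style check fails for every configured keyword because the full question never appears verbatim inside a short keyword, while the token-based score still finds and ranks the relevant entry. Lucene's real scoring is more sophisticated than a shared-token count, but the principle of matching on tokens and ordering by score is the same.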

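II. Implementation

The feature is built by adding the Lucene packages to a Spring Boot project. This assumes some familiarity with Spring Boot.

Adding the Lucene dependencies to the POM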

<!-- Lucene core -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>7.6.0</version>
</dependency>

<!-- Query parsing against analyzed index fields -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>7.6.0</version>
</dependency>

<!-- smartcn Chinese analyzer -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-smartcn</artifactId>
    <version>7.6.0</version>
</dependency>

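Initializing the Lucene configuration beans

A few things to know about the bean setup:

1. Instantiating IndexWriter and IndexSearcher requires loading the index directory, which is expensive, so we want to create each of them only once and let Spring manage them.

2. IndexSearcher is normally managed through a SearcherManager. An IndexSearcher is a point-in-time view: once it has loaded the index directory, documents added, deleted, or updated afterwards cannot be found through it, because it is not kept in sync with the index.

3. ControlledRealTimeReopenThread is started as a daemon thread, so it does not keep the JVM alive once the non-daemon threads have exited. Its job is to periodically refresh the SearcherManager so that the searchers it hands out see the latest index. With the arguments used below (targetMaxStaleSec = 5.0, targetMinStaleSec = 0.025), a searcher is at most about 5 seconds stale, and reopens happen no more often than every 25 ms.

4. Pay attention to the Lucene version you pull in. Usage differs between versions and many APIs have changed.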

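import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.ControlledRealTimeReopenThread;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * @author mazhq
 * @Title: LuceneConfig
 * @date 2019/9/5 11:29
 */
@Configuration
public class LuceneConfig {
    /**
     * Location of the Lucene index on disk
     */
    private static final String LUCENE_INDEX_PATH = "lucene/indexDir/";

    /**
     * Creates the Analyzer instance
     */
    @Bean
    public Analyzer analyzer() {
        return new SmartChineseAnalyzer();
    }

    /**
     * Index directory
     */
    @Bean
    public Directory directory() throws IOException {
        Path path = Paths.get(LUCENE_INDEX_PATH);
        // Create the directory (and any missing parents) if it does not exist yet
        Files.createDirectories(path);
        return FSDirectory.open(path);
    }

    /**
     * Creates the IndexWriter
     */
    @Bean
    public IndexWriter indexWriter(Directory directory, Analyzer analyzer) throws IOException {
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
        // Clear the index; it is rebuilt from the database on startup
        indexWriter.deleteAll();
        indexWriter.commit();
        return indexWriter;
    }

    /**
     * SearcherManager setup.
     * ControlledRealTimeReopenThread runs as a daemon thread, so it does not keep the JVM alive.
     * It periodically refreshes the SearcherManager so that acquired searchers see the latest index:
     * at most 5 seconds stale (targetMaxStaleSec), reopening no more often than every 25 ms (targetMinStaleSec).
     */
    @Bean
    public SearcherManager searcherManager(IndexWriter indexWriter) throws IOException {
        SearcherManager searcherManager = new SearcherManager(indexWriter, false, false, new SearcherFactory());
        ControlledRealTimeReopenThread<IndexSearcher> reopenThread =
                new ControlledRealTimeReopenThread<>(indexWriter, searcherManager, 5.0, 0.025);
        reopenThread.setDaemon(true);
        // Thread name
        reopenThread.setName("index-reader-refresh");
        // Start the thread
        reopenThread.start();
        return searcherManager;
    }
}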
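The point-in-time behaviour described in point 2 above can be modelled without Lucene. The following is a toy sketch in plain Java, not Lucene code: `SnapshotIndex`, `add`, `refresh`, and `acquire` are hypothetical names that only loosely mirror IndexWriter.addDocument, SearcherManager.maybeRefresh, and SearcherManager.acquire.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of point-in-time searchers; a real SearcherManager does far more. */
class SnapshotIndex {
    // Writer-side state: everything that has been added so far
    private final List<String> docs = new ArrayList<>();
    // Reader-side state: the immutable snapshot that searches currently see
    private volatile List<String> published = Collections.emptyList();

    /** Writer side: records the document, but searchers do not see it yet. */
    synchronized void add(String doc) {
        docs.add(doc);
    }

    /** Publishes a fresh immutable snapshot of the writer-side state. */
    synchronized void refresh() {
        published = Collections.unmodifiableList(new ArrayList<>(docs));
    }

    /** Returns the current snapshot; a snapshot never changes after it is handed out. */
    List<String> acquire() {
        return published;
    }

    public static void main(String[] args) {
        SnapshotIndex index = new SnapshotIndex();
        index.add("doc-1");
        index.refresh();
        List<String> early = index.acquire(); // taken before doc-2 exists
        index.add("doc-2");
        System.out.println("before refresh: " + index.acquire());
        index.refresh();
        System.out.println("after refresh:  " + index.acquire());
        System.out.println("early snapshot: " + early);
    }
}
```

A snapshot acquired before a write never changes, which is why a long-lived IndexSearcher goes stale. Only a searcher acquired after a refresh sees the new document, and that is the job ControlledRealTimeReopenThread does on a timer.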

Initializing the index

After the application starts, rebuild every entry in the index.

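import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;

@Component
@Order(value = 1)
public class AutoReplyMsgRunner implements ApplicationRunner {
    @Autowired
    private LuceneManager luceneManager;

    // Runs once after the application context is ready and rebuilds the index
    @Override
    public void run(ApplicationArguments args) throws Exception {
        luceneManager.createAutoReplyMsgIndex();
    }
}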

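The next step reads every configured reply message from the database and builds an index over them.

About the index:

In MySQL, every column has a declared type and values are stored according to that type.

In Lucene, the unit of storage is the Document, and each attribute of an object is stored in a Field.

Common Field types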

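Field class	Data type	Tokenized	Indexed	Stored	Notes
StringField	String	N	Y	Y/N	Indexes the whole string as a single term without tokenizing it. Suited to fixed values such as IDs, national ID numbers, and order numbers.
FloatPoint / LongPoint / DoublePoint	Numeric	N	Y	N	Indexes a numeric value (for example a price) for exact and range queries. The value is not tokenized and not stored; add a StoredField if it must be read back.
StoredField	Overloaded constructors for several types	N	N	Y	Not analyzed and not indexed; the value is only stored in the document.
TextField	String or Reader	Y	Y	Y/N	For text that needs full-text search.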

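These are the commonly used types. Since Lucene 6.0, the numeric fields used for indexing all end in Point: FloatPoint, LongPoint, DoublePoint, and so on. Doc values live in the corresponding DocValuesField classes: integers use NumericDocValuesField, and floating-point values use FloatDocValuesField or DoubleDocValuesField, both of which are subclasses of NumericDocValuesField.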
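The practical difference between an untokenized field and a tokenized one shows up at lookup time. The sketch below is a toy inverted index in plain Java, not Lucene: `ToyFieldIndex` and its methods are hypothetical, `addExact` loosely mimics how a StringField is indexed, and `addTokenized` mimics a TextField with whitespace splitting standing in for a real analyzer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index for a single field: term -> ids of the documents containing it. */
class ToyFieldIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** StringField-like: the whole value is indexed as one term. */
    void addExact(int docId, String value) {
        post(value, docId);
    }

    /** TextField-like: every token becomes its own term. */
    void addTokenized(int docId, String value) {
        for (String token : value.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                post(token, docId);
            }
        }
    }

    private void post(String term, int docId) {
        postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
    }

    /** Returns the ids of documents whose field contains exactly this term. */
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyFieldIndex idField = new ToyFieldIndex();
        idField.addExact(1, "order 1001");
        System.out.println("id 'order':       " + idField.lookup("order"));
        System.out.println("id 'order 1001':  " + idField.lookup("order 1001"));

        ToyFieldIndex keywordsField = new ToyFieldIndex();
        keywordsField.addTokenized(1, "refund policy");
        keywordsField.addTokenized(2, "refund request");
        System.out.println("keywords 'refund': " + keywordsField.lookup("refund"));
    }
}
```

An exact field only answers to its complete value, which is what you want for an id, while a tokenized field can be found through any of its words, which is what you want for keywords.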

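Using commit()

indexWriter.addDocuments(docs) only buffers the documents in the writer; it does not make them part of the committed index. A reader opened on the index directory cannot find documents that have not been committed. (A near-real-time searcher obtained from the IndexWriter can see uncommitted changes after a refresh, but they are not durable until commit() is called.)

Many blog examples never call commit() and can still find their documents. That is because they call close() on the IndexWriter after every insert, and by default closing the writer commits any pending changes first. In Spring the IndexWriter is a singleton that stays open, so every change to the index must be followed by a commit().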

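import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class LuceneManager {
    @Autowired
    private IndexWriter indexWriter;

    @Autowired
    private AutoReplyMsgDao autoReplyMsgDao;

    public void createAutoReplyMsgIndex() throws IOException {
        List<AutoReplyMsg> autoReplyMsgList = autoReplyMsgDao.findAllTextConfig();
        if (autoReplyMsgList != null) {
            List<Document> docs = new ArrayList<>();
            for (AutoReplyMsg autoReplyMsg : autoReplyMsgList) {
                Document doc = new Document();
                // Exact-match fields: indexed as a single term, not tokenized
                doc.add(new StringField("id", String.valueOf(autoReplyMsg.getGuid()), Field.Store.YES));
                doc.add(new StringField("replyMsgType", String.valueOf(autoReplyMsg.getReplyMsgType()), Field.Store.YES));
                // The only field that is searched: analyzed into tokens
                doc.add(new TextField("keywords", autoReplyMsg.getReceiveContent(), Field.Store.YES));
                // Only read back from a hit and never searched, so stored without being indexed
                doc.add(new StoredField("replyContent", nullToEmpty(autoReplyMsg.getReplyContent())));
                doc.add(new StoredField("title", nullToEmpty(autoReplyMsg.getTitle())));
                doc.add(new StoredField("picUrl", nullToEmpty(autoReplyMsg.getPicUrl())));
                doc.add(new StoredField("url", nullToEmpty(autoReplyMsg.getUrl())));
                doc.add(new StoredField("mediaId", nullToEmpty(autoReplyMsg.getMediaId())));
                docs.add(doc);
            }
            indexWriter.addDocuments(docs);
            // The writer is a long-lived singleton, so commit explicitly
            indexWriter.commit();
        }
    }

    // Lucene fields reject null values, so map null to the empty string
    private static String nullToEmpty(String value) {
        return value == null ? "" : value;
    }
}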

Smart search

searcherManager.maybeRefresh() refreshes the searcher held by the SearcherManager, so that the next acquire() returns an up-to-date IndexSearcher.

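import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TopDocs;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class SearchManager {
    @Autowired
    private Analyzer analyzer;

    @Autowired
    private SearcherManager searcherManager;

    public AutoReplyMsg searchAutoReplyMsg(String keyword) throws IOException, ParseException {
        searcherManager.maybeRefresh();
        IndexSearcher indexSearcher = searcherManager.acquire();
        try {
            BooleanQuery.Builder builder = new BooleanQuery.Builder();
            // Escape query-syntax characters so raw user input cannot break the parser
            builder.add(new QueryParser("keywords", analyzer).parse(QueryParser.escape(keyword)), BooleanClause.Occur.MUST);
            // Only the best-scoring hit is needed
            TopDocs topDocs = indexSearcher.search(builder.build(), 1);
            ScoreDoc[] hits = topDocs.scoreDocs;
            if (hits != null && hits.length > 0) {
                Document doc = indexSearcher.doc(hits[0].doc);
                AutoReplyMsg autoReplyMsg = new AutoReplyMsg();
                autoReplyMsg.setGuid(Long.parseLong(doc.get("id")));
                autoReplyMsg.setReceiveContent(keyword);
                autoReplyMsg.setReceiveMsgType(1);
                autoReplyMsg.setReplyMsgType(Integer.valueOf(doc.get("replyMsgType")));
                autoReplyMsg.setReplyContent(doc.get("replyContent"));
                autoReplyMsg.setTitle(doc.get("title"));
                autoReplyMsg.setPicUrl(doc.get("picUrl"));
                autoReplyMsg.setUrl(doc.get("url"));
                autoReplyMsg.setMediaId(doc.get("mediaId"));
                return autoReplyMsg;
            }
            return null;
        } finally {
            // Every acquire() must be paired with a release()
            searcherManager.release(indexSearcher);
        }
    }
}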

Index maintenance: deleting and updating index entries

public int delete(AutoReplyMsg autoReplyMsg) {
    int resp = autoReplyMsgDao.delete(autoReplyMsg.getGuid());
    try {
        // Remove the document whose "id" term matches, then commit so the change is persisted
        indexWriter.deleteDocuments(new Term("id", autoReplyMsg.getGuid() + ""));
        indexWriter.commit();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return resp;
}
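The delete above has a natural counterpart for updates. The following is a minimal sketch, assuming Lucene 7.6 on the classpath, of replacing a document by its "id" term with IndexWriter.updateDocument. The class name `AutoReplyIndexUpdater`, the `upsert` helper, and the reduced set of fields are assumptions made for illustration and are not part of the original project.

```java
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

/** Hypothetical helper: replaces (or adds) one reply document, keyed by its "id" term. */
class AutoReplyIndexUpdater {

    static void upsert(IndexWriter indexWriter, long guid, String keywords, String replyContent)
            throws IOException {
        String id = String.valueOf(guid);
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));
        doc.add(new TextField("keywords", keywords, Field.Store.YES));
        doc.add(new StoredField("replyContent", replyContent));
        // Deletes every document containing this term, then adds the new one
        indexWriter.updateDocument(new Term("id", id), doc);
        indexWriter.commit();
    }

    public static void main(String[] args) throws IOException {
        // In-memory directory so the demo leaves nothing on disk (requires Lucene 7.5+)
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            upsert(writer, 1L, "refund policy", "old reply");
            upsert(writer, 1L, "refund policy", "new reply");
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                System.out.println("live documents: " + reader.numDocs());
            }
        }
    }
}
```

Because the lookup term is the untokenized "id" field, the second call replaces the first document instead of adding a duplicate. In the service, a call like this would sit next to the database update in the same way that deleteDocuments sits next to the database delete.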

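That's it: the smart Q&A reply feature is essentially complete, and it greatly improves the effectiveness of the official account's automatic replies.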