[solr] - spell check

solr提供了一个spell check，又叫suggestions，可以用于查询输入的自动完成功能auto-complete。

参考文献：

https://cwiki.apache.org/confluence/display/solr/Spell+Checking

http://www.cnblogs.com/ibook360/archive/2011/11/30/2269077.html

方法：

修改core的solrconfig.xml

加入这段到<config />内

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

      <lst name="spellchecker">

        <str name="name">wordbreak</str>

        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>

        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>

        <str name="field">content</str>

        <str name="combineWords">true</str>

        <str name="breakWords">true</str>

        <int name="maxChanges">10</int>

      </lst>

    </searchComponent>

    <requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">

      <lst name="defaults">

        <str name="spellcheck">true</str>

        <str name="spellcheck.dictionary">wordbreak</str>

        <str name="spellcheck.count">20</str>

      </lst>

      <arr name="last-components">

        <str>spellcheck</str>

      </arr>

    </requestHandler>

schema.xml配置：

<?xml version="1.0" ?>

<schema name="my core" version="1.1">

    <fieldtype name="string"  class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>

    <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>

    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>

    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>

    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>

    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>

    <fieldtype name="binary" class="solr.BinaryField"/>

    <fieldType name="text_cn" class="solr.TextField">

        <analyzer type="index" class="org.wltea.analyzer.lucene.IKAnalyzer" />

        <analyzer type="query" class="org.wltea.analyzer.lucene.IKAnalyzer" />

        <analyzer>

            <tokenizer class="solr.KeywordTokenizerFactory"/>

            <filter class="solr.LowerCaseFilterFactory"/>

        </analyzer>

    </fieldType>

    <!-- general -->

    <field name="id" type="long" indexed="true" stored="true" multiValued="false" required="true"/>

    <field name="subject" type="text_cn" indexed="true" stored="true" />

    <field name="content" type="text_cn" indexed="true" stored="true" />

    <field name="category_id" type="long" indexed="true" stored="true" />

    <field name="category_name" type="text_cn" indexed="true" stored="true" />

    <field name="last_update_time" type="tdate" indexed="true" stored="true" />

    <field name="_version_" type="long" indexed="true" stored="true"/>

     <!-- field to use to determine and enforce document uniqueness. -->

     <uniqueKey>id</uniqueKey>

     <!-- field for the QueryParser to use when an explicit fieldname is absent -->

     <defaultSearchField>subject</defaultSearchField>

     <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->

     <solrQueryParser defaultOperator="OR"/>

</schema>

关键在于这句：

        <analyzer>

            <tokenizer class="solr.KeywordTokenizerFactory"/>

            <filter class="solr.LowerCaseFilterFactory"/>

        </analyzer>

意思是词组搜索

设置完xml，重启tomcat，在浏览器中运行：

http://localhost:8899/solr/mycore/spellcheck?spellcheck.build=true

运行结果：

然后在浏览器中运行：

http://localhost:8899/solr/mycore/spellcheck?q=中央&rows=0

运行结果：

Java代码：

Java bean:

package com.my.entity;

import java.util.Date;

import org.apache.solr.client.solrj.beans.Field;

public class Item {

    @Field

    private long id;

    @Field

    private String subject;

    @Field

    private String content;

    @Field("category_id")

    private long categoryId;

    @Field("category_name")

    private String categoryName;

    @Field("last_update_time")

    private Date lastUpdateTime;

    public long getId() {

        return id;

    }

    public void setId(long id) {

        this.id = id;

    }

    public String getSubject() {

        return subject;

    }

    public void setSubject(String subject) {

        this.subject = subject;

    }

    public String getContent() {

        return content;

    }

    public void setContent(String content) {

        this.content = content;

    }

    public long getCategoryId() {

        return categoryId;

    }

    public void setCategoryId(long categoryId) {

        this.categoryId = categoryId;

    }

    public String getCategoryName() {

        return categoryName;

    }

    public void setCategoryName(String categoryName) {

        this.categoryName = categoryName;

    }

    public Date getLastUpdateTime() {

        return lastUpdateTime;

    }

    public void setLastUpdateTime(Date lastUpdateTime) {

        this.lastUpdateTime = lastUpdateTime;

    }

}

测试代码：

package com.my.solr;

import java.io.IOException;

import java.util.ArrayList;

import java.util.Date;

import java.util.List;

import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;

import org.apache.solr.client.solrj.SolrServerException;

import org.apache.solr.client.solrj.impl.HttpSolrServer;

import org.apache.solr.client.solrj.impl.XMLResponseParser;

import org.apache.solr.client.solrj.response.QueryResponse;

import org.apache.solr.client.solrj.response.SpellCheckResponse;

import org.apache.solr.client.solrj.response.SpellCheckResponse.Collation;

import org.apache.solr.client.solrj.response.SpellCheckResponse.Correction;

import org.apache.solr.client.solrj.response.SpellCheckResponse.Suggestion;

import com.my.entity.Item;

public class TestSolr {

    public static void main(String[] args) throws IOException, SolrServerException {

        String url = "http://localhost:8899/solr/mycore";

        HttpSolrServer core = new HttpSolrServer(url);

        core.setMaxRetries(1);

        core.setConnectionTimeout(5000);

        core.setParser(new XMLResponseParser()); // binary parser is used by default

        core.setSoTimeout(1000); // socket read timeout

        core.setDefaultMaxConnectionsPerHost(100);

        core.setMaxTotalConnections(100);

        core.setFollowRedirects(false); // defaults to false

        core.setAllowCompression(true);

        // ------------------------------------------------------

        // remove all data

        // ------------------------------------------------------

        core.deleteByQuery("*:*");

        List<Item> items = new ArrayList<Item>();

        items.add(makeItem(1, "cpu", "this is intel cpu", 1, "cpu-intel"));

        items.add(makeItem(2, "cpu AMD", "this is AMD cpu", 2, "cpu-AMD"));

        items.add(makeItem(3, "cpu intel", "this is intel-I7 cpu", 1, "cpu-intel"));

        items.add(makeItem(4, "cpu AMD", "this is AMD 5000x cpu", 2, "cpu-AMD"));

        items.add(makeItem(5, "cpu intel I6", "this is intel-I6 cpu", 1, "cpu-intel-I6"));

        items.add(makeItem(6, "处理器", "中央处理器英特儿", 1, "cpu-intel"));

        items.add(makeItem(7, "处理器AMD", "中央处理器AMD", 2, "cpu-AMD"));

        items.add(makeItem(8, "中央处理器", "中央处理器Intel", 1, "cpu-intel"));

        items.add(makeItem(9, "中央空调格力", "格力中央空调", 3, "air"));

        items.add(makeItem(10, "中央空调海尔", "海尔中央空调", 3, "air"));

        items.add(makeItem(11, "中央空调美的", "美的中央空调", 3, "air"));

        core.addBeans(items);

        // commit

        core.commit();

        // ------------------------------------------------------

        // search

        // ------------------------------------------------------

        SolrQuery query = new SolrQuery();

        String token = "中央";

        query.set("qt", "/spellcheck");

        query.set("q", token);

        query.set("spellcheck", "on");

        query.set("spellcheck.build", "true");

        query.set("spellcheck.onlyMorePopular", "true");

        query.set("spellcheck.count", "100");

        query.set("spellcheck.alternativeTermCount", "4");

        query.set("spellcheck.onlyMorePopular", "true");

        query.set("spellcheck.extendedResults", "true");

        query.set("spellcheck.maxResultsForSuggest", "5");

        query.set("spellcheck.collate", "true");

        query.set("spellcheck.collateExtendedResults", "true");

        query.set("spellcheck.maxCollationTries", "5");

        query.set("spellcheck.maxCollations", "3");

        QueryResponse response = null;

        try {

            response = core.query(query);

            System.out.println("查询耗时：" + response.getQTime());

        } catch (SolrServerException e) {

            System.err.println(e.getMessage());

            e.printStackTrace();

        } catch (Exception e) {

            System.err.println(e.getMessage());

            e.printStackTrace();

        } finally {

            core.shutdown();

        }

        SpellCheckResponse spellCheckResponse = response.getSpellCheckResponse();

        if (spellCheckResponse != null) {

            List<Suggestion> suggestionList = spellCheckResponse.getSuggestions();

            for (Suggestion suggestion : suggestionList) {

                System.out.println("Suggestions NumFound: " + suggestion.getNumFound());

                System.out.println("Token: " + suggestion.getToken());

                System.out.print("Suggested: ");

                List<String> suggestedWordList = suggestion.getAlternatives();

                for (String word : suggestedWordList) {

                    System.out.println(word + ", ");

                }

                System.out.println();

            }

            System.out.println();

            Map<String, Suggestion> suggestedMap = spellCheckResponse.getSuggestionMap();

            for (Map.Entry<String, Suggestion> entry : suggestedMap.entrySet()) {

                System.out.println("suggestionName: " + entry.getKey());

                Suggestion suggestion = entry.getValue();

                System.out.println("NumFound: " + suggestion.getNumFound());

                System.out.println("Token: " + suggestion.getToken());

                System.out.print("suggested: ");

                List<String> suggestedList = suggestion.getAlternatives();

                for (String suggestedWord : suggestedList) {

                    System.out.print(suggestedWord + ", ");

                }

                System.out.println("\n\n");

            }

            Suggestion suggestion = spellCheckResponse.getSuggestion(token);

            System.out.println("NumFound: " + suggestion.getNumFound());

            System.out.println("Token: " + suggestion.getToken());

            System.out.print("suggested: ");

            List<String> suggestedList = suggestion.getAlternatives();

            for (String suggestedWord : suggestedList) {

                System.out.print(suggestedWord + ", ");

            }

            System.out.println("\n\n");

            System.out.println("The First suggested word for solr is : " + spellCheckResponse.getFirstSuggestion(token));

            System.out.println("\n\n");

            List<Collation> collatedList = spellCheckResponse.getCollatedResults();

            if (collatedList != null) {

                for (Collation collation : collatedList) {

                    System.out.println("collated query String: " + collation.getCollationQueryString());

                    System.out.println("collation Num: " + collation.getNumberOfHits());

                    List<Correction> correctionList = collation.getMisspellingsAndCorrections();

                    for (Correction correction : correctionList) {

                        System.out.println("original: " + correction.getOriginal());

                        System.out.println("correction: " + correction.getCorrection());

                    }

                    System.out.println();

                }

            }

            System.out.println();

            System.out.println("The Collated word: " + spellCheckResponse.getCollatedResult());

            System.out.println();

        }

        System.out.println("查询耗时：" + response.getQTime());

    }

    private static Item makeItem(long id, String subject, String content, long categoryId, String categoryName) {

        Item item = new Item();

        item.setId(id);

        item.setSubject(subject);

        item.setContent(content);

        item.setLastUpdateTime(new Date());

        item.setCategoryId(categoryId);

        item.setCategoryName(categoryName);

        return item;

    }

}

测试结果：

这种方式可以使用于对现在数据内容的查询拼写检查。

[solr] - spell check的更多相关文章

VIM 拼写/spell check
VIM 拼写检查/spell check 一.Hunspell科普 Hunspell 作为一个拼写检查的工具,已经用在了许多开源的以及商业软件中.包括Google Chrome, Libreoffic ...
Solr 6.7学习笔记（06）-- spell check
拼写检查也是搜索引擎必备的功能.Solr中提供了SpellCheckComponent 来实现此功能.我看过<Solr In Action>,是基于Solr4.X版本的,那时Suggest ...
Word: How to Temporarily Disable Spell Check in Word
link: http://johnlamansky.com/tech/disable-word-spell-check/ 引用: Word 2010 Click the “File” button C ...
1.7.7 Spell Checking -拼写检查
1. SpellCheck SpellCheck组件设计的目的是基于其他,相似,terms来提供内联查询建议.这些建议的依据可以是solr字段中的terms,外部可以创建文本文件, 或者其实lucen ...
solr拼写检查配置
拼写检查功能,能在搜索时,提供一个较好用户体验,所以,主流的搜索引擎都有这个功能. 那么什么是拼写检查,其实很好理解,就是你输入的搜索词,可能是你输错了,也有可能在它的检索库里面根本不存在这个词,但是 ...
Importing/Indexing database (MySQL or SQL Server) in Solr using Data Import Handler--转载
原文地址:https://gist.github.com/maxivak/3e3ee1fca32f3949f052 Install Solr download and install Solr fro ...
Solr 6.7学习笔记（03）-- 样例配置文件 solrconfig.xml
位于:${solr.home}\example\techproducts\solr\techproducts\conf\solrconfig.xml <?xml version="1. ...
Solr基础知识二（导入数据）
上一篇讲述了solr的安装启动过程,这一篇讲述如何导入数据到solr里. 一.准备数据 1.1 学生相关表创建学生表.学生专业关联表.专业表.学生行业关联表.行业表.基础信息表,并创建一条小白的信息 ...
【Nutch2.3基础教程】集成Nutch/Hadoop/Hbase/Solr构建搜索引擎：安装及运行【集群环境】
1.下载相关软件,并解压版本号如下: (1)apache-nutch-2.3 (2) hadoop-1.2.1 (3)hbase-0.92.1 (4)solr-4.9.0 并解压至/opt/jedi ...

随机推荐

JsTree
一.JStree的简单介绍 1.关于jstree jsTree 使用了 jQuery 和 Sarissa,是一个是免费的但是设置灵活的,基于 JavaScript 跨浏览器支持的网页树形部件. jsT ...
OpenAl编程入门：播放一段音频
OpenAl编程入门关于OpenAl我就不多介绍了,这两篇说明对于初步了解已经足够了:http://baike.baidu.com/view/1355367.htmhttp://en.wikiped ...
[笔记]JavaScript获得对象属性个数的方法
//扩展对象的count方法 Object.prototype.count = ( Object.prototype.hasOwnProperty(‘__count__’) ) ? function ...
hdu 5876 ACM/ICPC Dalian Online 1009 Sparse Graph
题目链接分析:这叫补图上的BFS,萌新第一次遇到= =.方法很简单,看了别人的代码后,自己也学会了.方法就是开两个集合,一个A表示在下一次bfs中能够到达的点,另一个B就是下一次bfs中到不了的点. ...
Less基础知识~~~实现css
首先献上我学习网址 http://www.bootcss.com/lesscss.html 就是通过less.js的调用,更好的实现css样式布局,更简易化. 最近看前端的职位要求需简单理解less之 ...
eclipse改变theme
https://github.com/eclipse-color-theme/eclipse-color-theme.git https://github.com/eclipse-color-them ...
Spring 7大功能模块的作用[转]
核心容器(Spring core) 核心容器提供Spring框架的基本功能.Spring以bean的方式组织和管理Java应用中的各个组件及其关系.Spring使用BeanFactory来产生和管理B ...
OC语言基础知识
OC语言基础知识一.面向对象 OC语言是面向对象的,c语言是面向过程的,面向对象和面向过程只是解决问题的两种思考方式,面向过程关注的是解决问题涉及的步骤,面向对象关注的是设计能够实现解决问题所需功能 ...
BestCoder Round #40
T1:Tom and pape (hdu 5224) 题目大意: 给出一个矩形面积N,求周长的最小值.(长&&宽&&面积都是正整数) N<=109 题解: 没啥好 ...
iOS内部跳转问题
//打开地图 NSString*addressText = @" "; //@"1Infinite Loop, Cupertino, CA 95014&quo ...

[solr] - spell check

[solr] - spell check的更多相关文章

随机推荐

热门专题