[solr] - spell check
Solr provides a spell check feature, also referred to as suggestions, which can be used to auto-complete query input.
References:
https://cwiki.apache.org/confluence/display/solr/Spell+Checking
http://www.cnblogs.com/ibook360/archive/2011/11/30/2269077.html
Method:
Edit the core's solrconfig.xml
and add the following inside the <config> element:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<lst name="spellchecker">
<str name="name">wordbreak</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">content</str>
<str name="combineWords">true</str>
<str name="breakWords">true</str>
<int name="maxChanges">10</int>
</lst>
</searchComponent>
<requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck.count">20</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
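The lookupImpl above names TSTLookup, which is based on a ternary search tree. As a rough illustration of how a prefix lookup over such a tree can work, here is a minimal sketch in plain Java. It is not Lucene's implementation: the class name PrefixTree and its methods are hypothetical, and the sample keys assume that each content value is indexed as a single lowercased term.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy ternary search tree for prefix lookups; illustrative only, not Lucene's TSTLookup. */
final class PrefixTree {
    private static final class Node {
        final char ch;
        Node lo, eq, hi;   // smaller char, next char of the key, larger char
        boolean isEnd;     // true if a key ends at this node
        Node(char ch) { this.ch = ch; }
    }

    private Node root;

    /** Adds one key to the tree; empty keys are ignored. */
    void add(String key) {
        if (!key.isEmpty()) {
            root = add(root, key, 0);
        }
    }

    private Node add(Node node, String key, int i) {
        char c = key.charAt(i);
        if (node == null) node = new Node(c);
        if (c < node.ch) node.lo = add(node.lo, key, i);
        else if (c > node.ch) node.hi = add(node.hi, key, i);
        else if (i < key.length() - 1) node.eq = add(node.eq, key, i + 1);
        else node.isEnd = true;
        return node;
    }

    /** Returns up to count keys that start with prefix, in character order. */
    List<String> lookup(String prefix, int count) {
        List<String> out = new ArrayList<>();
        if (prefix.isEmpty() || count <= 0) return out;
        Node node = root;
        int i = 0;
        // Walk down to the node that matches the last character of the prefix.
        while (node != null) {
            char c = prefix.charAt(i);
            if (c < node.ch) node = node.lo;
            else if (c > node.ch) node = node.hi;
            else if (i == prefix.length() - 1) break;
            else { node = node.eq; i++; }
        }
        if (node == null) return out;
        if (node.isEnd) out.add(prefix);
        collect(node.eq, new StringBuilder(prefix), out, count);
        return out;
    }

    // In-order traversal of the subtree below the prefix.
    private void collect(Node node, StringBuilder path, List<String> out, int count) {
        if (node == null || out.size() >= count) return;
        collect(node.lo, path, out, count);
        path.append(node.ch);
        if (node.isEnd && out.size() < count) out.add(path.toString());
        collect(node.eq, path, out, count);
        path.setLength(path.length() - 1);
        collect(node.hi, path, out, count);
    }

    public static void main(String[] args) {
        PrefixTree tree = new PrefixTree();
        // Assumed terms: whole content values, lowercased.
        tree.add("中央处理器英特儿");
        tree.add("中央处理器amd");
        tree.add("中央处理器intel");
        tree.add("格力中央空调");
        System.out.println(tree.lookup("中央", 20));
    }
}
```

Only keys that begin with the prefix are returned, so under these assumptions a value such as 格力中央空调 would not be suggested for the input 中央.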
schema.xml configuration:
<?xml version="1.0" ?>
<schema name="my core" version="1.1">
<fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldtype name="binary" class="solr.BinaryField"/>
<fieldType name="text_cn" class="solr.TextField">
<analyzer type="index" class="org.wltea.analyzer.lucene.IKAnalyzer" />
<analyzer type="query" class="org.wltea.analyzer.lucene.IKAnalyzer" />
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<!-- general -->
<field name="id" type="long" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="subject" type="text_cn" indexed="true" stored="true" />
<field name="content" type="text_cn" indexed="true" stored="true" />
<field name="category_id" type="long" indexed="true" stored="true" />
<field name="category_name" type="text_cn" indexed="true" stored="true" />
<field name="last_update_time" type="tdate" indexed="true" stored="true" />
<field name="_version_" type="long" indexed="true" stored="true"/>
<!-- field to use to determine and enforce document uniqueness. -->
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>subject</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
</schema>
The key part is this:
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
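This means phrase search: KeywordTokenizerFactory keeps the entire field value as a single token and LowerCaseFilterFactory lowercases it, so suggestions are whole phrases rather than individual words.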
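To make the effect of this analyzer chain concrete, the following sketch emulates it in plain Java and contrasts it with simple whitespace splitting. It only approximates what the Solr factories do; the class and method names are hypothetical and no Lucene classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java emulation of the analyzer chain; an approximation, not Lucene code. */
final class KeywordAnalysisDemo {

    /** Keyword tokenizer + lowercase filter: the whole value becomes one lowercased token. */
    static List<String> keywordLowercase(String value) {
        List<String> tokens = new ArrayList<>();
        tokens.add(value.toLowerCase(Locale.ROOT));
        return tokens;
    }

    /** For contrast: split on whitespace, then lowercase each part. */
    static List<String> whitespaceLowercase(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String content = "this is intel-I7 cpu";
        System.out.println(keywordLowercase(content));    // [this is intel-i7 cpu]
        System.out.println(whitespaceLowercase(content)); // [this, is, intel-i7, cpu]
    }
}
```

With a single token per value, a dictionary built from the field contains whole phrases, which is what makes the suggestions usable for auto-complete.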
After editing the XML files, restart Tomcat and open the following URL in a browser to build the spellcheck dictionary:
http://localhost:8899/solr/mycore/spellcheck?spellcheck.build=true
Result:

Then open the following URL in a browser:
http://localhost:8899/solr/mycore/spellcheck?q=中央&rows=0
Result:
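A browser percent-encodes the non-ASCII query 中央 automatically, but code that builds the same request URL has to do it explicitly. The sketch below shows one way to assemble the URL with the JDK's URLEncoder; the SpellcheckUrl class and its build method are hypothetical helpers, not part of Solr or SolrJ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds a request URL with percent-encoded parameters. */
final class SpellcheckUrl {

    /** Appends the parameters to baseUrl as a UTF-8 encoded query string. */
    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            sb.append(separator)
              .append(encode(entry.getKey()))
              .append('=')
              .append(encode(entry.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "中央");
        params.put("rows", "0");
        // Prints http://localhost:8899/solr/mycore/spellcheck?q=%E4%B8%AD%E5%A4%AE&rows=0
        System.out.println(build("http://localhost:8899/solr/mycore/spellcheck", params));
    }
}
```

A LinkedHashMap is used so the parameters appear in insertion order, which keeps the generated URL easy to compare with the one typed into the browser.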

Java code:
Java bean:
package com.my.entity;
import java.util.Date;
import org.apache.solr.client.solrj.beans.Field;
public class Item {
    @Field
    private long id;
    @Field
    private String subject;
    @Field
    private String content;
    @Field("category_id")
    private long categoryId;
    @Field("category_name")
    private String categoryName;
    @Field("last_update_time")
    private Date lastUpdateTime;
    public long getId() {
        return id;
    }
    public void setId(long id) {
        this.id = id;
    }
    public String getSubject() {
        return subject;
    }
    public void setSubject(String subject) {
        this.subject = subject;
    }
    public String getContent() {
        return content;
    }
    public void setContent(String content) {
        this.content = content;
    }
    public long getCategoryId() {
        return categoryId;
    }
    public void setCategoryId(long categoryId) {
        this.categoryId = categoryId;
    }
    public String getCategoryName() {
        return categoryName;
    }
    public void setCategoryName(String categoryName) {
        this.categoryName = categoryName;
    }
    public Date getLastUpdateTime() {
        return lastUpdateTime;
    }
    public void setLastUpdateTime(Date lastUpdateTime) {
        this.lastUpdateTime = lastUpdateTime;
    }
}
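The @Field annotations tell SolrJ which index field each bean property maps to; without a value the Java field name is used, and with a value such as "category_id" that name is used instead. As a rough sketch of how this kind of annotation-driven mapping can be done with reflection, the example below defines its own hypothetical IndexField annotation and toDocument method. It is not SolrJ's binder, only an illustration of the idea.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative annotation-to-field-name mapping; all names here are hypothetical. */
final class BeanMappingDemo {

    /** Stand-in for an index field annotation; an empty value means "use the Java field name". */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface IndexField {
        String value() default "";
    }

    /** Small sample bean with one default-named and one renamed field. */
    static final class Product {
        @IndexField
        private long id = 1L;
        @IndexField("category_id")
        private long categoryId = 2L;
        private String internalNote = "not indexed"; // no annotation, so it is skipped
    }

    /** Collects the annotated fields of a bean into a map keyed by index field name. */
    static Map<String, Object> toDocument(Object bean) {
        Map<String, Object> doc = new TreeMap<>(); // sorted keys for stable output
        for (Field field : bean.getClass().getDeclaredFields()) {
            IndexField annotation = field.getAnnotation(IndexField.class);
            if (annotation == null) {
                continue;
            }
            String name = annotation.value().isEmpty() ? field.getName() : annotation.value();
            try {
                field.setAccessible(true);
                doc.put(name, field.get(bean));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(toDocument(new Product())); // {category_id=2, id=1}
    }
}
```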
Test code:
package com.my.solr;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse.Collation;
import org.apache.solr.client.solrj.response.SpellCheckResponse.Correction;
import org.apache.solr.client.solrj.response.SpellCheckResponse.Suggestion;

import com.my.entity.Item;

public class TestSolr {

    public static void main(String[] args) throws IOException, SolrServerException {
        String url = "http://localhost:8899/solr/mycore";
        HttpSolrServer core = new HttpSolrServer(url);
        core.setMaxRetries(1);
        core.setConnectionTimeout(5000);
        core.setParser(new XMLResponseParser()); // binary parser is used by default
        core.setSoTimeout(1000); // socket read timeout
        core.setDefaultMaxConnectionsPerHost(100);
        core.setMaxTotalConnections(100);
        core.setFollowRedirects(false); // defaults to false
        core.setAllowCompression(true);

        // ------------------------------------------------------
        // remove all data, then index the sample items
        // ------------------------------------------------------
        core.deleteByQuery("*:*");
        List<Item> items = new ArrayList<Item>();
        items.add(makeItem(1, "cpu", "this is intel cpu", 1, "cpu-intel"));
        items.add(makeItem(2, "cpu AMD", "this is AMD cpu", 2, "cpu-AMD"));
        items.add(makeItem(3, "cpu intel", "this is intel-I7 cpu", 1, "cpu-intel"));
        items.add(makeItem(4, "cpu AMD", "this is AMD 5000x cpu", 2, "cpu-AMD"));
        items.add(makeItem(5, "cpu intel I6", "this is intel-I6 cpu", 1, "cpu-intel-I6"));
        items.add(makeItem(6, "处理器", "中央处理器英特儿", 1, "cpu-intel"));
        items.add(makeItem(7, "处理器AMD", "中央处理器AMD", 2, "cpu-AMD"));
        items.add(makeItem(8, "中央处理器", "中央处理器Intel", 1, "cpu-intel"));
        items.add(makeItem(9, "中央空调格力", "格力中央空调", 3, "air"));
        items.add(makeItem(10, "中央空调海尔", "海尔中央空调", 3, "air"));
        items.add(makeItem(11, "中央空调美的", "美的中央空调", 3, "air"));
        core.addBeans(items);
        // commit
        core.commit();

        // ------------------------------------------------------
        // search
        // ------------------------------------------------------
        SolrQuery query = new SolrQuery();
        String token = "中央";
        query.set("qt", "/spellcheck");
        query.set("q", token);
        query.set("spellcheck", "on");
        query.set("spellcheck.build", "true");
        query.set("spellcheck.onlyMorePopular", "true");
        query.set("spellcheck.count", "100");
        query.set("spellcheck.alternativeTermCount", "4");
        query.set("spellcheck.extendedResults", "true");
        query.set("spellcheck.maxResultsForSuggest", "5");
        query.set("spellcheck.collate", "true");
        query.set("spellcheck.collateExtendedResults", "true");
        query.set("spellcheck.maxCollationTries", "5");
        query.set("spellcheck.maxCollations", "3");

        QueryResponse response = null;
        try {
            response = core.query(query);
            System.out.println("Query time (ms): " + response.getQTime());
        } catch (SolrServerException e) {
            System.err.println(e.getMessage());
            e.printStackTrace();
        } catch (Exception e) {
            System.err.println(e.getMessage());
            e.printStackTrace();
        } finally {
            core.shutdown();
        }
        // The query failed, so there is nothing to print.
        if (response == null) {
            return;
        }

        SpellCheckResponse spellCheckResponse = response.getSpellCheckResponse();
        if (spellCheckResponse != null) {
            // All suggestions as a list
            List<Suggestion> suggestionList = spellCheckResponse.getSuggestions();
            for (Suggestion suggestion : suggestionList) {
                System.out.println("Suggestions NumFound: " + suggestion.getNumFound());
                System.out.println("Token: " + suggestion.getToken());
                System.out.print("Suggested: ");
                List<String> suggestedWordList = suggestion.getAlternatives();
                for (String word : suggestedWordList) {
                    System.out.print(word + ", ");
                }
                System.out.println();
            }
            System.out.println();

            // All suggestions keyed by token
            Map<String, Suggestion> suggestedMap = spellCheckResponse.getSuggestionMap();
            for (Map.Entry<String, Suggestion> entry : suggestedMap.entrySet()) {
                System.out.println("suggestionName: " + entry.getKey());
                Suggestion suggestion = entry.getValue();
                System.out.println("NumFound: " + suggestion.getNumFound());
                System.out.println("Token: " + suggestion.getToken());
                System.out.print("suggested: ");
                List<String> suggestedList = suggestion.getAlternatives();
                for (String suggestedWord : suggestedList) {
                    System.out.print(suggestedWord + ", ");
                }
                System.out.println("\n\n");
            }

            // Suggestion for the query token; may be absent if nothing was suggested
            Suggestion suggestion = spellCheckResponse.getSuggestion(token);
            if (suggestion != null) {
                System.out.println("NumFound: " + suggestion.getNumFound());
                System.out.println("Token: " + suggestion.getToken());
                System.out.print("suggested: ");
                List<String> suggestedList = suggestion.getAlternatives();
                for (String suggestedWord : suggestedList) {
                    System.out.print(suggestedWord + ", ");
                }
                System.out.println("\n\n");
            }

            System.out.println("The First suggested word for solr is : " + spellCheckResponse.getFirstSuggestion(token));
            System.out.println("\n\n");

            // Collations
            List<Collation> collatedList = spellCheckResponse.getCollatedResults();
            if (collatedList != null) {
                for (Collation collation : collatedList) {
                    System.out.println("collated query String: " + collation.getCollationQueryString());
                    System.out.println("collation Num: " + collation.getNumberOfHits());
                    List<Correction> correctionList = collation.getMisspellingsAndCorrections();
                    for (Correction correction : correctionList) {
                        System.out.println("original: " + correction.getOriginal());
                        System.out.println("correction: " + correction.getCorrection());
                    }
                    System.out.println();
                }
            }
            System.out.println();
            System.out.println("The Collated word: " + spellCheckResponse.getCollatedResult());
            System.out.println();
        }
        System.out.println("Query time (ms): " + response.getQTime());
    }

    private static Item makeItem(long id, String subject, String content, long categoryId, String categoryName) {
        Item item = new Item();
        item.setId(id);
        item.setSubject(subject);
        item.setContent(content);
        item.setLastUpdateTime(new Date());
        item.setCategoryId(categoryId);
        item.setCategoryName(categoryName);
        return item;
    }
}
Test result:

This approach can be used to spell check queries against the data that is already indexed.