Lucene 4.x SpellChecker Usage Guide

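SpellChecker is Lucene's spell-checking component. Before introducing it, we need to be clear about which data sources it supports. SpellChecker builds its index from an implementation of the Dictionary interface, which is passed to indexDictionary (the constructor itself takes a Directory):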

package org.apache.lucene.search.spell;

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;

import org.apache.lucene.search.suggest.InputIterator;

/**
 * A simple interface representing a Dictionary. A Dictionary
 * here is a list of entries, where every entry consists of
 * term, weight and payload.
 *
 */
public interface Dictionary {

  /**
   * Returns an iterator over all the entries
   * @return Iterator
   */
  InputIterator getEntryIterator() throws IOException;

}

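There are several Dictionary implementations. The two most commonly used are the text-file-based one (PlainTextDictionary) and the one built from an existing Lucene index (LuceneDictionary).

Below is the code I used for testing. It covers both building the index and querying it: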

package com.tianditu.com.search;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.spell.LuceneDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.util.Version;

public class GlobalSuggest {

	// Directory where the spellcheck index is written
	private final String SPELL_CHECK_FOLDER = "c:\\spellcheck\\";

	// Existing index that serves as the dictionary source
	private final String GLOBAL_PINYIN_SUGGEST = "O:\\searchwork_custom\\data_index\\pinyin2008\\";

	// Build the spellcheck index from the existing pinyin index
	public void testIndexPinyin2008() throws IOException {
		long start = System.currentTimeMillis();
		//北京吉威时代软件股份有限公司
		//String indexDir ="O:\\searchwork_custom\\data_index\\GlobalIndex\\";
		Directory direct = new MMapDirectory(new File(GLOBAL_PINYIN_SUGGEST));
		DirectoryReader reader = DirectoryReader.open(direct);
		Directory spd = FSDirectory.open(new File(SPELL_CHECK_FOLDER));
		SpellChecker sc = new SpellChecker(spd);
		try {
			// Use every term of the "name" field as a dictionary entry
			LuceneDictionary ld = new LuceneDictionary(reader, "name");
			// Writer config for the spellcheck index (no analyzer is supplied here)
			IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_30, null);
			// Write the index into the spellcheck directory
			sc.indexDictionary(ld, iwc, true);
		} finally {
			sc.close();
			reader.close();
		}
		long end = System.currentTimeMillis();
		System.out.println("Indexing finished, took: " + (end - start) + "ms");
	}

	// Same as above, but reads the dictionary from the GlobalIndex index
	public void testIndex() throws IOException {
		long start = System.currentTimeMillis();
		//北京吉威时代软件股份有限公司
		String indexDir = "O:\\searchwork_custom\\data_index\\GlobalIndex\\";
		Directory direct = new MMapDirectory(new File(indexDir));
		DirectoryReader reader = DirectoryReader.open(direct);
		Directory spd = FSDirectory.open(new File(SPELL_CHECK_FOLDER));
		SpellChecker sc = new SpellChecker(spd);
		try {
			LuceneDictionary ld = new LuceneDictionary(reader, "name");
			IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_30, null);
			sc.indexDictionary(ld, iwc, true);
		} finally {
			sc.close();
			reader.close();
		}
		long end = System.currentTimeMillis();
		System.out.println("Indexing finished, took: " + (end - start) + "ms");
	}

	public void testSearch(String wd) throws IOException {
		// Open the spellcheck index directory
		Directory spd = FSDirectory.open(new File(SPELL_CHECK_FOLDER));
		// Instantiate the SpellChecker component
		SpellChecker sc = new SpellChecker(spd);
		try {
			// Fetch up to 10 closest suggestions for the input keyword. The third
			// argument is the accuracy (minimum similarity): the higher it is, the
			// closer a suggestion must match. Tune it to your needs.
			String[] suggests = sc.suggestSimilar(wd, 10, 0.6f);
			if (suggests != null) {
				for (String word : suggests) {
					System.out.println("Did you mean: " + word);
				}
			}
		} finally {
			sc.close();
		}
	}

	/**
	 * @param args
	 * @throws IOException
	 */
	public static void main(String[] args) throws IOException {
		GlobalSuggest spellcheck = new GlobalSuggest();
		//spellcheck.testIndexPinyin2008();
		spellcheck.testSearch("beijing京鸭");
		//spellcheck.testSearch("beijng");
	}

}

The index-building part of the code is:

	// Build the spellcheck index from the existing pinyin index
	public void testIndexPinyin2008() throws IOException {
		long start = System.currentTimeMillis();
		Directory direct = new MMapDirectory(new File(GLOBAL_PINYIN_SUGGEST));
		DirectoryReader reader = DirectoryReader.open(direct);
		Directory spd = FSDirectory.open(new File(SPELL_CHECK_FOLDER));
		SpellChecker sc = new SpellChecker(spd);
		try {
			// Use every term of the "name" field as a dictionary entry
			LuceneDictionary ld = new LuceneDictionary(reader, "name");
			// Writer config for the spellcheck index (no analyzer is supplied here)
			IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_30, null);
			// Write the index into the spellcheck directory
			sc.indexDictionary(ld, iwc, true);
		} finally {
			sc.close();
			reader.close();
		}
		long end = System.currentTimeMillis();
		System.out.println("Indexing finished, took: " + (end - start) + "ms");
	}

This code builds the index that SpellChecker needs from an existing Lucene index.
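As background on what ends up in that spellcheck index: to my understanding, SpellChecker does not only store each dictionary word as a whole term, it also indexes character n-grams of the word so that a misspelled query can still retrieve candidates that share fragments with it. The following is a minimal sketch of n-gram splitting, not Lucene's actual code; the class and method names (NGramSketch, ngrams) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class NGramSketch {

    // Returns all contiguous substrings of length n, in order of appearance.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // The correct word and a misspelling still share some trigrams.
        System.out.println(ngrams("beijing", 3)); // [bei, eij, iji, jin, ing]
        System.out.println(ngrams("beijng", 3));  // [bei, eij, ijn, jng]
    }
}
```

Here "beijng" and "beijing" share the trigrams "bei" and "eij", which is the kind of overlap that lets a misspelled query find its intended word among the candidates.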

The snippet that queries the spellcheck index is:

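		// Open the spellcheck index directory
		Directory spd = FSDirectory.open(new File(SPELL_CHECK_FOLDER));
		// Instantiate the SpellChecker component
		SpellChecker sc = new SpellChecker(spd);
		// Fetch up to 10 closest suggestions for the input keyword. The third
		// argument is the accuracy (minimum similarity): the higher it is, the
		// closer a suggestion must match. Tune it to your needs.
		String[] suggests = sc.suggestSimilar(wd, 10, 0.6f);
		if (suggests != null) {
			for (String word : suggests) {
				System.out.println("Did you mean: " + word);
			}
		}
		sc.close();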

Related algorithm: the default string distance is LevensteinDistance.
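To make the accuracy argument more concrete, here is a minimal, self-contained sketch of a normalized Levenshtein similarity. It is not Lucene's implementation: the class and method names (LevenshteinSketch, editDistance, similarity) are made up for illustration, and the normalization (1 minus the edit distance divided by the longer length) reflects my understanding of how LevensteinDistance scores a pair.

```java
class LevenshteinSketch {

    // Classic edit distance: the minimum number of single-character
    // insertions, deletions and substitutions that turn a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]: 1 means identical strings.
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0f;
        }
        return 1.0f - (float) editDistance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(similarity("beijng", "beijing")); // one insertion apart
        System.out.println(similarity("beijng", "nanjing")); // four edits apart
    }
}
```

With a threshold of 0.6 like the one passed to suggestSimilar, "beijing" would be kept as a suggestion for "beijng" (similarity about 0.86), while "nanjing" would be dropped (about 0.43).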

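Query examples:

1. Querying Chinese characters that contain a wrong character:

2. Querying pinyin:

3. Pinyin mixed with Chinese characters:

(Note: this exposed a problem. Input that mixes pinyin and Chinese characters does not work; using it would require some additional processing.)

4. A long string of Chinese characters with wrong characters in the middle:

Summary: SpellChecker's capabilities appear to be limited. Using it in practice may require some customization.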
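One possible form of the extra processing needed for mixed pinyin and Chinese input is to split the query into runs of Chinese characters and runs of everything else, then check each run separately. The sketch below only does the splitting and uses the JDK alone; the names MixedInputSplitter, isHan and splitByScript are hypothetical, and whether the per-run suggestions recombine into something useful depends on how the dictionary field was indexed.

```java
import java.util.ArrayList;
import java.util.List;

class MixedInputSplitter {

    // True for characters in the main CJK Unified Ideographs block.
    static boolean isHan(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS;
    }

    // Splits the input into maximal runs that are either all Han or all non-Han.
    static List<String> splitByScript(String input) {
        List<String> runs = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // A change of script closes the current run.
            if (current.length() > 0 && isHan(c) != isHan(current.charAt(0))) {
                runs.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(splitByScript("beijing京鸭")); // [beijing, 京鸭]
    }
}
```

Each resulting run could then be passed to suggestSimilar on its own, for example "beijing" against the pinyin-based spellcheck index and "京鸭" against one built from Chinese terms.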
