Installing Chinese word segmentation for Elasticsearch (elasticsearch-analysis-ik)

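Because elasticsearch is built on lucene, it naturally inherits much of lucene's support for Chinese word segmentation: lucene Chinese analyzers such as IK, Paoding, and MMSEG4J can in principle all be used on elasticsearch, provided a corresponding elasticsearch plugin exists. For how to develop such a plugin, this article gives an introduction:
http://log.medcl.net/item/2011/07/diving-into-elasticsearch-3-custom-analysis-plugin/
I have not had time to read it yet and will study it properly later. This post only records the steps I followed to integrate the IK analysis plugin provided by medcl.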

Installation steps:

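1. Download the source code from GitHub at https://github.com/medcl/elasticsearch-analysis-ik

Near the bottom of the right-hand side there is a "Download ZIP" button. Click it to download the source archive elasticsearch-analysis-ik-master.zip.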

2. Extract elasticsearch-analysis-ik-master.zip: go to the download directory and run:

unzip elasticsearch-analysis-ik-master.zip

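3. Because this is source code, it has to be packaged with maven. Enter the extracted directory and run maven's package goal (for example, mvn package).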

4. Packaging produces elasticsearch-analysis-ik-1.9.4.zip under target/releases. Copy it into the plugins/analysis-ik directory of the ES installation.

5. Unzip elasticsearch-analysis-ik-1.9.4.zip inside the plugins/analysis-ik directory.

6. Add the ik configuration to the ES configuration file elasticsearch.yml by appending this line at the end:

index.analysis.analyzer.ik.type: "ik"

7. Restart the elasticsearch service to complete the configuration, then enter the following command to test it:

curl -XPOST "http://localhost:9200/_analyze?analyzer=ik&pretty=true&text=helloworld,中华人民共和国"

The test result is as follows:

{
  "tokens" : [ {
    "token" : "helloworld",
    "start_offset" : 0,
    "end_offset" : 10,
    "type" : "ENGLISH",
    "position" : 0
  }, {
    "token" : "中华人民共和国",
    "start_offset" : 11,
    "end_offset" : 18,
    "type" : "CN_WORD",
    "position" : 1
  }, {
    "token" : "中华人民",
    "start_offset" : 11,
    "end_offset" : 15,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "中华",
    "start_offset" : 11,
    "end_offset" : 13,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "华人",
    "start_offset" : 12,
    "end_offset" : 14,
    "type" : "CN_WORD",
    "position" : 4
  }, {
    "token" : "人民共和国",
    "start_offset" : 13,
    "end_offset" : 18,
    "type" : "CN_WORD",
    "position" : 5
  }, {
    "token" : "人民",
    "start_offset" : 13,
    "end_offset" : 15,
    "type" : "CN_WORD",
    "position" : 6
  }, {
    "token" : "共和国",
    "start_offset" : 15,
    "end_offset" : 18,
    "type" : "CN_WORD",
    "position" : 7
  }, {
    "token" : "共和",
    "start_offset" : 15,
    "end_offset" : 17,
    "type" : "CN_WORD",
    "position" : 8
  }, {
    "token" : "国",
    "start_offset" : 17,
    "end_offset" : 18,
    "type" : "CN_CHAR",
    "position" : 9
  } ]
}
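
For repeat installs, steps 2 to 5 and the step 7 check can be collected into a small script. What follows is only a rough sketch under several assumptions of mine, not something taken from the plugin's documentation: the maven step is assumed to be the plain package goal, the default ES_HOME path is a placeholder, the function names (run, install_ik, test_ik) are made up for illustration, and the test request is sent with curl -G so that curl URL-encodes the Chinese text (this uses GET rather than the POST shown above).

```shell
#!/bin/sh
# Sketch of steps 2-5 and the step 7 check. All defaults are assumptions.

IK_VERSION="${IK_VERSION:-1.9.4}"               # version seen in this post; yours may differ
ES_HOME="${ES_HOME:-/usr/share/elasticsearch}"  # placeholder install directory
ES_URL="${ES_URL:-http://localhost:9200}"
DRY_RUN="${DRY_RUN:-1}"                         # 1 = only print the commands

# Print a command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Steps 2-5: unzip the source, package it with maven,
# then copy the plugin zip into plugins/analysis-ik and unpack it there.
install_ik() {
  plugin_dir="$ES_HOME/plugins/analysis-ik"
  plugin_zip="elasticsearch-analysis-ik-$IK_VERSION.zip"
  run unzip elasticsearch-analysis-ik-master.zip &&
  run mvn -f elasticsearch-analysis-ik-master/pom.xml package &&
  run mkdir -p "$plugin_dir" &&
  run cp "elasticsearch-analysis-ik-master/target/releases/$plugin_zip" "$plugin_dir/" &&
  run unzip -o "$plugin_dir/$plugin_zip" -d "$plugin_dir"
}

# Step 7: ask the ik analyzer to tokenize the given text.
# --data-urlencode lets curl encode the non-ASCII characters.
test_ik() {
  run curl -G "$ES_URL/_analyze" \
    --data-urlencode "analyzer=ik" \
    --data-urlencode "pretty=true" \
    --data-urlencode "text=$1"
}
```

With the default DRY_RUN=1 the functions only print the commands they would run. Setting DRY_RUN=0, calling install_ik from the download directory, and calling test_ik "helloworld,中华人民共和国" after the service has been restarted runs them for real.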

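Notes:

I took many detours, and many of the instructions found online did not work. In summary:

First, be sure to build with maven yourself. Different versions of elasticsearch and ik produce different build artifacts, so anyone who wants to use the elasticsearch-rtm package should pay close attention to which versions it matches.

Second, I installed elasticsearch via rpm, and in practice the dictionary config directory can live under the plugins directory, alongside the unzipped plugin.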
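
A quick way to confirm the layout described in the second note is to check that the plugin directory and a config directory inside it both exist. This is a minimal sketch: the function name check_ik_layout is made up, and the exact location plugins/analysis-ik/config is my reading of the note rather than something the plugin guarantees.

```shell
#!/bin/sh
# check_ik_layout ES_HOME
# Reports whether ES_HOME/plugins/analysis-ik and its config directory exist.
# The path plugins/analysis-ik/config is an assumption based on the note above.
check_ik_layout() {
  plugin_dir="$1/plugins/analysis-ik"
  if [ ! -d "$plugin_dir" ]; then
    echo "missing: $plugin_dir"
    return 1
  fi
  if [ ! -d "$plugin_dir/config" ]; then
    echo "missing: $plugin_dir/config"
    return 1
  fi
  echo "ok: $plugin_dir"
}
```

Calling check_ik_layout with the ES installation directory as its only argument prints either an ok line or the first missing directory.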

