elasticsearch2.x ik插件

先来一个标准分词（standard），配置如下：

curl -XPUT localhost:/local -d '{

    "settings" : {

        "analysis" : {

            "analyzer" : {

                "stem" : {

                    "tokenizer" : "standard",

                    "filter" : ["standard", "lowercase", "stop", "porter_stem"]

                }

            }

        }

    },

    "mappings" : {

        "article" : {

            "dynamic" : true,

            "properties" : {

                "title" : {

                    "type" : "string",

                    "analyzer" : "stem"

                }

            }

        }

    }

}'

index:local

type:article

default analyzer:stem (filter:小写、停用词等)

field:title　　

测试：

# Index Data

curl -XPUT localhost:/local/article/ -d'{"title": "Fight for your life"}'

curl -XPUT localhost:/local/article/ -d'{"title": "Fighting for your life"}'

curl -XPUT localhost:/local/article/ -d'{"title": "My dad fought a dog"}'

curl -XPUT localhost:/local/article/ -d'{"title": "Bruno fights Tyson tomorrow"}'

# search on the title field, which is stemmed on index and search

curl -XGET localhost:/local/_search?q=title:fight

# searching on _all will not do anystemming, unless also configured on the mapping to be stemmed...

curl -XGET localhost:/local/_search?q=fight

例如：

Fight for your life

分词如下：

{"tokens":[

{"token":"fight","start_offset":,"end_offset":,"type":"<ALPHANUM>","position":},<br>{"token":"your","start_offset":,"end_offset":,"type":"<ALPHANUM>","position":},<br>{"token":"life","start_offset":,"end_offset":,"type":"<ALPHANUM>","position":}

]}

部署ik分词器

在elasticsearch.yml中配置 index.analysis.analyzer.ik.type : "ik"

delete之前创建的index，重新配置如下：

curl -XPUT localhost:/local -d '{

    "settings" : {

        "analysis" : {

            "analyzer" : {

                "ik" : {

                    "tokenizer" : "ik"

                }

            }

        }

    },

    "mappings" : {

        "article" : {

            "dynamic" : true,

            "properties" : {

                "title" : {

                    "type" : "string",

                    "analyzer" : "ik"

                }

            }

        }

    }

}'

测试：

curl 'http://localhost:9200/local/_analyze?analyzer=ik&pretty=true' -d'  

{  

    "text":"中华人民共和国国歌" 

}  

'  

{

  "tokens" : [ {

    "token" : "text",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "ENGLISH",

    "position" : 

  }, {

    "token" : "中华人民共和国",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "国歌",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  } ]

}

如果我们想返回最细粒度的分词结果，需要在elasticsearch.yml中配置如下：

index:

  analysis:

    analyzer:

      ik:

          alias: [ik_analyzer]

          type: org.elasticsearch.index.analysis.IkAnalyzerProvider

      ik_smart:

          type: ik

          use_smart: true

      ik_max_word:

          type: ik

          use_smart: false

测试：

curl 'http://localhost:9200/index/_analyze?analyzer=ik_max_word&pretty=true' -d'  

{  

    "text":"中华人民共和国国歌" 

}  

'  

{

  "tokens" : [ {

    "token" : "text",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "ENGLISH",

    "position" : 

  }, {

    "token" : "中华人民共和国",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "中华人民",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "中华",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "华人",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "人民共和国",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "人民",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "共和国",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "共和",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "国",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_CHAR",

    "position" : 

  }, {

    "token" : "国歌",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  } ]

}

elasticsearch2.x ik插件的更多相关文章

ElasticSearch搜索引擎安装配置中文分词器IK插件
近几篇ElasticSearch系列: 1.阿里云服务器Linux系统安装配置ElasticSearch搜索引擎 2.Linux系统中ElasticSearch搜索引擎安装配置Head插件 3.Ela ...
elasticsearch 口水篇（8）分词中文分词 ik插件
先来一个标准分词(standard),配置如下: curl -XPUT localhost:9200/local -d '{ "settings" : { "analys ...
Elastic ik插件配置热更新功能
ik github地址:https://github.com/medcl/elasticsearch-analysis-ik 官网说明: 热更新 IK 分词使用方法目前该插件支持热更新 IK 分词, ...
【自定义IK词典】Elasticsearch之中文分词器插件es-ik的自定义词库
Elasticsearch之中文分词器插件es-ik 针对一些特殊的词语在分词的时候也需要能够识别有人会问,那么,例如: 如果我想根据自己的本家姓氏来查询,如zhouls,姓氏“周”. 如 ...
Elasticsearch安装ik中文分词插件（四）
一.IK简介 IK Analyzer是一个开源的,基于java语言开发的轻量级的中文分词工具包.从2006年12月推出1.0版开始, IKAnalyzer已经推出了4个大版本.最初,它是以开源项目Lu ...
在ElasticSearch中使用 IK 中文分词插件
我这里集成好了一个自带IK的版本,下载即用, https://github.com/xlb378917466/elasticsearch5.2.include_IK 添加了IK插件意味着你可以使用ik ...
ElasticSearch(三) ElasticSearch中文分词插件IK的安装
正因为Elasticsearch 内置的分词器对中文不友好,会把中文分成单个字来进行全文检索,所以我们需要借助中文分词插件来解决这个问题. 一.安装maven管理工具 Elasticsearch 要使 ...
ES之一：Elasticsearch6.4 windows安装 head插件ik分词插件安装
准备安装目标:1.Elasticsearch6.42.head插件3.ik分词插件第一步:安装Elasticsearch6.4 下载方式:1.官网下载 https://www.elastic.co/ ...
Elastic Stack 笔记（二）Elasticsearch5.6 安装 IK 分词器和 Head 插件
博客地址:http://www.moonxy.com 一.前言 Elasticsearch 作为开源搜索引擎服务器,其核心功能在于索引和搜索数据.索引是把文档写入 Elasticsearch 的过程, ...

随机推荐

python字典方法
本文参考自<python基础教程 (第二版)> 操作语法举例结果建立字典 dict() 1.以关键字参数建立字典 2.以其他映射作为参数建立字典 1.d = dict(name=' ...
poj 1185 状压dp+优化
http://poj.org/problem?id=1185 炮兵阵地 Time Limit: 2000MS Memory Limit: 65536K Total Submissions: 291 ...
吐槽XE3中的BUG：无法调试32位的应用程序
我想用的功能在XE5中有BUG, 无奈转移到XE3中测试,发现了XE3还有另外一个问题:无法DEBUG 32位的应用程序,这算什么事啊?有人说把项目属性中的link with dynamic RTL去 ...
NAT123之类的软件是如何实现访问域名然后穿透到内网主机的80端口？——有公网ip就是动态域名解析，没有就是穿透+代理转发
实际上两种都用到了:1,服务提供商(123NAT,花生壳)做Proxy转发 2,两个私网地址直接连接:STUN 和TURN 使用动态域名解析还是端口映射什么情况下使用动态域名解析?什么情况下使用 ...
VSCode设置中文语言显示
Vscode是一款开源的跨平台编辑器.默认情况下,vscode使用的语言为英文(us),如何将其显示语言修改成中文了? 1)打开vscode工具: 2)使用快捷键组合[Ctrl+Shift+p],在搜 ...
deep learning 自编码算法详细理解与代码实现（超详细）
在有监督学习中,训练样本是有类别标签的.现在假设我们只有一个没有带类别标签的训练样本集合 ,其中 .自编码神经网络是一种无监督学习算法,它使用了反向传播算法,并让目标值等于输入值,比如 .下图是一个自 ...
mysql 初识数据库
一数据库管理软件的由来基于我们之前所学,数据要想永久保存,都是保存于文件中,毫无疑问,一个文件仅仅只能存在于某一台机器上. 如果我们暂且忽略直接基于文件来存取数据的效率问题,并且假设程序所有的组件 ...
C语言编译全过程
编译的概念:编译程序读取源程序(字符流),对之进行词法和语法的分析,将高级语言指令转换为功能等效的汇编代码,再由汇编程序转换为机器语言,并且按照操作系统对可执行文件格式的要求链接生成可执行程序. ...
C#进阶之路（六）：表达式进行类的赋值
好久没更新这个系列了,最近看.NET CORE源码的时候,发现他的依赖注入模块的很多地方用了表达式拼接实现的.比如如下代码 private Expression<Func<ServiceP ...
Communication System（动态规划）
个人心得:百度推荐的简单DP题,自己做了下发现真得水,看了题解发现他们的思维真得比我好太多太多, 这是一段漫长的锻炼路呀. 关于这道题,我最开始用DP的思路,找子状态,发现自己根本就不会找DP状态数组 ...

elasticsearch2.x ik插件

elasticsearch2.x ik插件的更多相关文章

随机推荐

热门专题