elasticsearch2.x ik插件

先来一个标准分词（standard），配置如下：

curl -XPUT localhost:/local -d '{

    "settings" : {

        "analysis" : {

            "analyzer" : {

                "stem" : {

                    "tokenizer" : "standard",

                    "filter" : ["standard", "lowercase", "stop", "porter_stem"]

                }

            }

        }

    },

    "mappings" : {

        "article" : {

            "dynamic" : true,

            "properties" : {

                "title" : {

                    "type" : "string",

                    "analyzer" : "stem"

                }

            }

        }

    }

}'

index:local

type:article

default analyzer:stem (filter:小写、停用词等)

field:title　　

测试：

# Index Data

curl -XPUT localhost:/local/article/ -d'{"title": "Fight for your life"}'

curl -XPUT localhost:/local/article/ -d'{"title": "Fighting for your life"}'

curl -XPUT localhost:/local/article/ -d'{"title": "My dad fought a dog"}'

curl -XPUT localhost:/local/article/ -d'{"title": "Bruno fights Tyson tomorrow"}'

# search on the title field, which is stemmed on index and search

curl -XGET localhost:/local/_search?q=title:fight

# searching on _all will not do anystemming, unless also configured on the mapping to be stemmed...

curl -XGET localhost:/local/_search?q=fight

例如：

Fight for your life

分词如下：

{"tokens":[

{"token":"fight","start_offset":,"end_offset":,"type":"<ALPHANUM>","position":},<br>{"token":"your","start_offset":,"end_offset":,"type":"<ALPHANUM>","position":},<br>{"token":"life","start_offset":,"end_offset":,"type":"<ALPHANUM>","position":}

]}

部署ik分词器

在elasticsearch.yml中配置 index.analysis.analyzer.ik.type : "ik"

delete之前创建的index，重新配置如下：

curl -XPUT localhost:/local -d '{

    "settings" : {

        "analysis" : {

            "analyzer" : {

                "ik" : {

                    "tokenizer" : "ik"

                }

            }

        }

    },

    "mappings" : {

        "article" : {

            "dynamic" : true,

            "properties" : {

                "title" : {

                    "type" : "string",

                    "analyzer" : "ik"

                }

            }

        }

    }

}'

测试：

curl 'http://localhost:9200/local/_analyze?analyzer=ik&pretty=true' -d'  

{  

    "text":"中华人民共和国国歌" 

}  

'  

{

  "tokens" : [ {

    "token" : "text",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "ENGLISH",

    "position" : 

  }, {

    "token" : "中华人民共和国",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "国歌",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  } ]

}

如果我们想返回最细粒度的分词结果，需要在elasticsearch.yml中配置如下：

index:

  analysis:

    analyzer:

      ik:

          alias: [ik_analyzer]

          type: org.elasticsearch.index.analysis.IkAnalyzerProvider

      ik_smart:

          type: ik

          use_smart: true

      ik_max_word:

          type: ik

          use_smart: false

测试：

curl 'http://localhost:9200/index/_analyze?analyzer=ik_max_word&pretty=true' -d'  

{  

    "text":"中华人民共和国国歌" 

}  

'  

{

  "tokens" : [ {

    "token" : "text",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "ENGLISH",

    "position" : 

  }, {

    "token" : "中华人民共和国",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "中华人民",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "中华",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "华人",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "人民共和国",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "人民",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "共和国",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "共和",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  }, {

    "token" : "国",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_CHAR",

    "position" : 

  }, {

    "token" : "国歌",

    "start_offset" : ,

    "end_offset" : ,

    "type" : "CN_WORD",

    "position" : 

  } ]

}

elasticsearch2.x ik插件的更多相关文章

ElasticSearch搜索引擎安装配置中文分词器IK插件
近几篇ElasticSearch系列: 1.阿里云服务器Linux系统安装配置ElasticSearch搜索引擎 2.Linux系统中ElasticSearch搜索引擎安装配置Head插件 3.Ela ...
elasticsearch 口水篇（8）分词中文分词 ik插件
先来一个标准分词(standard),配置如下: curl -XPUT localhost:9200/local -d '{ "settings" : { "analys ...
Elastic ik插件配置热更新功能
ik github地址:https://github.com/medcl/elasticsearch-analysis-ik 官网说明: 热更新 IK 分词使用方法目前该插件支持热更新 IK 分词, ...
【自定义IK词典】Elasticsearch之中文分词器插件es-ik的自定义词库
Elasticsearch之中文分词器插件es-ik 针对一些特殊的词语在分词的时候也需要能够识别有人会问,那么,例如: 如果我想根据自己的本家姓氏来查询,如zhouls,姓氏“周”. 如 ...
Elasticsearch安装ik中文分词插件（四）
一.IK简介 IK Analyzer是一个开源的,基于java语言开发的轻量级的中文分词工具包.从2006年12月推出1.0版开始, IKAnalyzer已经推出了4个大版本.最初,它是以开源项目Lu ...
在ElasticSearch中使用 IK 中文分词插件
我这里集成好了一个自带IK的版本,下载即用, https://github.com/xlb378917466/elasticsearch5.2.include_IK 添加了IK插件意味着你可以使用ik ...
ElasticSearch(三) ElasticSearch中文分词插件IK的安装
正因为Elasticsearch 内置的分词器对中文不友好,会把中文分成单个字来进行全文检索,所以我们需要借助中文分词插件来解决这个问题. 一.安装maven管理工具 Elasticsearch 要使 ...
ES之一：Elasticsearch6.4 windows安装 head插件ik分词插件安装
准备安装目标:1.Elasticsearch6.42.head插件3.ik分词插件第一步:安装Elasticsearch6.4 下载方式:1.官网下载 https://www.elastic.co/ ...
Elastic Stack 笔记（二）Elasticsearch5.6 安装 IK 分词器和 Head 插件
博客地址:http://www.moonxy.com 一.前言 Elasticsearch 作为开源搜索引擎服务器,其核心功能在于索引和搜索数据.索引是把文档写入 Elasticsearch 的过程, ...

随机推荐

【codesmith】初次试用（未发布）
一直都有听过codesmith,一个很好用的软件,减少大量代码的输入,比如你生成一个list,并添加5个项. ArrayList list1=new ArrayList(); list1.Add( ...
levelDB, TokuDB, BDB等kv存储引擎性能对比——wiredtree, wiredLSM，LMDB读写很强啊
在:http://www.lmdb.tech/bench/inmem/ 2. Small Data Set Using the laptop we generate a database with 2 ...
App自动化测试探索（一）借助Appium实现APP的自动化测试
移动应用测试十大要领: 选择系统平台选择测试设备的品牌注意行业和设备区分关注Android的更新不要忘记老设备灵活使用Web分析工具注意区分地区.运营商和网络技术掌握智能手机的屏幕分辨率 ...
条款35：考虑virtual函数以外的其他选择
有一部分人总是主张virtual函数几乎总应该是private:例如下面这个例子,例子时候游戏,游戏里面的任务都拥有健康值这一属性: class GameCharacter{ public: int ...
python3中zip()函数的用法
>>>a = [1,2,3] >>> b = [4,5,6] >>> c = [4,5,6,7,8] >>> zipped = ...
InnoDB引擎的特点及优化方法
1.什么是InnoDB引擎? InnoDB引擎是MySQL数据库的另一个重要的存储引擎,正成为目前MySQL AB所发行的新版的标准,被包含在所有二进制安装包里,和其他存储引擎相比,Inno ...
[BZOJ1022][SHOI2008]小约翰的游戏
bzoj luogu sol 显然这个玩意儿和普通\(Nim\)游戏是有区别的. 形式化的,\(Nim\)游戏的关键在于决策集合为空者负,而这里的决策集合为空者胜. 所以就显然不能直接用\(SG\)函 ...
Websphere中的几个常用概念
什么是单元(Cell)?什么是节点(Node)?Node.Profile 与 Server 之间的关系是什么? 答: 单元: 单元是整个分布式网络中一个或多个节点的逻辑分组.单元是一个配置概念,是管理 ...
secret CRT 会话光标不闪烁问题
点击选项->会话选项然后在取消即可,就有了闪烁的光标,应该是个bug.
对oracle中date/timestamp的操作
设置oracle中date的会话格式为 'yyyy-mm-dd hh24:mi:ss' alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss ...

elasticsearch2.x ik插件

elasticsearch2.x ik插件的更多相关文章

随机推荐

热门专题