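The Smart Chinese Analysis plugin integrates Lucene's Smart Chinese analysis module into Elasticsearch for analyzing Chinese or mixed Chinese-English text. The analyzer uses probabilistic knowledge, obtained from a large training corpus with a Hidden Markov Model, to find the optimal word segmentation for Simplified Chinese text. Its strategy is to first break the input text into sentences and then segment each sentence into words. The plugin provides an analyzer named smartcn and a tokenizer named smartcn_tokenizer. Note that neither of them accepts any configuration parameters.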
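Because the analyzer takes no parameters, using it on a field only requires referring to it by name. The following is a minimal Python sketch, not part of the original walkthrough, that builds a create-index request body for a text field analyzed with smartcn. The field name `content` and the helper name `smartcn_index_body` are hypothetical, and the body assumes the typeless mapping format of Elasticsearch 7.x with the plugin already installed.

```python
import json


def smartcn_index_body(field_name: str = "content") -> dict:
    """Build a create-index body whose text field uses the smartcn analyzer.

    The field name is a placeholder chosen for illustration only.
    """
    return {
        "mappings": {
            "properties": {
                field_name: {
                    "type": "text",
                    # The analyzer is referenced by name; it has no options.
                    "analyzer": "smartcn",
                }
            }
        }
    }


body = smartcn_index_body()
# ensure_ascii=False keeps any non-ASCII field names readable when printed.
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON could be pasted into Kibana after a `PUT <index name>` line, where the index name is whatever you choose.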

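To install the smartcn analysis plugin (including inside an Elasticsearch Docker container), use the command below. Elasticsearch, or the container, must then be restarted for the plugin to take effect:

    ./bin/elasticsearch-plugin install analysis-smartcn

Run the command above from the Elasticsearch installation directory. The output looks like this: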

    $ ./bin/elasticsearch-plugin install analysis-smartcn
    -> Downloading analysis-smartcn from elastic
    [=================================================] 100%
    WARNING: An illegal reflective access operation has occurred
    WARNING: Illegal reflective access by org.bouncycastle.jcajce.provider.drbg.DRBG (file:/Users/liuxg/elastic/elasticsearch-7.3.0/lib/tools/plugin-cli/bcprov-jdk15on-1.61.jar) to constructor sun.security.provider.Sun()
    WARNING: Please consider reporting this to the maintainers of org.bouncycastle.jcajce.provider.drbg.DRBG
    WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    WARNING: All illegal access operations will be denied in a future release
    -> Installed analysis-smartcn
    (base) localhost:elasticsearch-7.3.0 liuxg$ ./bin/elasticsearch-plugin list
    analysis-icu
    analysis-ik
    analysis-smartcn
    pinyin

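The output above shows that analysis-smartcn has been installed successfully. For a Docker deployment, we can enter the container with the following command and then run the installation there: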

    $ docker exec -it es01 /bin/bash
    [root@ec4d19f59a7d elasticsearch]# ls
    LICENSE.txt  README.textile  config  jdk  logs     plugins
    NOTICE.txt   bin             data    lib  modules
    [root@ec4d19f59a7d elasticsearch]#

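Here es01 is the Elasticsearch instance running in Docker. For the setup details, see my article “Elastic：用Docker部署Elastic栈” (Elastic: Deploying the Elastic Stack with Docker).

Note: after installing the smartcn analyzer, we must restart Elasticsearch for it to take effect.

Example

Below, we use an example in Kibana to demonstrate how it is used: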

    POST _analyze
    {
      "text": "股市，投资，稳，赚，不，赔，必修课，如何，做，好，仓，位，管理，和，情绪，管理",
      "analyzer": "smartcn"
    }

The result:

    {
      "tokens" : [
        {
          "token" : "股市",
          "start_offset" : 0,
          "end_offset" : 2,
          "type" : "word",
          "position" : 0
        },
        {
          "token" : "投资",
          "start_offset" : 3,
          "end_offset" : 5,
          "type" : "word",
          "position" : 2
        },
        {
          "token" : "稳",
          "start_offset" : 6,
          "end_offset" : 7,
          "type" : "word",
          "position" : 4
        },
        {
          "token" : "赚",
          "start_offset" : 8,
          "end_offset" : 9,
          "type" : "word",
          "position" : 6
        },
        {
          "token" : "不",
          "start_offset" : 10,
          "end_offset" : 11,
          "type" : "word",
          "position" : 8
        },
        {
          "token" : "赔",
          "start_offset" : 12,
          "end_offset" : 13,
          "type" : "word",
          "position" : 10
        },
        {
          "token" : "必修课",
          "start_offset" : 14,
          "end_offset" : 17,
          "type" : "word",
          "position" : 12
        },
        {
          "token" : "如何",
          "start_offset" : 18,
          "end_offset" : 20,
          "type" : "word",
          "position" : 14
        },
        {
          "token" : "做",
          "start_offset" : 21,
          "end_offset" : 22,
          "type" : "word",
          "position" : 16
        },
        {
          "token" : "好",
          "start_offset" : 23,
          "end_offset" : 24,
          "type" : "word",
          "position" : 18
        },
        {
          "token" : "仓",
          "start_offset" : 25,
          "end_offset" : 26,
          "type" : "word",
          "position" : 20
        },
        {
          "token" : "位",
          "start_offset" : 27,
          "end_offset" : 28,
          "type" : "word",
          "position" : 22
        },
        {
          "token" : "管理",
          "start_offset" : 29,
          "end_offset" : 31,
          "type" : "word",
          "position" : 24
        },
        {
          "token" : "和",
          "start_offset" : 32,
          "end_offset" : 33,
          "type" : "word",
          "position" : 26
        },
        {
          "token" : "情绪",
          "start_offset" : 34,
          "end_offset" : 36,
          "type" : "word",
          "position" : 28
        },
        {
          "token" : "管理",
          "start_offset" : 37,
          "end_offset" : 39,
          "type" : "word",
          "position" : 30
        }
      ]
    }
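The same request can also be sent from a script instead of Kibana. The sketch below is an illustration under stated assumptions rather than part of the original walkthrough: it assumes an unsecured Elasticsearch node at `http://localhost:9200`, and the helper names `build_analyze_request`, `analyze`, and `check_offsets` are hypothetical. It uses only the Python standard library, builds the `_analyze` request, and checks that each returned offset pair slices the token out of the input text. The check is demonstrated on the first three tokens of the response shown above, so it runs without a live cluster.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node without authentication


def build_analyze_request(text, analyzer="smartcn", base_url=ES_URL):
    """Build (but do not send) a POST request to the _analyze endpoint."""
    payload = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_analyze",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text, analyzer="smartcn", base_url=ES_URL):
    """Send the request and return the parsed JSON; needs a running cluster."""
    request = build_analyze_request(text, analyzer, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def check_offsets(text, response):
    """Return (token, position) pairs after checking offsets against the text.

    The check suits the Chinese tokens in this example. Tokens that the
    analyzer rewrites (for instance lowercased or stemmed English words)
    would not equal their source slice.
    """
    pairs = []
    for tok in response["tokens"]:
        surface = text[tok["start_offset"]:tok["end_offset"]]
        if surface != tok["token"]:
            raise ValueError(f"offset mismatch: {surface!r} vs {tok['token']!r}")
        pairs.append((tok["token"], tok["position"]))
    return pairs


# First three tokens copied from the response shown above.
TEXT = "股市，投资，稳"
SAMPLE = {
    "tokens": [
        {"token": "股市", "start_offset": 0, "end_offset": 2, "type": "word", "position": 0},
        {"token": "投资", "start_offset": 3, "end_offset": 5, "type": "word", "position": 2},
        {"token": "稳", "start_offset": 6, "end_offset": 7, "type": "word", "position": 4},
    ]
}

pairs = check_offsets(TEXT, SAMPLE)
print(pairs)
```

With a cluster available, `check_offsets(text, analyze(text))` would run the same check on a live response. In the output above the positions advance in steps of two, presumably because the punctuation between the words is dropped after tokenization while its position is still counted.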
