Still can't do CUD in ES?

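Recently, while grinding away at work, I had to operate on es, but I wasn't familiar with how es queries documents. So I have spent the last two weeks studying es and skimmed through "Elasticsearch: The Definitive Guide". Slack off a little and another day is gone.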

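es is a real-time distributed search and analytics engine built on Lucene. Today we won't talk about its use cases; we'll talk about creating, modifying and deleting es indices and documents.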

Environment: CentOS 7, Elasticsearch 6.8.3, jdk8

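(The latest es at the time of writing is version 7. As I understood it, version 7 wants jdk11 or above, so with jdk8 on this machine I installed es 6.8.3.)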

Everything below uses the student index as the example.

1. Creating an index

//"_doc" is the type; in es6 an index can have only one type, no others
//number_of_shards is the number of primary shards, number_of_replicas is the number of replicas of each primary shard

PUT   http://192.168.197.100:9200/student

{
    "mappings":{
      "_doc":{
        "properties":{
          "id": {
              "type": "keyword"
          },
          "name":{
            "type":"text",
            "index":true,
            "analyzer":"standard"
          },
          "age":{
            "type":"integer",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above":256
              }
            }
          },
          "birthday":{
            "type":"date"
          },
          "gender":{
            "type":"keyword"
          },
          "grade":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "class":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          }
        }
      }
    },
    "settings":{
      "number_of_shards" : 1,
      "number_of_replicas" : 1
    }
}

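The difference between text and keyword as the type attribute:

(1) text: the value is run through an analyzer and split into terms, both at index time and at query time; used for full-text search

(2) keyword: the value is not analyzed and is indexed as one single term; used for exact matching, sorting and aggregations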
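To make the text/keyword difference concrete, here is a toy sketch in Python. It is not how es is implemented: `index_terms` is a made-up helper, and the `\w+` split plus lowercasing is only a rough stand-in for what the standard analyzer does to simple English text.

```python
import re


def index_terms(value: str, field_type: str) -> list:
    """Roughly, the terms a value would contribute to the inverted index.

    Hypothetical helper for illustration only; the real analysis happens
    inside es and depends on the analyzer configured for the field.
    """
    if field_type == "keyword":
        # keyword: the whole value is kept as one term, untouched
        return [value]
    if field_type == "text":
        # text: split into words and lowercase them (crude approximation)
        return [word.lower() for word in re.findall(r"\w+", value)]
    raise ValueError("unsupported field type: " + field_type)


print(index_terms("Tom Smith", "text"))     # ['tom', 'smith']
print(index_terms("Tom Smith", "keyword"))  # ['Tom Smith']
```

This is why a search for "smith" can hit a text field, while a keyword field only matches the exact value "Tom Smith".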

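The index attribute controls whether a field is indexed, i.e. whether it can be searched. In es 5 and later, including the 6.8.3 used here, it only takes true or false. In es 2.x and earlier, on the old string type, it took three string values, which you will still see in many tutorials:

(1) analyzed: the field is analyzed and can be matched by full-text search, similar in spirit to like in sql. Today this is what type text gives you

(2) not_analyzed: the field can only be matched exactly, similar to "=" in sql. Today this is what type keyword gives you

(3) no: the field is not searchable. Today this is "index": false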
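If you are copying a mapping from an old tutorial, the translation from the old string values to the current form can be sketched like this. `modernize_field` is a hypothetical helper, not part of any es client, and mapping the old "no" to a keyword field with "index": false is my own assumption about the closest equivalent.

```python
def modernize_field(field: dict) -> dict:
    """Translate a pre-5.x string field definition into the es 5+/6.x form.

    Hypothetical helper for illustration; it only handles the three
    legacy values of "index" on fields of the old "string" type.
    """
    if field.get("type") != "string":
        return dict(field)  # not a legacy string field, leave it alone

    legacy_index = field.get("index", "analyzed")  # analyzed was the default
    rest = {k: v for k, v in field.items() if k not in ("type", "index")}

    if legacy_index == "analyzed":
        return {"type": "text", **rest}
    if legacy_index == "not_analyzed":
        return {"type": "keyword", **rest}
    if legacy_index == "no":
        # assumption: keep the value but make it unsearchable
        return {"type": "keyword", "index": False, **rest}
    raise ValueError("unknown legacy index value: " + repr(legacy_index))


print(modernize_field({"type": "string", "index": "not_analyzed"}))
```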

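The analyzer attribute sets the analyzer. For Chinese, the ik analyzer (a plugin that has to be installed separately) is the usual choice; you can also define a custom analyzer.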

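number_of_shards is the number of primary shards. The default is 5 in es6, and it cannot be changed once the index has been created.

number_of_replicas is the number of replicas of each primary shard. The default is 1, and it can be changed later.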

On success, the following json is returned

{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "student"
}
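If you would rather script the request than type it into a REST client, here is a minimal sketch with Python's standard library. `build_create_index_request` is a name I made up, the mapping is shortened to two fields, and the request is only built, not sent, so it runs without a cluster.

```python
import json
import urllib.request

ES_URL = "http://192.168.197.100:9200"  # the host used throughout this post


def build_create_index_request(index, properties, shards=1, replicas=1):
    """Build (but do not send) the PUT request that creates an index."""
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        # es6: exactly one type per index, here "_doc"
        "mappings": {"_doc": {"properties": properties}},
    }
    return urllib.request.Request(
        url=ES_URL + "/" + index,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_index_request(
    "student",
    {
        "id": {"type": "keyword"},
        "name": {"type": "text", "analyzer": "standard"},
    },
)
print(req.get_method(), req.full_url)
# To really send it (needs a reachable cluster):
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```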

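How do you look at the details of the index after creating it?

GET http://192.168.197.100:9200/student/_mapping

(_mapping returns only the mapping; GET http://192.168.197.100:9200/student returns the settings as well.)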

In es6, an index can have only one type, for example the "_doc" above.

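es compared with a relational database, by the usual rough analogy: an index corresponds to a database, a type to a table, a document to a row, and a field to a column.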

2. Modifying an index

//change the number of replicas to 2

PUT http://192.168.197.100:9200/student/_settings

{
  "number_of_replicas":2
}
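A quick bit of arithmetic on what that setting means. es never puts two copies of the same shard on one node, so the replica count also decides how many nodes you need before every copy can be allocated. The two function names below are my own, just for the worked example.

```python
def total_shard_copies(primaries: int, replicas: int) -> int:
    """Every primary shard exists once itself plus once per replica."""
    return primaries * (1 + replicas)


def min_nodes_for_all_copies(replicas: int) -> int:
    """Copies of the same shard must sit on different nodes."""
    return replicas + 1


# The student index: 1 primary shard, replicas raised from 1 to 2
print(total_shard_copies(1, 2))        # 3
print(min_nodes_for_all_copies(2))     # 3
```

So on a single-node test machine the extra replicas stay unassigned and the index health shows yellow rather than green.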

3. Deleting an index

//delete a single index

DELETE http://192.168.197.100:9200/student

//delete all indices (this wipes every index on the cluster, so be careful)

DELETE  http://192.168.197.100:9200/_all
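Since a DELETE on _all or on a wildcard is so easy to fire by accident, a script that deletes indices can check the target first. Below is a small sketch; `is_safe_index_target` is a hypothetical helper of mine. On the server side, the cluster setting action.destructive_requires_name serves the same purpose.

```python
def is_safe_index_target(name: str) -> bool:
    """Accept only a single, explicitly named index.

    Hypothetical guard for illustration: rejects empty names, "_all",
    wildcards and comma-separated lists.
    """
    if not name or name == "_all":
        return False
    return not any(ch in name for ch in "*,")


for target in ("student", "_all", "stu*", "student,teacher"):
    print(target, is_safe_index_target(target))
```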

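4. Comparing the default standard analyzer with the ik analyzer

The default analyzer in es is standard. It splits English text on word boundaries (roughly, on whitespace and punctuation) and lowercases the tokens. Chinese text, however, gets split into individual characters, so it is not suitable as a Chinese analyzer.

For example: standard analyzing English

//this api shows how a piece of text gets analyzed

POST http://192.168.197.100:9200/_analyze

{
  "text":"the People's Republic of China",
  "analyzer":"standard"
}

The result: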

{
    "tokens": [
        {"token": "the", "start_offset": 0, "end_offset": 3, "type": "<ALPHANUM>", "position": 0},
        {"token": "people's", "start_offset": 4, "end_offset": 12, "type": "<ALPHANUM>", "position": 1},
        {"token": "republic", "start_offset": 13, "end_offset": 21, "type": "<ALPHANUM>", "position": 2},
        {"token": "of", "start_offset": 22, "end_offset": 24, "type": "<ALPHANUM>", "position": 3},
        {"token": "china", "start_offset": 25, "end_offset": 30, "type": "<ALPHANUM>", "position": 4}
    ]
}

Analyzing Chinese:

POST http://192.168.197.100:9200/_analyze

{
  "text":"中华人民共和国万岁",
  "analyzer":"standard"
}

The result:

{
    "tokens": [
        {"token": "中", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0},
        {"token": "华", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1},
        {"token": "人", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2},
        {"token": "民", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3},
        {"token": "共", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 4},
        {"token": "和", "start_offset": 5, "end_offset": 6, "type": "<IDEOGRAPHIC>", "position": 5},
        {"token": "国", "start_offset": 6, "end_offset": 7, "type": "<IDEOGRAPHIC>", "position": 6},
        {"token": "万", "start_offset": 7, "end_offset": 8, "type": "<IDEOGRAPHIC>", "position": 7},
        {"token": "岁", "start_offset": 8, "end_offset": 9, "type": "<IDEOGRAPHIC>", "position": 8}
    ]
}

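The ik analyzer can split Chinese text into words. It provides two analyzers, ik_smart and ik_max_word.

(1) ik_smart: splits Chinese at the coarsest granularity, a rough split

For example:

POST http://192.168.197.100:9200/_analyze

{
  "text":"中华人民共和国万岁",
  "analyzer":"ik_smart"
}

The result: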

{
    "tokens": [
        {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0},
        {"token": "万岁", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 1}
    ]
}

(2) ik_max_word: splits Chinese at the finest granularity, producing as many words from the text as possible

For example:

POST http://192.168.197.100:9200/_analyze

{
  "text":"中华人民共和国万岁",
  "analyzer":"ik_max_word"
}

The result:

{
    "tokens": [
        {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0},
        {"token": "中华人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1},
        {"token": "中华", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2},
        {"token": "华人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3},
        {"token": "人民共和国", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4},
        {"token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5},
        {"token": "共和国", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6},
        {"token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7},
        {"token": "国", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 8},
        {"token": "万岁", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 9},
        {"token": "万", "start_offset": 7, "end_offset": 8, "type": "TYPE_CNUM", "position": 10},
        {"token": "岁", "start_offset": 8, "end_offset": 9, "type": "COUNT", "position": 11}
    ]
}
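One way to see the difference between the two ik analyzers in a script is to look at the offsets: ik_smart tokens sit side by side, while ik_max_word tokens overlap each other. Below is a small sketch that works on _analyze responses shaped like the ones in this post. The helper names are mine, and the sample data is cut down from the results above (only the first three ik_max_word tokens).

```python
def token_texts(response: dict) -> list:
    """Pull just the token strings out of an _analyze response."""
    return [t["token"] for t in response["tokens"]]


def has_overlaps(response: dict) -> bool:
    """True if any token starts before an earlier token has ended."""
    furthest_end = 0
    for t in sorted(response["tokens"], key=lambda t: t["start_offset"]):
        if t["start_offset"] < furthest_end:
            return True
        furthest_end = max(furthest_end, t["end_offset"])
    return False


# Sample data copied (and shortened) from the responses in this post
ik_smart = {"tokens": [
    {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7},
    {"token": "万岁", "start_offset": 7, "end_offset": 9},
]}
ik_max_word = {"tokens": [
    {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7},
    {"token": "中华人民", "start_offset": 0, "end_offset": 4},
    {"token": "中华", "start_offset": 0, "end_offset": 2},
]}

print(token_texts(ik_smart), has_overlaps(ik_smart))
print(token_texts(ik_max_word), has_overlaps(ik_max_word))
```

The overlapping tokens are what let a query for 中华 or 人民 hit a document containing 中华人民共和国, at the cost of a bigger index.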

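The ik analyzer on English:

POST http://192.168.197.100:9200/_analyze

{
  "text":"the People's Republic of China",
  "analyzer":"ik_smart"
}

The result is below. ik drops the unimportant words (stop words, here the and of), whereas the standard analyzer keeps them. (My English has degenerated to the point where I no longer know what kind of words a, an and the are. As a Chinese person, that is nothing to be proud of.)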

{
    "tokens": [
        {"token": "people", "start_offset": 4, "end_offset": 10, "type": "ENGLISH", "position": 0},
        {"token": "s", "start_offset": 11, "end_offset": 12, "type": "ENGLISH", "position": 1},
        {"token": "republic", "start_offset": 13, "end_offset": 21, "type": "ENGLISH", "position": 2},
        {"token": "china", "start_offset": 25, "end_offset": 30, "type": "ENGLISH", "position": 3}
    ]
}

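5. Adding a document

Fields can be added freely; a field that is not in the mapping gets picked up by dynamic mapping.

//1 is the value of "_id"; it must be unique, and es can also generate a random one for you

POST http://192.168.197.100:9200/student/_doc/1

{
  "id":1,
  "name":"tom",
  "age":20,
  "gender":"male",
  "grade":"7",
  "class":"1"
}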
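Here is a sketch of the same request built from Python, covering both cases: with your own "_id" in the url, and without one, in which case es generates the id. `build_index_doc_request` is a name I made up, and the request is only built here, not sent.

```python
import json
import urllib.request

ES_URL = "http://192.168.197.100:9200"  # the host used throughout this post


def build_index_doc_request(index, doc, doc_id=None):
    """Build (but do not send) the request that adds a document.

    With doc_id the document is stored under that "_id";
    without it, POSTing to /<index>/_doc lets es pick the "_id".
    """
    path = "/" + index + "/_doc"
    if doc_id is not None:
        path += "/" + str(doc_id)
    return urllib.request.Request(
        url=ES_URL + path,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


tom = {"id": 1, "name": "tom", "age": 20, "gender": "male", "grade": "7", "class": "1"}
with_id = build_index_doc_request("student", tom, doc_id=1)
auto_id = build_index_doc_request("student", tom)
print(with_id.full_url)
print(auto_id.full_url)
```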

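6. Updating a document

//only the fields listed under "doc" are changed, the rest of the document stays as it is

POST http://192.168.197.100:9200/student/_doc/1/_update

{
  "doc":{
    "name":"jack"
  }
}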
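One thing to watch if you move to es 7 later: the update url changes shape there, because types are going away. A small sketch of building the path and body for either version; the function names and the major_version parameter are my own, not something from an es client.

```python
import json


def update_path(index, doc_id, major_version=6):
    """Path of the partial-update endpoint for a given es major version."""
    if major_version >= 7:
        # es 7 typeless form
        return "/" + index + "/_update/" + str(doc_id)
    # es 6 form, as used in this post
    return "/" + index + "/_doc/" + str(doc_id) + "/_update"


def update_body(changed_fields):
    """Wrap the changed fields in the "doc" envelope the endpoint expects."""
    return json.dumps({"doc": changed_fields})


print(update_path("student", 1))                   # /student/_doc/1/_update
print(update_path("student", 1, major_version=7))  # /student/_update/1
print(update_body({"name": "jack"}))
```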

7. Deleting a document

//1 is the value of "_id"

DELETE http://192.168.197.100:9200/student/_doc/1

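That is a quick run through creating, modifying and deleting an es index, and adding, updating and deleting documents. To keep this post from getting too long, document queries will be covered in the next one.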
