ElasticSearch 7.x 学习

ElasticSearch 7.x

ElasticSearch 7.x

一、前言

ElasticSearch 是一个基于 Lunece 的开源搜索引擎。Lunece 是目前为止最先进、性能最好、功能最全的搜索引擎库。

ElasticSearch 用 RESTful API 来隐藏 Lunece 的复杂性。

ES 创始人：Shay Banon，项目起源是老婆想弄一个搜索菜谱的软件。

Spring 项目的创始人：Rod Jhonson。ES 项目早期的投资人。

doug cutting：是 Apache Lunece 的作者，也是 hadoop 的创始人之一。

Lucene 是一套信息检索工具包！ jar 包！不包含搜索引擎系统！

Lucene 和 ElasticSearch 的关系：

ElasticSearch 是基于 Lucune 做了一些封装和增强。

1.1、正向索引和倒排索引

正向索引和倒排索引是在搜索领域中非常重要的名词。

1.txt

我是小明，我喜欢看剧和打球

2.txt

我是小张，我喜欢看剧和玩游戏

正向索引和倒排索引在使用时都会进行分词处理，什么是分词处理呢？

分词处理：将一句话拆成中文语义的一个个词。

1.1.1、正向索引

如果使用的是正向索引，经过分词处理后的两个文件内容是：

1.txt

我 是 小明 我 喜欢 看 看剧 和 打 打球

2.txt

我 是 小张 我 喜欢 看 看剧 和 玩游戏 游戏

然后如果我们查询条件是喜欢，那么会从文章开头开始找喜欢，找到就将文档加入结果集，如果文本内容很多，会耗费大量时间。

1.1.2、倒排索引

如果使用的是倒排索引，经过分词处理后，会按照分词和文档进行映射：

关键词	文档
我	1.txt, 2.txt
是	1.txt, 2.txt
小明	1.txt
小张	2.txt
喜欢	1.txt, 2.txt
看	1.txt, 2.txt
看剧	1.txt, 2.txt
和	1.txt, 2.txt
打	1.txt
打球	1.txt
玩游戏	2.txt
游戏	2.txt

然后搜索时先找关键字，比如搜索喜欢，直接在关键词找到喜欢这个词。然后找到对应的文档。

二、安装

ElasticSearch：分布式搜索引擎

Kibana：ElasticSearch 可视化界面

Logstash：对数据进行采集、过滤、和输出。

movielens 数据集：https://grouplens.org/datasets/movielens/

导入数据：

三、ES 基本概念

3.1、索引

某一类文档的集合，可以类比一个数据库（或者数据库表）

3.2、文档

具体的一条数据。

{
        "_index" : "movies",		// 索引名
        "_type" : "_doc",			// 类型
        "_id" : "42015",			// id
        "_score" : 1.0,
        "_source" : {				// 存放具体的数据
          "id" : "42015",
          "genre" : [
            "Action",
            "Adventure",
            "Comedy",
            "Drama",
            "Romance"
          ],
          "title" : "Casanova",
          "year" : 2005,
          "@version" : "1"
        }
      }

3.4、mapping

mapping 是 ES 每一个文档的约束信息，例如属性的类型，是否能被索引等。

3.5、DSL

DSL 是 ES 的查询语言。

3.6 传统关系型数据库和 ES 的对比

关系型数据库与 ES 进行对比。

DBMS	ElasticSearch
database	Index
table	type(在7.0之后都是固定值_doc)
Row	Docment
Column	Field
Schema	Mapping
SQL	DSL

一个索引想像成一个数据库，一个数据库只有一张表 _doc，一张索引有多个文档（记录），一个文档有多个字段（列）。

四

4.1、基本 CRUD

查看所有索引：

GET _cat/indices

查看某个索引的数据：

# GET 索引名/_search
GET movies/_search

查看某个索引下有多少条数据：

# 语法：GET 索引名/_count
GET movies/_count

查看指定 id 的文档的数据：

# 语法：GET 索引名/_doc/ID
GET movies/_doc/38701

添加一个文档：

POST 索引名/_doc/文档ID
{
   "key1": "value1",
   "key2": "value2"
   ...
}

添加一个文档，指定 ID，如果 ID 已经有了会报错，没有会创建一个文档：

POST 索引名/_create/文档ID
{
   "key1": "value1",
   "key2": "value2"
   ...
}

修改一个文档结构：

POST 索引名/_update/文档ID
{
  "doc": {
 	"key1": "value1",
   	"key2": "value2"
   	...
  }
}

删除指定文档 ID 的文档：

DELETE 索引名/_doc/文档ID

删除索引：

DELETE 索引名

批量插入:

POST user/_bulk
{"index": {"_id": "1"}}
{"firstname": "li"}
{"index": {"_id": "2"}}
{"firstname": "ya"}
{"index": {}}
{"firstname": "ya"}

查询，数组返回

GET _mget
{
  "docs": [
    {"_index": "user", "_id": "1"},
    {"_index": "user", "_id": "2"},
    {"_index": "user", "_id": "kgu6mHYB-tSK-znPeUte"}
    ]
}

4.2、ES 的 URI 查询

查询内容有 2012 的文档：

GET movies/_search?q=2012

查询 title 属性有 2012 的文档：

GET movies/_search?q=2012&df=title

上面可以简写为：

GET movies/_search?q=title:2012

包含 beautiful 或 mind 的：

GET movies/_search?q=title:beautiful mind

只包含 beautiful 不包含 mind：

GET movies/_search?q=title:(beautiful -mind)

既包含 beautiful 又包含 mind:

GET movies/_search?q=title:(beautiful AND mind)

AND 必须大写，小写会当成一个条件。

获取一个短语是 beautiful mind 的：

GET movies/_search?q=title:"beautiful mind"

分页,from=从第几条开始，从第一条开始 from=0，size=总共要查出几条数据

GET movies/_search?q=title:2012&from=1&size=3

范围查询：

# 语法：GET 索引名/_search?q=属性名:比较运算符 条件
GET movies/_search?q=year:>=2016

查询电影名字包含 beautiful 或 mind，并且上映的年份在 [1990, 1992] 的所有电影

GET movies/_search?q=year:(>=1990 AND <=1992) AND title:beautiful mind

查询年份大于 2011 小于等于 2013，后边的括号只能是中括号：

GET movies/_search?q=year:{2011 TO 2013]

查看 title 为 min加上一个随机的字符的数据：

GET movies/_search?q=title:min?

？可以代表一个字符。

查看 title 为 min加上一个或多个随机的字符的数据：

GET movies/_search?q=title:mi*

可以代表多个字符。

五、Analysis

Analysis 只是一个概念，就是文本分析，将全文本转换为一系列单词的过程，也叫分词。Analysis 是通过 analyzer（分词器）来实现的，ES 内部有很多定义好的分词器，也可以使用自定义的分词器。

除了在数据写入的时候将词条进行转换，查询的时候也会使用分析器对语句进行分析。

Aynalysis 是由三部分组成：

<p>Hello a World, the world is beautiful</p>

1、Character Filter：将文本中的 html 标签剔除掉。

2、Tokenizer：按照规则进行分词，在英文中按照空格分词。

3、Token Filter：去掉 stop word（停顿词，a an the is are等），然后转换为小写。

5.1、内置分词器

5.2、内置分词器使用示例

GET _analyze
{
  "analyzer":"standard",
  "text":"Then I will take care of your life"
}

结果：

{
  "tokens" : [
    {
      "token" : "then",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "i",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "will",
      "start_offset" : 7,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "take",
      "start_offset" : 12,
      "end_offset" : 16,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "care",
      "start_offset" : 17,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "of",
      "start_offset" : 22,
      "end_offset" : 24,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "your",
      "start_offset" : 25,
      "end_offset" : 29,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "life",
      "start_offset" : 30,
      "end_offset" : 34,
      "type" : "<ALPHANUM>",
      "position" : 7
    }
  ]
}

六、ResquestBody 深入探索

ES 是进行不了复杂的查询的，所以有了 ResquestBody 查询。

GET movies/_search
{
  "query": {
    "match": {
      "title": "beautiful mind"
    }
  }
}

在说 Analysis 的时候，说不仅仅在输入的时候会做分词处理，查询的时候也会。在查询的时候会将 beautiful mind 按照 Ananlysis 的三步分词处理的方式，最后会拆成``beautiful 和mind，然后进行查询，所以只要结果中包含 beautiful或mind` 就可以。

6.1、term 查询

1、term 和 terms（不进行分词处理，完全匹配）

term 查询不会对输入进行分词处理。match 会做分词处理。

title 包含 beanutiful 的文档：

GET movies/_search
{
  "query": {
    "term": {
      "title":
          "beautiful"
    }
  }
}

title 包含 beanutiful 或 mind 的文档：

GET movies/_search
{
  "query": {
    "terms": {
      "title": [
          "beautiful",
          "mind"
        ]
    }
  }
}

一个查询的单词用 term，多个用 terms。

2、range (范围查询)

查询上映在 2016 到 2018 年的所有电影，再根据上映的时间倒叙进行排序。

GET movies/_search
{
  "query": {
    "range": {
      "year":{
        "gte": 2016,
        "lte": 2018
      }
    }
  },
  "sort": [
    {
      "year": {
        "order": "desc"
      }
    }
  ]
}

查（query）什么，查的是范围（range），查什么范围，年份（year），结果按照顺序（order），按照倒叙（desc）。

gte：大于等于。

lte：小于等于。

复合查询：match 和 range 同时使用。

GET movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "beautiful mind"
          }
        },
        {
          "range": {
            "year": {
              "gte": 1990,
              "lte": 1992
            }
          } 
        }
      ]
    }
  }
}

3、Constant Score（不进行相关性算分，查询的结果会进行缓存）

不会进行相关性算分，会节省大量时间。查询的结果会进行缓存。Constant Score 只能用 term 查询。

查询 title 中包含 beautiful 的所有电影

GET movies/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "title":  "beautiful"
        }
      }
    }
  }
}

会发现查询结果的 _score:1.0。没有进行相关性算分。

"hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "66701",
        "_score" : 1.0,
        "_source" : {
          "id" : "66701",
          "genre" : [
            "Comedy",
            "Drama"
          ],
          "title" : "Beautiful Ohio",
          "year" : 2006,
          "@version" : "1"
        }
      },
      ]

6.2、全文查询

全文查询的种类有: Match Query、Match Phrase Query、Query String Query 等

6.2.1、match（完全匹配）

查询包含输入条件的内容。

查询电影名包含 beautiful 的电影，每页 10 条，取第 2 页的数据。

GET movies/_search
{
  "query": {
    "match": {
      "title": "beautiful"
    }
  },
  "from": 10,
  "size": 10
}

查询电影名包含 beautiful 的电影,但是查询的结果只显示 id 和 title

GET movies/_search
{
  "_source": ["title", "id"],
  "query": {
    "match": {
      "title": "beautiful"
    }
  }
}

只想查部分属性，使用 _source。

6.2.2、match_phrase（短语完全匹配）

匹配短语，整个输入都做一个整体查询条件。

GET movies/_search
{
  "query": {
    "match_phrase": {
      "title": "beautiful mind"
    }
  }
}

查出来的结果：

{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 13.509613,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "4995",
        "_score" : 13.509613,
        "_source" : {
          "id" : "4995",
          "genre" : [
            "Drama",
            "Romance"
          ],
          "title" : "Beautiful Mind, A",
          "year" : 2001,
          "@version" : "1"
        }
      }
    ]
  }
}

6.2.3、 multi_match（条件与多结果完全匹配）

查询的条件和多个结果匹配的的结果。


GET movies/_search
{
  "query": {
    "multi_match": {
      "query": "beautiful",
      "fields": ["title", "genre"]
    }
  }
}

6.2.4、match_all（查询所有）

查询所有的数据。

GET movies/_search
{
  "query": {
    "match_all": {
    }
  }
}

这个就相当于：

GET movies/_search

6.2.5、query_string（AND 或 OR 条件组合）

条件用 AND 组合时可以用这个， OR 也可以。

GET movies/_search
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "beautiful OR mind"
    }
  }
}
GET movies/_search
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "beautiful AND mind"
    }
  }
}

AND 和 OR 一定要大写，不然会被当作查询条件进行查询。

6.2.6、simple_query_string

GET movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "beautiful+ -mind",
      "fields": ["title"]
    }
  }
}

6.3、模糊查询-fuzzy

关键字：fuzzy。

GET movies/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "neverendogn",
        "fuzziness": 2
      }
    }
  }
}

fuzziness 的值只能为大于等于0，小于等于2，有小数点。值为 0 时，必须是完全匹配，值为 1 时，可以有 1 个字母和搜索中的不一样，值为 2 时，可以有 2 个字母和搜索中的不一样。

6.4、多条件查询

1、must

多个条件同时满足。会进行相关性算分。

GET movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "beautiful"
          }
        },
        {
           "range": {
            "year": {
              "gte": 1990,
              "lte": 2000
            }
          }
        }
      ]
    }
  }
}

语法是：

GET 索引名/_search
{
  "query": {
    "bool": {
      "must": [
        {
		  条件查询1
        },
        {
           条件查询2
        }
        ...
      ]
    }
  }
}

2、must_not

多个条件同时不满足。不会进行相关性算分。

GET movies/_search
{
  "query": {
    "bool": {
     "must_not": [
        {
          "match": {
            "title": "mind"
          }
        }
      ]
    }
  }
}

语法是：

GET 索引名/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
		  条件查询1
        },
        {
           条件查询2
        }
        ...
      ]
    }
  }
}

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "42015",
        "_score" : 0.0,
        "_source" : {
          "id" : "42015",
          "genre" : [
            "Action",
            "Adventure",
            "Comedy",
            "Drama",
            "Romance"
          ],
          "title" : "Casanova",
          "year" : 2005,
          "@version" : "1"
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "42018",
        "_score" : 0.0,
        "_source" : {
          "id" : "42018",
          "genre" : [
            "Comedy",
            "Drama"
          ],
          "title" : "Mrs. Henderson Presents",
          "year" : 2005,
          "@version" : "1"
        }
      }
    ]
  }
}

"_score" : 0.0 都是一样的，没有进行相关性算分。

3、filter

多个条件同时满足。不会进行相关性算分。

GET movies/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "title": "beautiful"
          }
        },
        {
           "range": {
            "year": {
              "gte": 1990,
              "lte": 2000
            }
          }
        }
      ]
    }
  }
}

"_score" : 0.0 ，都是一样的，和 must 的功能不一样，都是多条件同时满足，但是不会进行相关性算分。

4、should

或者的关系，会对查询的结果进行相关性算分。

GET movies/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "beautiful"
          }
        },
        {
          "match": {
            "title": "mind"
          }
        }
      ]
    }
  }
}

语法是：

GET 索引名/_search
{
  "query": {
    "bool": {
      "should": [
        {
          查询条件1
        },
        {
          查询条件2
        }
      ]
    }
  }
}

七、Mapping

Mapping 的作用：

定义字段的数据类型。
设置该字段是否可以被倒排索引。

7.1、数据类型

类型名	描述名
Text/Keyword	字符串，默认的类型
Date	日期类型
Integer/Float/Long	数字类型
Boolean	布尔类型

Text 和 KeyWord：是否分词。https://www.cnblogs.com/qlqwjy/p/13462750.html

7.2、定义 Mapping

语法如下：

PUT users
{
  "mappings": {
    	// 在这里定义 mapping
  }
}

定义 mapping 的建议方式：推荐导入样本文档到临时索引，ES 会自动生成 mapping 信息，用 api 查询 mapping 的定义，根据自己的需求修改这份 mapping 信息，然后创建索引，删除临时索引。

ES 的 mapping 设置好后，是不可以修改的，所以不像关系型数据库那样，可以直接修改，所以只能这样从左。

示例

添加一个索引一个文档：

PUT users/_doc/1
{
  "name": "Jack",
  "age": 18,
  "height": 180.0,
  "isRich": "true"
}

索引的 mapping 会自动生成，我们看一下自动生成的索引，使用命令查询：

GET users/_mapping

查询结果：

{
  "users" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "height" : {
          "type" : "float"
        },
        "isRich" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

把 mappings 后面内容复制一下：

"mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "height" : {
          "type" : "float"
        },
        "isRich" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }

去掉 fields 这些：

"mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "height" : {
          "type" : "float"
        },
        "isRich" : {
          "type" : "text"
        },
        "name" : {
          "type" : "text"
        }
      }
    }

将 isRich 的类型改为 boolean，加上 PUT users

PUT users
{
  "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "height" : {
          "type" : "float"
        },
        "isRich" : {
          "type" : "boolean"
        },
        "name" : {
          "type" : "text"
        }
      }
    }
}

然后删除索引

DELETE users

执行上面的创建 mapping 语句，再重新建立索引添加一条文档：

PUT users
{
  "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "height" : {
          "type" : "float"
        },
        "isRich" : {
          "type" : "boolean"
        },
        "name" : {
          "type" : "text"
        }
      }
    }
}

PUT users/_doc/1
{
  "name": "Jack",
  "age": 18,
  "height": 180.0,
  "isRich": "true"
}

查看现在的索引 mapping：

GET users/_mapping

查询结果：

{
  "users" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "height" : {
          "type" : "float"
        },
        "isRich" : {
          "type" : "boolean"
        },
        "name" : {
          "type" : "text"
        }
      }
    }
  }
}

发现每个字段所对应的类型是我们提前定义好的类型。

搜索时匹配

7.3、常见参数

7.3.1、index

可以给属性添加一个布尔类型的 index 属性，可以控制属性是否被倒排索引。

PUT test
{
  "mappings" : {
      "properties" : {
        "name" : {
          "type" : "text",
          "index": false
        }
      }
    }
}

为 false 的时候不可以进行倒排索引。

插入两条数据

POST test/_doc/1
{
  "name": "zhangsan"
}
POST test/_doc/2
{
  "name": "xiaoqi"
}

查询：

GET test/_search
{
  "query": {
    "match": {
      "name": "zhangsan"
    }
  }
}

会报错。

{
  "error" : {
    "root_cause" : [
      {
        "type" : "query_shard_exception",
        "reason" : "failed to create query: Cannot search on field [name] since it is not indexed.",
        "index_uuid" : "YCjgmE81QkSHLohZdY2JBw",
        "index" : "test"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "test",
        "node" : "zTWkfiA2RG2GUVGMgRnuhQ",
        "reason" : {
          "type" : "query_shard_exception",
          "reason" : "failed to create query: Cannot search on field [name] since it is not indexed.",
          "index_uuid" : "YCjgmE81QkSHLohZdY2JBw",
          "index" : "test",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "Cannot search on field [name] since it is not indexed."
          }
        }
      }
    ]
  },
  "status" : 400
}

7.3.2、null_value

字段的值如果是 null，是无法进行搜索的，这个时候我们可以使用 null_value，null_value 会将属性转化为 null_value 的值。

使用 null_value 时，只能用 keyword 类型。

示例

添加数据：

POST test/_doc/1
{
  "name": "zhangsan"
}
POST test/_doc/2
{
  "name": "xiaoqi"
}
POST test/_doc/3
{
  "name": null
}

搜索值为 null :

GET test/_search
{
  "query": {
    "match": {
      "name": null
    }
  }
}

运行结果会报错：

{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "No text specified for text query",
        "line" : 5,
        "col" : 5
      }
    ],
    "type" : "parsing_exception",
    "reason" : "No text specified for text query",
    "line" : 5,
    "col" : 5
  },
  "status" : 400
}

我们可以使用 null_value:

PUT test
{
  "mappings" : {
      "properties" : {
        "name" : {
          "type" : "keyword",
          "null_value": "null"
        }
      }
    }
}

然后插入数据：

POST test/_doc/1
{
  "name": "zhangsan"
}
POST test/_doc/2
{
  "name": "xiaoqi"
}
POST test/_doc/3
{
  "name": null
}

然后查询：

GET test/_search
{
  "query": {
    "match": {
      "name": "null"
    }
  }
}

查询结果：

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.9808291,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.9808291,
        "_source" : {
          "name" : null
        }
      }
    ]
  }
}

八、ES 聚合查询

语法格式：

GET indexName/_search {
"aggs": {
	"聚合名称（自定义的）": {
    	"聚合类型（ES 提供的）": {
			"field": "要做聚合的字段名"
        }
    }
}

创建索引类型，新增文档：

PUT employee
{
    "mappings": {
        "properties": {
            "id": {
            "type": "integer" },
            "name": {
            "type": "keyword" },
            "job": {
            "type": "keyword" },
            "age": {
            "type": "integer" },
            "gender": {
            "type": "keyword" }
        }
    }
} 
PUT employee/_bulk
{"index": {"_id": 1}}
{"id": 1, "name": "Bob", "job": "java", "age": 21, "sal": 8000, "gender": "female"}
{"index": {"_id": 2}}
{"id": 2, "name": "Rod", "job": "html", "age": 31, "sal": 18000, "gender": "female"}
{"index": {"_id": 3}}
{"id": 3, "name": "Gaving", "job": "java", "age": 24, "sal": 12000, "gender": "male"}
{"index": {"_id": 4}}
{"id": 4, "name": "King", "job": "dba", "age": 26, "sal": 15000, "gender": "female"}
{"index": {"_id": 5}}
{"id": 5, "name": "Jonhson", "job": "dba", "age": 29, "sal": 16000, "gender": "male"}
{"index": {"_id": 6}}
{"id": 6, "name": "Douge", "job": "java", "age": 41, "sal": 20000, "gender": "female"}
{"index": {"_id": 7}}
{"id": 7, "name": "cutting", "job": "dba", "age": 27, "sal": 7000, "gender": "male"}
{"index": {"_id": 8}}
{"id": 8, "name": "Bona", "job": "html", "age": 22, "sal": 14000, "gender": "female"}
{"index": {"_id": 9}}
{"id": 9, "name": "Shyon", "job": "dba", "age": 20, "sal": 19000, "gender": "female"}
{"index": {"_id": 10}}
{"id": 10, "name": "James", "job": "html", "age": 18, "sal": 22000, "gender": "male"}
{"index": {"_id": 11}}
{"id": 11, "name": "Golsling", "job": "java", "age": 32, "sal": 23000, "gender": "female"}
{"index": {"_id": 12}}
{"id": 12, "name": "Lily", "job": "java", "age": 24, "sal": 2000, "gender": "male"}
{"index": {"_id": 13}}
{"id": 13, "name": "Jack", "job": "html", "age": 23, "sal": 3000, "gender": "female"}
{"index": {"_id": 14}}
{"id": 14, "name": "Rose", "job": "java", "age": 36, "sal": 6000, "gender": "female"}
{"index": {"_id": 15}}
{"id": 15, "name": "Will", "job": "dba", "age": 38, "sal": 4500, "gender": "male"}
{"index": {"_id": 16}}
{"id": 16, "name": "smith", "job": "java", "age": 32, "sal": 23000, "gender": "male"}

8.1、单值输出

8.1.1、总和

查员工工资的总和

# 查询工资的总和
GET employee/_search
{
  "size": 0,
  "aggs": {
  	"sum_sal": {
      	"sum": {
      	  "field": "sal"
          }
      }
    }
}

8.1.2、平均

查询员工的平均工资

GET employee/_search
{
  "size": 0,
  "aggs": {
    "other_avg": {
      "avg": {
        "field": "sal"
      }
    }
  }
}

8.1.3、总数

查询总共有多少岗位

# 查询总共有多少岗位
GET employee/_search
{
  "size": 0,
  "aggs": {
    "jon_sum": {
      "cardinality": {
        "field": "job"
      }
    }
  }
}

8.1.4、最大值最小值

查询工资的最大值、最小值

GET employee/_search
{
  "size": 0,
  "aggs": {
    "sal_max": {
      "max": {
        "field": "sal"
      }
    }
  }
}
GET employee/_search
{
  "size": 0,
  "aggs": {
    "sal_min": {
      "min": {
        "field": "sal"
      }
    }
  }
}

8.2、多值的输出

8.2.1、查询信息(stats)

信息包括：总数量、最小值、最大值、平均值、总和。stats 使用时属性只能是类型。

查询员工工资信息

# 查询员工工资信息
GET employee/_search
{
  "size": 0,
  "aggs": {
    "sal_info": {
      "stats": {
        "field": "sal"
      }
    }
  }
}

运行结果：

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 16,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "sal_info" : {
      "count" : 16,
      "min" : 2000.0,
      "max" : 23000.0,
      "avg" : 13281.25,
      "sum" : 212500.0
    }
  }
}

8.2.2、分组查询（terms）

查询到达不同国家的航班数量

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "dest_country_info": {
      "terms": {
        "field": "DestCountry"
      }
    }
  }
}

查询结果：

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "dest_country_info" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 3187,
      "buckets" : [
        {
          "key" : "IT",
          "doc_count" : 2371
        },
        {
          "key" : "US",
          "doc_count" : 1987
        },
        {
          "key" : "CN",
          "doc_count" : 1096
        },
        {
          "key" : "CA",
          "doc_count" : 944
        },
        {
          "key" : "JP",
          "doc_count" : 774
        },
        {
          "key" : "RU",
          "doc_count" : 739
        },
        {
          "key" : "CH",
          "doc_count" : 691
        },
        {
          "key" : "GB",
          "doc_count" : 449
        },
        {
          "key" : "AU",
          "doc_count" : 416
        },
        {
          "key" : "PL",
          "doc_count" : 405
        }
      ]
    }
  }
}

8.2.3、子查询

查询不同目的地航班次数以及不同目的地不同天气的统计信息

# 查询不同目的地航班次数以及不同目的地天气的统计信息
GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "dest_country_info": {
      "terms": {
        "field": "DestCountry"
      },
      "aggs": {
        "dest_country_weather_info": {
          "terms": {
            "field": "DestWeather"
          }
        }
      }
    }
  }
}

运行结果：

{
  "took" : 28,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "dest_country_info" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 3187,
      "buckets" : [
        {
          "key" : "IT",
          "doc_count" : 2371,
          "dest_country_weather_info" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Clear",
                "doc_count" : 428
              },
              {
                "key" : "Sunny",
                "doc_count" : 424
              },
              {
                "key" : "Rain",
                "doc_count" : 417
              },
              {
                "key" : "Cloudy",
                "doc_count" : 414
              },
              {
                "key" : "Heavy Fog",
                "doc_count" : 182
              },
              {
                "key" : "Damaging Wind",
                "doc_count" : 173
              },
              {
                "key" : "Hail",
                "doc_count" : 169
              },
              {
                "key" : "Thunder & Lightning",
                "doc_count" : 164
              }
            ]
          }
        },
        {
          "key" : "US",
          "doc_count" : 1987,
          "dest_country_weather_info" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Rain",
                "doc_count" : 371
              },
              {
                "key" : "Clear",
                "doc_count" : 346
              },
              {
                "key" : "Sunny",
                "doc_count" : 345
              },
              {
                "key" : "Cloudy",
                "doc_count" : 330
              },
              {
                "key" : "Heavy Fog",
                "doc_count" : 157
              },
              {
                "key" : "Thunder & Lightning",
                "doc_count" : 155
              },
              {
                "key" : "Hail",
                "doc_count" : 142
              },
              {
                "key" : "Damaging Wind",
                "doc_count" : 141
              }
            ]
          }
        },
        {
          "key" : "CN",
          "doc_count" : 1096,
          "dest_country_weather_info" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Sunny",
                "doc_count" : 209
              },
              {
                "key" : "Rain",
                "doc_count" : 207
              },
              {
                "key" : "Clear",
                "doc_count" : 192
              },
              {
                "key" : "Cloudy",
                "doc_count" : 173
              },
              {
                "key" : "Thunder & Lightning",
                "doc_count" : 86
              },
              {
                "key" : "Hail",
                "doc_count" : 81
              },
              {
                "key" : "Heavy Fog",
                "doc_count" : 79
              },
              {
                "key" : "Damaging Wind",
                "doc_count" : 69
              }
            ]
          }
        },
        {
          "key" : "CA",
          "doc_count" : 944,
          "dest_country_weather_info" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Clear",
                "doc_count" : 197
              },
              {
                "key" : "Rain",
                "doc_count" : 173
              },
              {
                "key" : "Cloudy",
                "doc_count" : 156
              },
              {
                "key" : "Sunny",
                "doc_count" : 148
              },
              {
                "key" : "Damaging Wind",
                "doc_count" : 80
              },
              {
                "key" : "Thunder & Lightning",
                "doc_count" : 69
              },
              {
                "key" : "Heavy Fog",
                "doc_count" : 62
              },
              {
                "key" : "Hail",
                "doc_count" : 59
              }
            ]
          }
        },
        {
          "key" : "JP",
          "doc_count" : 774,
          "dest_country_weather_info" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Rain",
                "doc_count" : 152
              },
              {
                "key" : "Sunny",
                "doc_count" : 138
              },
              {
                "key" : "Clear",
                "doc_count" : 130
              },
              {
                "key" : "Cloudy",
                "doc_count" : 123
              },
              {
                "key" : "Damaging Wind",
                "doc_count" : 66
              },
              {
                "key" : "Heavy Fog",
                "doc_count" : 58
              },
              {
                "key" : "Thunder & Lightning",
                "doc_count" : 57
              },
              {
                "key" : "Hail",
                "doc_count" : 50
              }
            ]
          }
        },
        {
          "key" : "RU",
          "doc_count" : 739,
          "dest_country_weather_info" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Cloudy",
                "doc_count" : 149
              },
              {
                "key" : "Rain",
                "doc_count" : 128
              },
              {
                "key" : "Clear",
                "doc_count" : 122
              },
              {
                "key" : "Sunny",
                "doc_count" : 117
              },
              {
                "key" : "Thunder & Lightning",
                "doc_count" : 62
              },
              {
                "key" : "Hail",
                "doc_count" : 56
              },
              {
                "key" : "Damaging Wind",
                "doc_count" : 55
              },
              {
                "key" : "Heavy Fog",
                "doc_count" : 50
              }
            ]
          }
        },
        {
          "key" : "CH",
          "doc_count" : 691,
          "dest_country_weather_info" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Cloudy",
                "doc_count" : 135
              },
              {
                "key" : "Sunny",
                "doc_count" : 134
              },
              {
                "key" : "Clear",
                "doc_count" : 128
              },
              {
                "key" : "Rain",
                "doc_count" : 115
              },
              {
                "key" : "Heavy Fog",
                "doc_count" : 51
              },
              {
                "key" : "Hail",
                "doc_count" : 46
              },
              {
                "key" : "Damaging Wind",
                "doc_count" : 41
              },
              {
                "key" : "Thunder & Lightning",
                "doc_count" : 41
              }
            ]
          }
        },
        {
          "key" : "GB",
          "doc_count" : 449,
          "dest_country_weather_info" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Rain",
                "doc_count" : 93
              },
              {
                "key" : "Sunny",
                "doc_count" : 81
              },
              {
                "key" : "Clear",
                "doc_count" : 77
              },
              {
                "key" : "Cloudy",
                "doc_count" : 71
              },
              {
                "key" : "Heavy Fog",
                "doc_count" : 34
              },
              {
                "key" : "Hail",
                "doc_count" : 32
              },
              {
                "key" : "Damaging Wind",
                "doc_count" : 31
              },
              {
                "key" : "Thunder & Lightning",
                "doc_count" : 30
              }
            ]
          }
        },
        {
          "key" : "AU",
          "doc_count" : 416,
          "dest_country_weather_info" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Rain",
                "doc_count" : 80
              },
              {
                "key" : "Cloudy",
                "doc_count" : 75
              },
              {
                "key" : "Clear",
                "doc_count" : 73
              },
              {
                "key" : "Sunny",
                "doc_count" : 57
              },
              {
                "key" : "Hail",
                "doc_count" : 38
              },
              {
                "key" : "Thunder & Lightning",
                "doc_count" : 34
              },
              {
                "key" : "Heavy Fog",
                "doc_count" : 32
              },
              {
                "key" : "Damaging Wind",
                "doc_count" : 27
              }
            ]
          }
        },
        {
          "key" : "PL",
          "doc_count" : 405,
          "dest_country_weather_info" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Clear",
                "doc_count" : 74
              },
              {
                "key" : "Rain",
                "doc_count" : 71
              },
              {
                "key" : "Cloudy",
                "doc_count" : 67
              },
              {
                "key" : "Sunny",
                "doc_count" : 66
              },
              {
                "key" : "Thunder & Lightning",
                "doc_count" : 37
              },
              {
                "key" : "Damaging Wind",
                "doc_count" : 30
              },
              {
                "key" : "Hail",
                "doc_count" : 30
              },
              {
                "key" : "Heavy Fog",
                "doc_count" : 30
              }
            ]
          }
        }
      ]
    }
  }
}

一个查询中只能直接嵌套一个查询。但是可以间接嵌套多个。

嵌套查询练习

# 查询每个岗位下工资的信息（平均工资、最高工资、最少工资）
GET employee/_search
{
  "size": 0,
  "aggs": {
    "every_job": {
      "terms": {
        "field": "job"
      }
      , "aggs": {
        "sal_info": {
          "stats": {
            "field": "sal"
          }
        }
      }
    }
  }
}
# 查询不同工种男女员工的数量，然后统计不同工种下的男女员工的工资信息。
GET employee/_search
{
  "size": 0,
  "aggs": {
    "diff_job": {
      "terms": {
        "field": "job"
      }, "aggs": {
        "gender_cnt": {
          "terms": {
            "field": "gender"
          }, "aggs": {
            "employee_info": {
              "stats": {
                "field": "sal"
              }
            }
          }
        }
      }
    }
  }
}

8.2.4、限制查询（top_hits）

查询年龄最大的两位员工的信息

GET employee/_search
{
  "size": 0,
  "aggs": {
    "older_two_emp": {
      "top_hits": {
        "size": 2,
        "sort": [
          {
            "age": {
              "order": "desc"
            }
          }
          ]
      }
    }
  }
}

在一个查询结果中前几个，就用 top_hits。

8.2.5、范围查询-range

range 的区间是前闭后开的。

# 查询不同工资区间员工工资的统计信息
GET employee/_search
{
  "size": 0,
  "aggs": {
    "range_sal_info": {
      "range": {
        "field": "sal",
        "ranges": [
          {
            "key": "0 <= sal < 10001",
            "to": 10001
          },
          {
            "key": "10001 <= sal < 20001",
            "from": 10001,
            "to": 20001
          },
          {
            "key": "20001 <= sal < 30001",
            "from": 20001,
            "to": 30001
          }
        ]
      }
    }
  }
}

程序结果：

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 16,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "range_sal_info" : {
      "buckets" : [
        {
          "key" : "0 <= sal < 10001",
          "to" : 10001.0,
          "doc_count" : 6
        },
        {
          "key" : "10001 <= sal < 20001",
          "from" : 10001.0,
          "to" : 20001.0,
          "doc_count" : 7
        },
        {
          "key" : "20001 <= sal < 30001",
          "from" : 20001.0,
          "to" : 30001.0,
          "doc_count" : 3
        }
      ]
    }
  }
}

{
            "key": "20001 <= sal < 30001",
            "from": 20001,
            "to": 30001
          }

范围就是 [20001, 30001)，range 是前开后闭区间

8.2.6、查询的结果直方图显示出来-histogram

# 以直方图的方式，每 3000 元为一个区间查询员工信息
GET employee/_search
{
  "size": 0,
  "aggs": {
    "range_sal_info": {
      "histogram": {
        "field": "sal",
        "interval": 3000,
        "extended_bounds": {
          "min": 0,
          "max": 15000
        }
      }
    }
  }
}

field：字段名

interval：指定多少为一个区间

extended_bounds：有属性 max，指定最大的区间，不够会补。

运行结果：

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 16,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "range_sal_info" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 1
        },
        {
          "key" : 3000.0,
          "doc_count" : 2
        },
        {
          "key" : 6000.0,
          "doc_count" : 3
        },
        {
          "key" : 9000.0,
          "doc_count" : 0
        },
        {
          "key" : 12000.0,
          "doc_count" : 2
        },
        {
          "key" : 15000.0,
          "doc_count" : 2
        },
        {
          "key" : 18000.0,
          "doc_count" : 3
        },
        {
          "key" : 21000.0,
          "doc_count" : 3
        }
      ]
    }
  }
}

8.2.7、结果中找最低的-min_bucket

查询平均工资最低的工种

# 查询平均工资最低的工种
GET employee/_search
{
  "size": 0,
  "aggs": {
    "job_info": {
      "terms": {
        "field": "job"
      },
      "aggs": {
        "diff_job_avg_sal": {
          "avg": {
            "field": "sal"
          }
        }
      }
    },
    "min_avg_sal_job": {
      "min_bucket": {
        "buckets_path": "job_info>diff_job_avg_sal"
      }
    }
  }
}

buckets_path：是聚合的路径，聚合名字 a>聚合名字 b...

运行结果：

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 16,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "job_info" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "java",
          "doc_count" : 7,
          "diff_job_avg_sal" : {
            "value" : 13428.57142857143
          }
        },
        {
          "key" : "dba",
          "doc_count" : 5,
          "diff_job_avg_sal" : {
            "value" : 12300.0
          }
        },
        {
          "key" : "html",
          "doc_count" : 4,
          "diff_job_avg_sal" : {
            "value" : 14250.0
          }
        }
      ]
    },
    "min_avg_sal_job" : {
      "value" : 12300.0,
      "keys" : [
        "dba"
      ]
    }
  }
}

8.3、聚合之全局过滤与局部过滤-filter

上面用的都是全局过滤。做一个局部过滤

求 30 岁以上的员工的平均工资和所有员工的平均工资

# 求 30 岁以上的员工的平均工资和所有员工的平均工资
GET employee/_search
{
  "size": 0,
  "aggs": {
    "all_emp_avg_sal": {
      "avg": {
        "field": "sal"
      }
    },
    "gt_30_emp_avg_info":{
      "filter": {
        "range": {
          "age": {
            "gte": 30
          }
        }
      },
      "aggs": {
      "gt_30_emp_avg_sal": {
        "avg": {
          "field": "sal"
        }
      }
    }
    }
  }
}

查询结果：

{
  "took" : 13,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 16,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "all_emp_avg_sal" : {
      "value" : 13281.25
    },
    "gt_30_emp_avg_info" : {
      "doc_count" : 6,
      "gt_30_emp_avg_sal" : {
        "value" : 15750.0
      }
    }
  }
}

ES 建议搜索

语法

GET 索引名/_search
{
  "suggest": {
    "自定义的名字": {
      "text": "想要查询建议搜索值",
      "term": {
        "FIELD": "字段名"
      }
    }
  }
}

查询电影中名字为 beauti 的建议搜索

GET movies/_search
{
  "suggest": {
    "title_suggest": {
      "text": "beauti",
      "term": {
        "field": "title"
      }
    }
  }
}

查询结果：

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "title_suggest" : [
      {
        "text" : "beauti",
        "offset" : 0,
        "length" : 6,
        "options" : [
          {
            "text" : "beauty",
            "score" : 0.8333333,
            "freq" : 66
          },
          {
            "text" : "beati",
            "score" : 0.8,
            "freq" : 1
          },
          {
            "text" : "beasts",
            "score" : 0.6666666,
            "freq" : 9
          },
          {
            "text" : "beauties",
            "score" : 0.6666666,
            "freq" : 5
          },
          {
            "text" : "beastie",
            "score" : 0.6666666,
            "freq" : 2
          }
        ]
      }
    ]
  }
}

options 后跟的是建议的结果。注意，用上面的查询，只有在值在数据中没有时，ES 才会给出建议搜索值，否则是无法给出的。比如将 beauti 换成 beauty，结果集中有，options 中就不会显示。

GET movies/_search
{
  "suggest": {
    "title_suggest": {
      "text": "beauty",
      "term": {
        "field": "title"
      }
    }
  }
}

查询结果：

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "title_suggest" : [
      {
        "text" : "beauty",
        "offset" : 0,
        "length" : 6,
        "options" : [ ]
      }
    ]
  }
}

此时 ES 就没有给出建议搜索。可以使用 suggest_mode 来控制。

GET movies/_search
{
  "suggest": {
    "title_suggest": {
      "text": "beauty",
      "term": {
        "field": "title",
        "suggest_mode": "always"
      }
    }
  }
}

suggest_mode 代表 ES 的建议模式，有 3 个值：

missing：倒排索引没有的时候才给建议
always：无论倒排索引中有没有都给建议
popular：搜索建议中比较流行常用的单词

ES 的自动补全功能

我们在浏览器搜索引擎中经常见到 ES 的自动补全功能：

我们输入一个词，就会从数据中匹配。所以自动补全功能对性能要求很高。针对这个，ES 没有采取倒排索引的方式，数据类型要改为 completion

定义 mapping

PUT movies
{
  "mappings" : {
      "properties" : {
        "@version" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "genre" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "title" : {
          "type" : "completion"
        },
        "year" : {
          "type" : "long"
        }
      }
    }
}

将 title 的数据类型设置为 completion。这样才可以使用 ES 的自动补全功能。

前缀搜索

GET movies/_search
{
  "_source": [""],
  "suggest": {
    "title_prefix_suggest": {
      "prefix": "bea",
      "completion": {
        "field" : "title",
        "skip_duplicates": true,
        "size":10
      }
    }
  }
}

prefix：前缀，ES 会自动补全。

completion：

field：需要自动补全的值对应的的字段。
skip_duplicates：去掉重复内容。
size：查出的结果数量，默认值是 5。

ES 高亮显示-highlight

平时在使用浏览器搜索时就会看到查询的内容高亮显示

将 title 和 genere 中的所有 romance 进行一个高亮显示

GET movies/_search
{
  "query": {
    "multi_match": {
      "query": "romance",
      "fields": ["title", "genre"]
    }
  },
  "highlight": {
    "pre_tags": "<span>",
    "post_tags": "</span>",
    "fields": {
      "title": {},
      "genre": {
        "pre_tags": "<em>",
        "post_tags": "</em>"
      }
    }
  }
}

先用了 query 查询，然后使用 highlight 高亮显示。

pre_tags：默认值是 <em>，会加在高亮词的前面。

post_tags：默认值是 </em>，会加在高亮词的后面。

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7734,
      "relation" : "eq"
    },
    "max_score" : 9.840833,
    "hits" : [
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "2894",
        "_score" : 9.840833,
        "_source" : {
          "year" : 1999,
          "id" : "2894",
          "title" : "Romance",
          "@version" : "1",
          "genre" : [
            "Drama",
            "Romance"
          ]
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "<span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "116867",
        "_score" : 9.840833,
        "_source" : {
          "year" : 1930,
          "id" : "116867",
          "title" : "Romance",
          "@version" : "1",
          "genre" : [
            "Drama",
            "Romance"
          ]
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "<span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "124991",
        "_score" : 9.840833,
        "_source" : {
          "year" : 2008,
          "id" : "124991",
          "title" : "Romance",
          "@version" : "1",
          "genre" : [
            "Romance"
          ]
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "<span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "555",
        "_score" : 8.284594,
        "_source" : {
          "year" : 1993,
          "id" : "555",
          "title" : "True Romance",
          "@version" : "1",
          "genre" : [
            "Crime",
            "Thriller"
          ]
        },
        "highlight" : {
          "title" : [
            "True <span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "3501",
        "_score" : 8.284594,
        "_source" : {
          "year" : 1985,
          "id" : "3501",
          "title" : "Murphy's Romance",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Romance"
          ]
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "Murphy's <span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "5769",
        "_score" : 8.284594,
        "_source" : {
          "year" : 1981,
          "id" : "5769",
          "title" : "Modern Romance",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Romance"
          ]
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "Modern <span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "40342",
        "_score" : 8.284594,
        "_source" : {
          "year" : 2005,
          "id" : "40342",
          "title" : "Romance & Cigarettes",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Drama",
            "Musical",
            "Romance"
          ]
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "<span>Romance</span> & Cigarettes"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "133712",
        "_score" : 8.284594,
        "_source" : {
          "year" : 1977,
          "id" : "133712",
          "title" : "Office Romance",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Romance"
          ]
        },
        "highlight" : {
          "genre" : [
            "<em>Romance</em>"
          ],
          "title" : [
            "Office <span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "149446",
        "_score" : 8.284594,
        "_source" : {
          "year" : 2010,
          "id" : "149446",
          "title" : "Petty Romance",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Drama"
          ]
        },
        "highlight" : {
          "title" : [
            "Petty <span>Romance</span>"
          ]
        }
      },
      {
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "150016",
        "_score" : 8.284594,
        "_source" : {
          "year" : 2012,
          "id" : "150016",
          "title" : "Brasserie Romance",
          "@version" : "1",
          "genre" : [
            "Comedy",
            "Drama"
          ]
        },
        "highlight" : {
          "title" : [
            "Brasserie <span>Romance</span>"
          ]
        }
      }
    ]
  }
}

将 2012 年电影的名字中包含 romance 的电影，将 title 中 romance 进行高亮显示，同时将这些电影中 genre 包含 children 的单词进行高亮显示。

# 将 2012 年电影的名字中包含 romance 的电影，将 title 中 romance 进行高亮显示，同时将这些电影中 genre 包含 children 的单词进行高亮显示。
GET movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "year": "2012"
          }
        },
        {
         "match": {
           "title": "romance"
         }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "genre": {
        "pre_tags": "<sapn>",
        "post_tags": "</span>",
        "highlight_query":{
          "match": {
            "genre": "children"
          }
        }
        }
      }
    }
}

九、ES springboot 结合

版本 4.0 以来已弃用类‎TransportClient``TransportClient

版本对应关系：

普通查询

引入依赖

sprigboot 2.4.1 + jdk8 + spring-boot-starter-data-elasticsearch 4.1.0

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.4.1</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.passerbywl</groupId>
    <artifactId>esspringboot</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>esspringboot</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>1.8</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
<!--        引入 elasticsearch springboot start-->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

Client

package com.passerbywl.esspringboot.config;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.RestClients;
import org.springframework.data.elasticsearch.config.AbstractElasticsearchConfiguration;
/**
 * @author liyanan
 * @create 2021-01-04 17:45
 */
@Configuration
public class RestClientConfig extends AbstractElasticsearchConfiguration {
    @Override
    @Bean
    public RestHighLevelClient elasticsearchClient() {
        final ClientConfiguration clientConfiguration = ClientConfiguration.builder()
                .connectedTo("localhost:9200")
                .build();
        return RestClients.create(clientConfiguration).rest();
    }
}

实体类

package com.passerbywl.esspringboot.entity;
import lombok.Builder;
import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import java.util.List;
/**
 * @author liyanan
 * @create 2020-12-28 19:36
 */
@Data
@Builder
@Document(indexName = "movies")
public class Movies {
    @Id
    private String id;
    private String title;
    private List<String> genre;
    private long year;
}

Controller

package com.passerbywl.esspringboot.controller;
import com.passerbywl.esspringboot.entity.Movies;
import com.passerbywl.esspringboot.repository.MoviesRepository;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import java.util.List;
/**
 * @author liyanan
 * @create 2020-12-28 20:01
 */
@RestController
@RequestMapping("/movies")
public class MoviesController {
    private MoviesRepository moviesRepository;
    public MoviesController(MoviesRepository moviesRepository) {
        this.moviesRepository = moviesRepository;
    }
    @GetMapping("/getByName")
    public List<Movies> getPageData(@RequestParam String title,
                                    @RequestParam(defaultValue = "0") Integer page,
                                    @RequestParam(defaultValue = "10") Integer size) {
        PageRequest pageable = PageRequest.of(page, size);
        Page<Movies> pageData = moviesRepository.findByTitle(title, pageable);
        return pageData.getContent();
    }
}

运行 springboot 程序，然后测试：

Repository

package com.passerbywl.esspringboot.repository;
import com.passerbywl.esspringboot.entity.Movies;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;
/**
 * @author liyanan
 * @create 2020-12-28 19:39
 * ElasticsearchRepository<实体类型, id类型>
 */
public interface MoviesRepository extends ElasticsearchRepository<Movies, String> {
    Page<Movies> findByTitle(String title, Pageable pageable);
}

ElasticsearchRepository<实体类型, id 类型>

ES + SpringBoot 实现自动补全简单实现

ES 语句：

GET movies/_search
{
  "_source": [""],
  "suggest": {
    "title_prefix_suggest": {
      "prefix": "bea",
      "completion": {
        "field" : "title",
        "skip_duplicates": true,
        "size":10
      }
    }
  }
}

对应的建议语句：

package com.passerbywl.esspringboot.controller;
import org.elasticsearch.search.suggest.Suggest;
import org.elasticsearch.search.suggest.SuggestBuilder;
import org.elasticsearch.search.suggest.completion.CompletionSuggestionBuilder;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.mapping.IndexCoordinates;
import org.springframework.web.bind.annotation.CrossOrigin;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import java.util.ArrayList;
import java.util.List;
/**
 * @author liyanan
 * @create 2021-01-07 10:30
 */
@RestController
@CrossOrigin("*")
public class MoviesSuggestController {
    private ElasticsearchOperations  elasticsearchOperations;
    public MoviesSuggestController(ElasticsearchOperations elasticsearchOperations) {
        this.elasticsearchOperations = elasticsearchOperations;
    }
    @GetMapping("/movie/suggest")
    public List<String> suggest(@RequestParam String prefix) {
        List<String> result = new ArrayList<>();
        CompletionSuggestionBuilder completionSuggestionBuilder =
                new CompletionSuggestionBuilder("title")
                .skipDuplicates(true)
                .prefix(prefix)
                .size(10);
        SuggestBuilder suggestBuilder = new SuggestBuilder().addSuggestion("title_suggest", completionSuggestionBuilder);
        // 获取 Suggest
        Suggest suggest = elasticsearchOperations.suggest(suggestBuilder, IndexCoordinates.of("movies")).getSuggest();
        List<? extends Suggest.Suggestion.Entry<? extends Suggest.Suggestion.Entry.Option>> entries = suggest.getSuggestion("title_suggest").getEntries();
        entries.forEach(entry -> {
            List<? extends Suggest.Suggestion.Entry.Option> options = entry.getOptions();
            options.forEach(op -> {
                result.add(op.getText().toString());
            });
        });
        return result;
    }
}

Controller 上面要配置跨域，否则下面的 html 无法访问到 SpringBoot 服务。

index.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
    <style>
        * {
            margin: 0;
            padding: 0;
        }
        input {
            width: 400px;
            height: 24px;
            margin-left: 100px;
        }
        div {
            width: 400px;
            height: 500px;
            margin-left: 100px;
            background: bisque;
        }
    </style>
</head>
<body>
    <input type="text" oninput="getAutoCompletedHints(this)">
    <div id="content"></div>
</body>
<script src="https://cdn.bootcdn.net/ajax/libs/jquery/2.2.1/jquery.js"></script>
<script>
    var content = document.getElementById("content");
    function getAutoCompletedHints(inputDom) {
        let prefix = inputDom.value;
        if (prefix.trim()) {
            content.innerHTML = '';
            $.getJSON("http://localhost:8080/movie/suggest/?prefix=" + prefix.trim(), {},
                function (_data) {
                    for (let i = 0; i < _data.length; i++) {
                        let pTag = document.createElement('p');
                        pTag.innerText = _data[i];
                        content.appendChild(pTag);
                    }
            });
        }
    }
</script>
</html>