Elasticsearch: multiple ways to search

Overview

1. query string search
2. query DSL
3. query filter
4. full-text search
5. phrase search
6. highlight search
7. validating a search
8. sorting

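1. query string search

Search all products: GET /ecommerce/product/_search

The name "query string search" comes from the fact that the search parameters are passed in the query string of the HTTP request.

Search for products whose name contains yagao, sorted by price in descending order: GET /ecommerce/product/_search?q=name:yagao&sort=price:desc

This is handy for ad-hoc use from command-line tools such as curl, to send a quick request and retrieve the information you need.

Complex queries, however, are hard to build this way, so query string search is rarely used in production.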
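When these ad-hoc requests are scripted rather than typed, the q value has to be URL-encoded. A minimal sketch, assuming only Python's standard library; query_string_search_url is a hypothetical helper that targets the typed /index/type/_search paths used in these notes. urlencode percent-escapes ':' and '+', which matters for the + operator because an unescaped + in a URL query string is conventionally decoded as a space:

```python
from urllib.parse import urlencode


def query_string_search_url(index, doc_type, q, sort=None):
    """Build the path plus query string for a query string search (hypothetical helper)."""
    params = {"q": q}
    if sort is not None:
        params["sort"] = sort
    # urlencode escapes ':' as %3A and '+' as %2B, leaving '-' untouched
    return f"/{index}/{doc_type}/_search?{urlencode(params)}"


print(query_string_search_url("ecommerce", "product", "name:yagao", "price:desc"))
# /ecommerce/product/_search?q=name%3Ayagao&sort=price%3Adesc
print(query_string_search_url("test_index", "test_type", "+test_field:test"))
# /test_index/test_type/_search?q=%2Btest_field%3Atest
```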

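Fields in the response:

took: how many milliseconds the search took
timed_out: whether the search timed out (here it did not)
_shards: the index is split into 5 shards, so the search request is sent to every shard, served by either its primary shard or one of its replica shards
hits.total: the number of matching documents, here 3
hits.max_score: the highest relevance score among the hits; the score measures how well a document matches the search, so the more relevant a document is, the higher its score
hits.hits: the full data of the documents that matched the search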

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "jiajieshi yagao",
          "desc": "youxiao fangzhu",
          "price": 25,
          "producer": "jiajieshi producer",
          "tags": [
            "fangzhu"
          ]
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "gaolujie yagao",
          "desc": "gaoxiao meibai",
          "price": 30,
          "producer": "gaolujie producer",
          "tags": [
            "meibai",
            "fangzhu"
          ]
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "zhonghua yagao",
          "desc": "caoben zhiwu",
          "price": 40,
          "producer": "zhonghua producer",
          "tags": [
            "qingxin"
          ]
        }
      }
    ]
  }
}
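Client code usually reads only a few of these fields. The sketch below, using Python's json module, pulls took, timed_out, hits.total and the product names out of an abridged copy of the response above; summarize is a hypothetical helper. The branch for a dict-valued hits.total is an assumption about newer versions (7.x and later report it as {"value": ..., "relation": ...}), not something this response shows:

```python
import json


def summarize(response):
    """Extract the headline fields from a search response (hypothetical helper)."""
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):  # newer versions: {"value": n, "relation": "eq"}
        total = total["value"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "total": total,
        "names": [hit["_source"]["name"] for hit in hits["hits"]],
    }


abridged = """{"took": 2, "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {"total": 3, "max_score": 1, "hits": [
    {"_id": "2", "_score": 1, "_source": {"name": "jiajieshi yagao", "price": 25}},
    {"_id": "1", "_score": 1, "_source": {"name": "gaolujie yagao", "price": 30}},
    {"_id": "3", "_score": 1, "_source": {"name": "zhonghua yagao", "price": 40}}]}}"""

print(summarize(json.loads(abridged)))
```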

GET /test_index/test_type/_search?q=test_field:test

GET /test_index/test_type/_search?q=+test_field:test

GET /test_index/test_type/_search?q=-test_field:test

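There are two things to learn here: the q=field:search content syntax, and the meaning of + and -. A + means the field must contain the term; a - means it must not.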
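The +/- rules can be checked against a toy model. This is a sketch, not Elasticsearch's parser: it assumes whitespace tokenization, ignores scoring, and handles only space-separated field:term clauses. Clauses with + are required, clauses with - are prohibited, and plain clauses must match at least once when there is no required clause, mirroring the three requests above; field_contains and matches are hypothetical names:

```python
def field_contains(doc, clause):
    """True if the field named in 'field:term' contains term (lowercased whitespace tokens)."""
    field, term = clause.split(":", 1)
    return term.lower() in str(doc.get(field, "")).lower().split()


def matches(doc, q):
    """Toy evaluation of +, - and plain clauses in a query string."""
    clauses = q.split()
    required = [c[1:] for c in clauses if c.startswith("+")]
    prohibited = [c[1:] for c in clauses if c.startswith("-")]
    optional = [c for c in clauses if c[0] not in "+-"]
    if not all(field_contains(doc, c) for c in required):
        return False
    if any(field_contains(doc, c) for c in prohibited):
        return False
    if optional and not required:
        # with no required clause, at least one plain clause has to match
        return any(field_contains(doc, c) for c in optional)
    return True


docs = {"1": {"test_field": "test hello"}, "2": {"test_field": "hello world"}}
for q in ["test_field:test", "+test_field:test", "-test_field:test"]:
    print(q, [doc_id for doc_id, d in docs.items() if matches(d, q)])
```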

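How the _all metadata field works and what it is for

GET /test_index/test_type/_search?q=test

When no field is named, the search covers all fields: a document is returned if any one of its fields contains the keyword. This works through the _all metadata field. At index time, Elasticsearch concatenates the values of all of a document's fields into one string, analyzes it, and indexes it as the _all field; a query string without a field name searches _all.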
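The _all idea can be sketched in a few lines. This assumes whitespace tokenization and lowercasing in place of the real analyzer, and build_all_field and search_all are hypothetical names; the product data is a subset of the response shown earlier. Note that _all applies to older versions such as those used in these notes: it was deprecated in Elasticsearch 6.0 and removed in 7.0.

```python
def build_all_field(source):
    """Concatenate every field value (list items included) into one space-separated string."""
    parts = []
    for value in source.values():
        items = value if isinstance(value, list) else [value]
        parts.extend(str(item) for item in items)
    return " ".join(parts)


def search_all(docs, keyword):
    """Return ids whose _all text contains keyword as a whole (lowercased) token."""
    return [doc_id for doc_id, source in docs.items()
            if keyword.lower() in build_all_field(source).lower().split()]


products = {
    "1": {"name": "gaolujie yagao", "desc": "gaoxiao meibai", "tags": ["meibai", "fangzhu"]},
    "2": {"name": "jiajieshi yagao", "desc": "youxiao fangzhu", "tags": ["fangzhu"]},
    "3": {"name": "zhonghua yagao", "desc": "caoben zhiwu", "tags": ["qingxin"]},
}
print(search_all(products, "fangzhu"))  # ['1', '2']
print(search_all(products, "qingxin"))  # ['3']
```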

2. query DSL

DSL: Domain Specific Language.

The query is written as JSON in the HTTP request body. This is convenient and can express all kinds of complex queries, so it is far more powerful than query string search.

Query all products

GET /ecommerce/product/_search
{
  "query": { "match_all": {} }
}

Query products whose name contains yagao, sorted by price in descending order

GET /ecommerce/product/_search
{
  "query" : {
    "match" : {
      "name" : "yagao"
    }
  },
  "sort": [
    { "price": "desc" }
  ]
}
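Conceptually, this request filters by a word in name and then orders by price. The in-memory emulation below is only an illustration of that result, not how Elasticsearch executes it; it assumes whitespace tokenization, and match_sorted_desc is a hypothetical name. With the three sample products it returns zhonghua (40), gaolujie (30), then jiajieshi (25):

```python
products = {
    "1": {"name": "gaolujie yagao", "price": 30},
    "2": {"name": "jiajieshi yagao", "price": 25},
    "3": {"name": "zhonghua yagao", "price": 40},
}


def match_sorted_desc(docs, field, word, sort_field):
    """Keep docs whose field contains word as a token, then order by sort_field descending."""
    hits = [doc_id for doc_id, src in docs.items() if word.lower() in src[field].lower().split()]
    return sorted(hits, key=lambda doc_id: docs[doc_id][sort_field], reverse=True)


print(match_sorted_desc(products, "name", "yagao", "price"))  # ['3', '1', '2']
```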

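Pagination

Page through the products. There are 3 products in total; assuming each page shows 1 product and we want page 2, the query returns the 2nd product. from is the 0-based offset of the first result to return, and size is the number of results per page.

GET /ecommerce/product/_search
{
  "query": { "match_all": {} },
  "from": 1,
  "size": 1
}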
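The arithmetic behind from is from = (page - 1) * size. A minimal sketch with a hypothetical helper page_to_from_size; the id list simply reuses the hit order of the match_all response shown earlier. Deep pages get expensive, and by default from + size may not exceed 10,000 (the index.max_result_window setting), which is one reason scroll exists:

```python
def page_to_from_size(page, page_size):
    """Map a 1-based page number to Elasticsearch's 0-based from/size pair."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must both be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


ordered_ids = ["2", "1", "3"]  # hit order of the match_all response shown earlier
params = page_to_from_size(2, 1)
print(params)  # {'from': 1, 'size': 1}
print(ordered_ids[params["from"]:params["from"] + params["size"]])  # ['1']
```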

Return only the name and price of each product

GET /ecommerce/product/_search
{
  "query": { "match_all": {} },
  "_source": ["name", "price"]
}
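The effect of "_source": ["name", "price"] on each hit can be pictured as a key filter. This sketch only handles top-level field names (real source filtering also accepts wildcards and nested paths), and filter_source is a hypothetical name:

```python
def filter_source(source, includes):
    """Keep only the listed top-level fields, like "_source": ["name", "price"]."""
    return {field: value for field, value in source.items() if field in includes}


doc = {"name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25,
       "producer": "jiajieshi producer", "tags": ["fangzhu"]}
print(filter_source(doc, ["name", "price"]))  # {'name': 'jiajieshi yagao', 'price': 25}
```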

Query DSL is better suited to production use, because it can express complex queries.

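Scroll search

Fetching, say, 100,000 documents in a single request performs poorly. In that case you normally use a scroll query instead: it retrieves one batch, then the next, and so on, until all the data has been retrieved and processed.

On the first request, scroll takes a snapshot of the index as it was at that moment, and every later batch is served from that snapshot; changes made to the data in the meantime are not visible to the client.

Sorting by _doc (index order) performs best, because Elasticsearch does not have to score or sort the hits.

Every scroll request must also carry a scroll parameter: a time window, such as 1m, that tells Elasticsearch how long to keep the search context alive. It only needs to be long enough to process one batch, because each scroll request sets a new expiry time.

Fetch 3 documents per batch: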

GET /test_index/test_type/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "sort": [ "_doc" ],
  "size": 3
}

The response contains a _scroll_id. The next scroll request must send this scroll_id back:

GET /_search/scroll
{
  "scroll": "1m",
  "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAACxeFjRvbnNUWVZaVGpHdklqOV9zcFd6MncAAAAAAAAsYBY0b25zVFlWWlRqR3ZJajlfc3BXejJ3AAAAAAAALF8WNG9uc1RZVlpUakd2SWo5X3NwV3oydwAAAAAAACxhFjRvbnNUWVZaVGpHdklqOV9zcFd6MncAAAAAAAAsYhY0b25zVFlWWlRqR3ZJajlfc3BXejJ3"
}
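On the client side, scrolling is a loop: send the first search with scroll=1m, then keep sending the most recent _scroll_id until a batch comes back empty. The sketch below assumes a hypothetical in-memory FakeScrollContext standing in for the server (it is not a real client API) so the loop and the snapshot behaviour can run offline. A real client would also release the context afterwards with the clear-scroll API (DELETE /_search/scroll):

```python
from typing import Callable, Dict, List


class FakeScrollContext:
    """Hypothetical in-memory stand-in for a server-side scroll context."""

    def __init__(self, docs: List[dict], size: int) -> None:
        self.snapshot = list(docs)  # frozen copy: later changes to docs are not seen
        self.size = size
        self.offset = 0

    def next_batch(self, scroll_id: str) -> Dict:
        batch = self.snapshot[self.offset:self.offset + self.size]
        self.offset += len(batch)
        return {"_scroll_id": scroll_id, "hits": {"hits": batch}}


def scroll_all(first_page: Dict, fetch_next: Callable[[str], Dict]) -> List[dict]:
    """Follow the latest _scroll_id batch by batch until a batch comes back empty."""
    collected: List[dict] = []
    page = first_page
    while page["hits"]["hits"]:
        collected.extend(page["hits"]["hits"])
        page = fetch_next(page["_scroll_id"])
    return collected


docs = [{"_id": str(i)} for i in range(7)]
ctx = FakeScrollContext(docs, size=3)
first = ctx.next_batch("ctx-1")        # plays the role of GET .../_search?scroll=1m
docs.append({"_id": "late"})           # written after the scroll started
everything = scroll_all(first, ctx.next_batch)  # plays the role of GET /_search/scroll
print(len(everything))  # 7
```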

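Scroll looks a lot like pagination, but the use cases differ: pagination fetches one page at a time to show to a user, while scroll retrieves data batch by batch for a system to process.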

Combining multiple search conditions

GET /website/article/_search
{
  "query": {
    "bool": {
      "must": [ //title must contain elasticsearch
        {
          "match": {
            "title": "elasticsearch"
          }
        }
      ],
      "should": [ //content may or may not contain elasticsearch; documents where it does score higher
        {
          "match": {
            "content": "elasticsearch"
          }
        }
      ],
      "must_not": [ //author_id must not be 111
        {
          "match": {
            "author_id": 111
          }
        }
      ]
    }
  }
}
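When queries like this are built in application code, it helps to assemble the bool body from lists of clauses. A minimal sketch with a hypothetical helper bool_query; the parameter is named filters to avoid shadowing Python's built-in filter, and it maps to the "filter" key. Empty clause lists are simply left out:

```python
import json


def bool_query(must=(), should=(), must_not=(), filters=()):
    """Assemble a bool query request body, omitting empty clause lists."""
    clauses = {"must": must, "should": should, "must_not": must_not, "filter": filters}
    return {"query": {"bool": {name: list(items) for name, items in clauses.items() if items}}}


body = bool_query(
    must=[{"match": {"title": "elasticsearch"}}],
    should=[{"match": {"content": "elasticsearch"}}],
    must_not=[{"match": {"author_id": 111}}],
)
print(json.dumps(body, indent=2))
```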

1. match all

GET /_search
{
  "query": {
    "match_all": {}
  }
}

2. match

GET /_search
{
  "query": { "match": { "title": "my elasticsearch article" }}
}

3. multi match

GET /test_index/test_type/_search
{
  "query": {
    "multi_match": {
      "query": "test", //the text to search for
      "fields": ["test_field", "test_field1"] //search across several fields
    }
  }
}

4. range query

GET /company/employee/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 30
      }
    }
  }
}

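5. term query

A term query treats the value as an exact value and does not analyze it. Prerequisite: when creating the mapping manually, the field must be indexed without analysis (not_analyzed, or the keyword type in Elasticsearch 5.x and later); only then can the term query find "test hello".

GET /test_index/test_type/_search
{
  "query": {
    "term": {
      "test_field": "test hello"
    }
  }
}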
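The difference between term and match comes down to analysis. The toy model below is a sketch, not how Lucene stores data: it builds an inverted index for test_field twice, once through a simplified analyzer (lowercase plus whitespace split, an assumption standing in for the real analyzer) and once with each whole value kept as a single term, as a not-analyzed field would. A term lookup for "test hello" only succeeds against the second index; the helper names are hypothetical:

```python
def analyze(text):
    """Simplified analyzer (assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def build_index(docs, field, analyzed):
    """Map each indexed term to the set of doc ids that contain it."""
    index = {}
    for doc_id, source in docs.items():
        terms = analyze(source[field]) if analyzed else [source[field]]
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


def term_query(index, value):
    """A term query looks the value up as-is, with no analysis."""
    return index.get(value, set())


docs = {"1": {"test_field": "test hello"}, "2": {"test_field": "test world"}}
analyzed_index = build_index(docs, "test_field", analyzed=True)
exact_index = build_index(docs, "test_field", analyzed=False)
print(term_query(analyzed_index, "test hello"))  # set()
print(term_query(exact_index, "test hello"))     # {'1'}
```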

6. terms query

GET /_search
{
  "query": { "terms": { "tag": [ "search", "full_text", "nosql" ] }} //several exact terms for the tag field; a document matches if it contains any of them
}

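3. query filter

A filter clause only decides whether a document matches. It does not contribute to the relevance score, and Elasticsearch can cache its results, which makes filtering cheap.

Search for products whose name contains yagao and whose price is greater than 25:

GET /ecommerce/product/_search
{
  "query" : {
    "bool" : {
      "must" : {
        "match" : {
          "name" : "yagao"
        }
      },
      "filter" : {
        "range" : {
          "price" : { "gt" : 25 }
        }
      }
    }
  }
}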

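A bool query can combine scoring clauses (must, should), exclusions (must_not) and a non-scoring filter:

{
  "bool": {
    "must": { "match": { "title": "how to make millions" }},
    "must_not": { "match": { "tag": "spam" }},
    "should": [
      { "match": { "tag": "starred" }}
    ],
    "filter": {
      "range": { "date": { "gte": "2014-01-01" }}
    }
  }
}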

Several filter conditions can be combined by nesting a bool query inside filter:

{
  "bool": {
    "must": { "match": { "title": "how to make millions" }},
    "must_not": { "match": { "tag": "spam" }},
    "should": [
      { "match": { "tag": "starred" }}
    ],
    "filter": {
      "bool": {
        "must": [
          { "range": { "date": { "gte": "2014-01-01" }}},
          { "range": { "price": { "lte": 29.99 }}}
        ],
        "must_not": [
          { "term": { "category": "ebooks" }}
        ]
      }
    }
  }
}

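When a query consists of nothing but a filter, wrap it in constant_score:

GET /company/employee/_search
{
  "query": {
    "constant_score": { //the standard wrapper for a filter used on its own; every hit gets the same score
      "filter": {
        "range": {
          "age": {
            "gte": 30
          }
        }
      }
    }
  }
}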

4. full-text search

GET /ecommerce/product/_search
{
  "query" : {
    "match" : {
      "producer" : "yagao producer"
    }
  }
}

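5. phrase search

Phrase search is the counterpart of full-text search. Full-text search splits the input string into words and looks each one up in the inverted index; a document is returned as soon as it matches any one of those words.

Phrase search instead requires the field text to contain the whole search string as entered, that is, the same words next to each other and in the same order; only then does the document count as a match and get returned.

GET /ecommerce/product/_search
{
  "query" : {
    "match_phrase" : {
      "producer" : "yagao producer"
    }
  }
}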
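The contrast between the two requests can be reproduced with a toy model. This sketch assumes a whitespace-and-lowercase analyzer and ignores scoring; match here means "any query word appears" (the default OR behaviour of match), and match_phrase means "all query words appear adjacent and in order". With the three sample producers, the full-text search matches every product through the word producer, while the phrase search matches none:

```python
def analyze(text):
    """Simplified analyzer (assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def match(field_text, query):
    """Full-text match: true if any query word appears in the field."""
    tokens = set(analyze(field_text))
    return any(word in tokens for word in analyze(query))


def match_phrase(field_text, query):
    """Phrase match: true if all query words appear adjacent and in order."""
    tokens, phrase = analyze(field_text), analyze(query)
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


producers = {"1": "gaolujie producer", "2": "jiajieshi producer", "3": "zhonghua producer"}
print([i for i, p in producers.items() if match(p, "yagao producer")])         # ['1', '2', '3']
print([i for i, p in producers.items() if match_phrase(p, "yagao producer")])  # []
```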

6. highlight search

GET /ecommerce/product/_search
{
  "query" : {
    "match" : {
      "producer" : "producer"
    }
  },
  "highlight": {
    "fields" : {
      "producer" : {}
    }
  }
}
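By default, Elasticsearch marks matched terms with <em> and </em> and returns the marked-up fragments in a separate highlight section of each hit. The sketch below imitates only that tagging on whole whitespace tokens; it is not the real highlighter (no fragmenting, no analyzer), and highlight is a hypothetical name:

```python
def highlight(text, query, pre_tag="<em>", post_tag="</em>"):
    """Wrap each whitespace token of text that matches a query word (case-insensitively)."""
    terms = set(query.lower().split())
    return " ".join(f"{pre_tag}{word}{post_tag}" if word.lower() in terms else word
                    for word in text.split())


print(highlight("gaolujie producer", "producer"))  # gaolujie <em>producer</em>
```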

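7. Validating a search

Check whether a search is valid and, if it is not, where the problem is. In the query below, match is misspelled as math:

GET /test_index/test_type/_validate/query?explain
{
  "query": {
    "math": {
      "test_field": "test"
    }
  }
}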

Response:

{
  "valid": false,
  "error": "org.elasticsearch.common.ParsingException: no [query] registered for [math]"
}

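8. Sorting

1. Default sort order

By default, results are sorted by _score in descending order.

In some cases, however, there is no meaningful _score, for example when the query consists only of a filter:

GET /_search
{
  "query" : {
    "bool" : {
      "filter" : {
        "term" : {
          "author_id" : 1
        }
      }
    }
  }
}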

The same applies to a constant_score query:

GET /_search
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "term" : {
          "author_id" : 1
        }
      }
    }
  }
}

2. Custom sort order

Sort the employees aged 30 or older by join_date, earliest first:

GET /company/employee/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "age": {
            "gte": 30
          }
        }
      }
    }
  },
  "sort": [
    {
      "join_date": {
        "order": "asc"
      }
    }
  ]
}

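Problem: sorting on an analyzed string field usually gives inaccurate results, because the value has been split into several words, and sorting on those words does not produce the order we want.

The usual solution is to index the string field twice: once analyzed, for searching, and once not analyzed, for sorting.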
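To see why sorting on an analyzed field goes wrong: the field holds several terms per document, and for such multi-valued fields Elasticsearch by default sorts ascending on the smallest value (min) and descending on the largest (max). The sketch below compares that with sorting the whole string; it assumes a whitespace-and-lowercase analyzer, and the titles "zebra apple" and "banana cherry" are hypothetical examples chosen to make the difference visible:

```python
def analyze(text):
    """Simplified analyzer (assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def sort_analyzed(titles, descending=False):
    """Sort on analyzed terms: min term when ascending, max term when descending."""
    pick = max if descending else min
    return sorted(titles, key=lambda title: pick(analyze(title)), reverse=descending)


def sort_raw(titles, descending=False):
    """Sort on the whole, not-analyzed string."""
    return sorted(titles, reverse=descending)


titles = ["zebra apple", "banana cherry"]
print(sort_analyzed(titles))  # ['zebra apple', 'banana cherry'], because 'apple' < 'banana'
print(sort_raw(titles))       # ['banana cherry', 'zebra apple']
```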

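PUT /website
{
  "mappings": {
    "article": {
      "properties": {
        "title": {
          "type": "text", //analyzed, used for search
          "fields": {
            "raw": { //not analyzed, used for sorting
              "type": "keyword"
            }
          },
          "fielddata": true //builds fielddata (an in-memory forward index) for the analyzed title
        },
        "content": {
          "type": "text"
        },
        "post_date": {
          "type": "date"
        },
        "author_id": {
          "type": "long"
        }
      }
    }
  }
}

On Elasticsearch 2.x the raw sub-field was declared as "type": "string" with "index": "not_analyzed"; from 5.x on, keyword is the not-analyzed string type.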

Index a sample article:

PUT /website/article/1
{
  "title": "second article",
  "content": "this is my second article",
  "post_date": "2017-01-01",
  "author_id": 110
}

Searching the index returns the three sample articles:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "website",
        "_type": "article",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "first article",
          "content": "this is my first article",
          "post_date": "2017-02-01",
          "author_id": 110
        }
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "title": "second article",
          "content": "this is my second article",
          "post_date": "2017-01-01",
          "author_id": 110
        }
      },
      {
        "_index": "website",
        "_type": "article",
        "_id": "3",
        "_score": 1,
        "_source": {
          "title": "third article",
          "content": "this is my third article",
          "post_date": "2017-03-01",
          "author_id": 110
        }
      }
    ]
  }
}

Sort by the not-analyzed sub-field:

GET /website/article/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title.raw": { //sort on the not-analyzed sub-field defined in the mapping above
        "order": "desc"
      }
    }
  ]
}