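Elasticsearch Structured Search and Queries

Search is one of the core features of Elasticsearch. Searches fall into two main types: structured search and full-text search. Structured search means querying data that has an inherent structure. Dates, times, and numbers are all structured: they have a precise format, and we can apply logical operations to that format. Common operations include comparing ranges of numbers or times, or determining which of two values is larger.

Import the sample data: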

curl -XPOST 'http://hadoop01:9200/school/student/_bulk' -d '
{ "index": { "_id": 1 }}
{ "name" : "liubei", "age" : 20 , "sex": "boy", "birth": "1996-01-02" , "about": "i like diaocan he girl" }
{ "index": { "_id": 2 }}
{ "name" : "guanyu", "age" : 21 , "sex": "boy", "birth": "1995-01-02" , "about": "i like diaocan" }
{ "index": { "_id": 3 }}
{ "name" : "zhangfei", "age" : 18 , "sex": "boy", "birth": "1998-01-02" , "about": "i like trivel" }
{ "index": { "_id": 4 }}
{ "name" : "diaocan", "age" : 20 , "sex": "girl", "birth": "1996-01-02" , "about": "i like trivel and sport" }
{ "index": { "_id": 5 }}
{ "name" : "panjinlian", "age" : 25 , "sex": "girl", "birth": "1991-01-02" , "about": "i like trivel and wusong" }
{ "index": { "_id": 6 }}
{ "name" : "caocao", "age" : 30 , "sex": "boy", "birth": "1988-01-02" , "about": "i like xiaoqiao" }
{ "index": { "_id": 7 }}
{ "name" : "zhaoyun", "age" : 31 , "sex": "boy", "birth": "1997-01-02" , "about": "i like trivel and music" }
{ "index": { "_id": 8 }}
{ "name" : "xiaoqiao", "age" : 18 , "sex": "girl", "birth": "1998-01-02" , "about": "i like caocao" }
{ "index": { "_id": 9 }}
{ "name" : "daqiao", "age" : 20 , "sex": "girl", "birth": "1996-01-02" , "about": "i like trivel and history" }
'

1: Querying with match_all

curl -XGET 'hadoop01:9200/school/student/_search?pretty' -d '   
{
    "query": {
        "match_all": {}
    }
}'

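Problem: match_all matches every document in the index (by default the response returns only the first 10 hits), but real business requirements rarely call for all the data; usually we want only the documents we care about.

In addition, pulling back all the data from an ES cluster can easily put heavy garbage-collection (GC) pressure on the nodes.

So we need to learn how to retrieve data efficiently.

2: Querying on a specific field

Find people who like travel (the sample data spells it "trivel"):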

curl -XGET 'hadoop01:9200/school/student/_search?pretty' -d '   
{
    "query": {
         "match": {"about": "trivel"}
     }
}'

What if we now want the people who like travel but are not boys?

curl -XGET 'hadoop01:9200/school/student/_search?pretty' -d '
{

    "query": {
         "match": {
           "about": "trivel",
           "sex": "girl"
         }
     }
}'

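[This form is invalid: a single match clause cannot contain more than one field, and Elasticsearch rejects it with "[match] query doesn't support multiple fields". A compound query is needed instead.]

3: Compound queries with bool

When several query clauses need to be combined, wrap them in bool. bool accepts must, must_not, and should; should behaves like OR.

Example: find people who are not male and like travel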

curl -XGET 'hadoop01:9200/school/student/_search?pretty' -d '
{
"query": {
   "bool": {
      "must": { "match": {"about": "trivel"}},
      "must_not": {"match": {"sex": "boy"}}
     }
  }
}'

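The should clause in a bool query:

A should clause is optional. When a must clause is also present, documents are returned whether or not they match should, but those that do match receive a higher score and rank higher.

Example: find people who like travel, ranking the boys ahead of the others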

curl -XGET 'hadoop01:9200/school/student/_search?pretty' -d '
{
"query": {
   "bool": {
      "must": { "match": {"about": "trivel"}},
      "should": {"match": {"sex": "boy"}}         
     }
  }
}'
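
The effect of should next to must can be modelled in a few lines of Python. This is a minimal in-memory sketch, not how Elasticsearch actually scores documents: the search helper and the "+1 per matching clause" score are assumptions made for illustration, and only four of the sample documents are included.

```python
# Toy model of a bool query: must and must_not decide which documents
# are returned, should only adds to the score.
DOCS = [
    {"name": "zhangfei", "sex": "boy", "about": "i like trivel"},
    {"name": "diaocan", "sex": "girl", "about": "i like trivel and sport"},
    {"name": "caocao", "sex": "boy", "about": "i like xiaoqiao"},
    {"name": "daqiao", "sex": "girl", "about": "i like trivel and history"},
]


def matches(doc, field, word):
    """True if `word` is one of the whitespace-separated tokens of doc[field]."""
    return word in doc[field].lower().split()


def search(docs, must=(), must_not=(), should=()):
    """Return (name, score) pairs, best score first (hypothetical helper)."""
    hits = []
    for doc in docs:
        if not all(matches(doc, f, w) for f, w in must):
            continue  # every must clause is required
        if any(matches(doc, f, w) for f, w in must_not):
            continue  # any must_not match excludes the document
        # should never excludes a document in this model;
        # each matching should clause raises the toy score by 1.
        score = len(must) + sum(matches(doc, f, w) for f, w in should)
        hits.append((doc["name"], score))
    return sorted(hits, key=lambda h: -h[1])


results = search(DOCS, must=[("about", "trivel")], should=[("sex", "boy")])
for name, score in results:
    print(name, score)
```

In this model the girls who like travel are still returned; the only boy who likes travel simply moves to the top of the list.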

4: term matching

Use term for exact matching (for example numbers, dates, booleans, or not_analyzed strings, that is, text that was indexed without analysis)

Syntax

{ "term": { "age": 20 }}

{ "term": { "date": "2018-04-01" }}

{ "term": { "sex": "boy" }}

{ "term": { "about": "trivel" }}

Example: find people who like travel

curl -XGET 'hadoop01:9200/school/student/_search?pretty' -d '
{
"query": {
   "bool": {
      "must": { "term": {"about": "trivel"}},
      "should": {"term": {"sex": "boy"}}         
     }}
}'

5: Matching multiple values with terms

curl -XGET 'hadoop01:9200/school/student/_search?pretty' -d '
{
"query": {
   "bool": {
      "must": { "terms": {"about": ["trivel","history"]}}          
     }
  }
}'

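term is mainly used for exact filtering. Take the text "我爱你" ("I love you") as an example:

With match, the query text is analyzed first, so depending on the analyzer it can match documents containing 我, 爱, 你, 我爱, and so on.

With term, only the exact term "我爱你" matches.

Example: with match, documents containing and, history, or and history all match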

curl -XGET 'hadoop01:9200/school/student/_search?pretty' -d '
{
"query": {
   "bool": {
      "must": { "match": {"about": "and history"}}              
     }
  }
}'

With term, the whole string "and history" is looked up as one exact term:

curl -XGET 'hadoop01:9200/school/student/_search?pretty' -d '
{
"query": {
   "bool": {
      "must": { "term": {"about": "and history"}}              
     }
  }
}'

Response:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
Querying history on its own:
curl -XGET 'hadoop01:9200/school/student/_search?pretty' -d '
{
"query": {
   "bool": {
      "must": { "term": {"about": "history"}}              
     }
  }
}'
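This query returns results. The about field was split into single-word terms when it was indexed, so the index contains history but no term "and history". A term query matches exact terms only: if the term is not in the index, nothing matches.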
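
The difference between term and match comes down to what is stored in the inverted index. The following Python sketch builds a toy index over three of the sample documents. The analyze function is a simplified stand-in for a real analyzer (lowercase, then split on whitespace), and term_query and match_query are hypothetical helpers, not Elasticsearch APIs.

```python
# Toy inverted index: shows why term "and history" finds nothing.
from collections import defaultdict

DOCS = {
    4: "i like trivel and sport",
    8: "i like caocao",
    9: "i like trivel and history",
}


def analyze(text):
    """Simplified stand-in for an analyzer: lowercase, split on whitespace."""
    return text.lower().split()


# Index time: each token becomes a separate term pointing at document ids.
index = defaultdict(set)
for doc_id, about in DOCS.items():
    for token in analyze(about):
        index[token].add(doc_id)


def term_query(value):
    """Look the value up as-is; it is NOT analyzed."""
    return sorted(index.get(value, set()))


def match_query(text):
    """Analyze the text first, then OR the resulting terms together."""
    ids = set()
    for token in analyze(text):
        ids |= index.get(token, set())
    return sorted(ids)


print(term_query("and history"))   # looked up as one term
print(term_query("history"))       # looked up as one term
print(match_query("and history"))  # analyzed into "and" OR "history"
```

Because the index holds only single words, the two-word string never appears as a key, while the analyzed match query finds every document containing either word.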

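6: Range filtering

A range filter finds documents whose values fall within a given range. The operators are gt (greater than), gte (greater than or equal to), lt (less than), and lte (less than or equal to).

Example: find students older than 20 and no older than 25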

curl -XGET 'hadoop01:9200/school/student/_search?pretty' -d '
{
"query": {
   "range": {
    "age": {"gt":20,"lte":25}
         }
      }
}'
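
To see which sample students a range such as gt 20 / lte 25 selects, the four operators can be mapped onto ordinary comparisons. This is a small sketch using the ages from the sample data; RANGE_OPS and in_range are names made up for this illustration.

```python
import operator

# Map each range keyword to the comparison it stands for.
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}

# Ages taken from the sample data.
AGES = {
    "liubei": 20, "guanyu": 21, "zhangfei": 18,
    "diaocan": 20, "panjinlian": 25, "caocao": 30,
    "zhaoyun": 31, "xiaoqiao": 18, "daqiao": 20,
}


def in_range(value, bounds):
    """True if value satisfies every bound, e.g. {"gt": 20, "lte": 25}."""
    return all(RANGE_OPS[op](value, limit) for op, limit in bounds.items())


selected = [name for name, age in AGES.items()
            if in_range(age, {"gt": 20, "lte": 25})]
print(selected)
```

Students aged exactly 20 are excluded because gt is strict, while the student aged exactly 25 is included because lte is inclusive.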

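7: exists and missing filters

The exists and missing filters find documents that do or do not contain a given field. (missing was removed in Elasticsearch 5.0; on later versions, wrap exists in a bool must_not instead.)

Example: find documents that contain the age field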

curl -XGET 'hadoop01:9200/school/student/_search?pretty' -d '
{
"query": {
   "exists": {
    "field": "age"
         }
      }
}'
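
What counts as "containing a field" can be sketched in Python. In Elasticsearch a field whose value is null is not indexed, so it is treated the same as an absent field. The documents below are hypothetical: every record in the sample data has an age, so two made-up records are added for illustration.

```python
# Hypothetical documents: two made-up records without a usable age
# are added because the sample data always has one.
DOCS = [
    {"name": "liubei", "age": 20},
    {"name": "nobody", "age": None},   # explicit null: treated as missing
    {"name": "ghost"},                 # field absent altogether
]


def exists(doc, field):
    """True if the field is present and holds a non-null value."""
    return doc.get(field) is not None


def missing(doc, field):
    """Complement of exists, i.e. a must_not wrapped around exists."""
    return not exists(doc, field)


with_age = [d["name"] for d in DOCS if exists(d, "age")]
without_age = [d["name"] for d in DOCS if missing(d, "age")]
print(with_age, without_age)
```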

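8: Multi-condition filtering with bool

bool can combine several filter conditions, just as it combined match clauses earlier:

must :: every clause must match, equivalent to AND.
must_not :: no clause may match, equivalent to NOT.
should :: at least one clause should match, equivalent to OR.

Example: select the students whose about field contains trivel and whose age is greater than 20 and less than 30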

curl -XGET 'hadoop01:9200/school/student/_search?pretty' -d '
{
"query": {
   "bool": {
   "must": [
      { "term": {"about": "trivel"}},
      {"range": {"age": {"gt": 20,"lt":30}}}
   ]
     }
  }
}'
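
When a bool query needs several required conditions, they belong in a JSON array under one must key. Repeating the must key inside one object is fragile, because a JSON parser may keep only one of the duplicates. The sketch below uses Python's standard json module to show that behavior and then builds the array form; bool_must is a hypothetical helper introduced for illustration, not part of any Elasticsearch client.

```python
import json

# A body that repeats the "must" key: Python's parser keeps only the last one.
duplicated = (
    '{"bool": {"must": {"term": {"about": "trivel"}}, '
    '"must": {"range": {"age": {"gt": 20, "lt": 30}}}}}'
)
parsed = json.loads(duplicated)
print(list(parsed["bool"]["must"]))  # keys of the one clause that survived


def bool_must(*clauses):
    """Build a bool query whose must is a list of clauses (hypothetical helper)."""
    return {"query": {"bool": {"must": list(clauses)}}}


body = bool_must(
    {"term": {"about": "trivel"}},
    {"range": {"age": {"gt": 20, "lt": 30}}},
)
print(json.dumps(body, indent=2))
```

The printed body can be pasted after -d in a curl command; both conditions are preserved because they are elements of one array.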

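9: Combining queries and filters

Complex queries are usually paired with filter clauses: a filter only answers yes or no, does not affect the score, and its result can be cached. The filter clause of bool provides this.

Example: find the people who like travel and are exactly 20 years old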

curl -XGET 'hadoop01:9200/school/student/_search?pretty' -d '
{
  "query": {
   "bool": {
     "must": {"match": {"about": "trivel"}},     
     "filter": [{"term":{"age": 20}}]
     }
  }
}'
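
The division of labor between must and filter can be illustrated with a toy model. This sketch makes several assumptions: the score is a simple term frequency, the cache is a plain dictionary, and cached_term_filter and search are hypothetical helpers. Elasticsearch decides internally what to cache, so this shows only the idea.

```python
DOCS = [
    {"name": "liubei", "age": 20, "about": "i like diaocan he girl"},
    {"name": "diaocan", "age": 20, "about": "i like trivel and sport"},
    {"name": "panjinlian", "age": 25, "about": "i like trivel and wusong"},
    {"name": "daqiao", "age": 20, "about": "i like trivel and history"},
]

filter_cache = {}  # (field, value) -> frozenset of matching positions


def cached_term_filter(field, value):
    """Yes/no filter: computed once per (field, value), then reused."""
    key = (field, value)
    if key not in filter_cache:
        filter_cache[key] = frozenset(
            i for i, doc in enumerate(DOCS) if doc.get(field) == value
        )
    return filter_cache[key]


def search(word, field, value):
    """Score with the match on `about`, then keep only filtered documents."""
    allowed = cached_term_filter(field, value)
    hits = []
    for i, doc in enumerate(DOCS):
        tokens = doc["about"].split()
        if word in tokens and i in allowed:
            # Toy score: term frequency over length; the filter adds nothing.
            hits.append((doc["name"], tokens.count(word) / len(tokens)))
    return hits


hits = search("trivel", "age", 20)
print([name for name, _ in hits])
```

Running the same search again reuses the cached set for ("age", 20) instead of rescanning the documents, which is the benefit the filter clause is meant to offer.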