Common Basic Elasticsearch Queries

Installation and startup are straightforward; follow the steps on the official site: https://www.elastic.co/downloads/elasticsearch

To demonstrate the different query types in Elasticsearch, most examples below search a small set of employee documents with the following fields: first_name, last_name, age, about and interests. A few later examples instead use book documents with the fields title, authors, summary, publish_date, num_reviews and publisher. First, index the employee documents as follows:

 curl -XPUT 'localhost:9200/megacorp/employee/3' -d '{ "first_name" : "Douglas", "last_name" : "Fir", "age" : 35, "about" : "I like to build cabinets", "interests": "forestry" }'

 curl -XPUT 'localhost:9200/megacorp/employee/2' -d '{ "first_name" : "Jane", "last_name" : "Smith", "age" : 32, "about" : "I like to collect rock albums", "interests": "music" }'

 curl -XPUT 'localhost:9200/megacorp/employee/1' -d '{ "first_name" : "John", "last_name" : "Smith", "age" : 25, "about" : "I love to go rock climbing", "interests": [ "sports", "music" ] }'

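1. Basic Match Query

There are two ways to run a basic match query: (1) the Search Lite API, which passes all search parameters in the URL; (2) the Elasticsearch Query DSL, which sends a JSON request body. The following request searches all fields for "John":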

 curl -XGET 'localhost:9200/megacorp/employee/_search?q=John'

The same search written with the Query DSL looks like this:

curl -XGET 'localhost:9200/megacorp/_search' -d '
{
    "query": {
        "multi_match" : {
            "query" : "John",
            "fields" : ["_all"]
        }
    }
}'

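The output is the same as for /_search?q=John above. The multi_match keyword is normally used as shorthand for running a match query against several fields. The fields property lists the fields to search; to search every field, use the _all keyword, as above. Both forms also let you restrict the search to specific fields. For example, to find employees whose interests contain music: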

 curl -XGET 'localhost:9200/megacorp/employee/_search?q=interests:music'

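The DSL, however, offers a more flexible way to build complex queries (as later sections show) and even to control what is returned. The next example sets the number of results to return, the starting offset (useful for pagination), the document fields to return, and keyword highlighting: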

curl -XGET 'localhost:9200/megacorp/employee/_search?pretty' -d '{"query": { "match" : { "interests" : "music" }},"size": 2,"from": 0,"_source": [ "first_name", "last_name", "interests" ],"highlight": {"fields" : { "interests" : { } } } }'
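
The paging arithmetic behind size and from can be sketched in Python. This is a minimal illustration that only builds the request body; the helper name build_search_body and its zero-based page parameter are assumptions made for the example, not part of Elasticsearch.

```python
import json


def build_search_body(field, text, page, page_size, source_fields):
    """Build a match-query body for one page of results (pages are zero-based)."""
    return {
        "query": {"match": {field: text}},
        "size": page_size,
        "from": page * page_size,  # offset of the first hit on this page
        "_source": source_fields,
        "highlight": {"fields": {field: {}}},
    }


body = build_search_body(
    "interests", "music", page=0, page_size=2,
    source_fields=["first_name", "last_name", "interests"],
)
print(json.dumps(body, indent=2))
```

With page=0 and page_size=2 this produces the same body as the curl request above; later pages only change the from value.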

Note: when the query contains several terms, match lets you use the and operator instead of the default or. You can also set minimum_should_match to tune the relevance of the results.
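
As a hedged sketch, the two options look like this in a request body. The query texts and the "75%" threshold are illustrative choices, not values from the original examples.

```python
import json

# Every term must match: "rock" AND "climbing".
match_all_terms = {
    "query": {
        "match": {"about": {"query": "rock climbing", "operator": "and"}}
    }
}

# At least 75% of the four terms (that is, three of them) must match.
match_most_terms = {
    "query": {
        "match": {
            "about": {
                "query": "rock climbing albums cabinets",
                "minimum_should_match": "75%",
            }
        }
    }
}

for request_body in (match_all_terms, match_most_terms):
    print(json.dumps(request_body))
```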

2. Multi-field Search

As shown earlier, to search several document fields in one request (for example, the same keyword in both about and interests), use a multi_match query:

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "multi_match" : {
            "query" : "rock",
            "fields": ["about", "interests"]
        }
    }
}'

3. Boosting
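The request above searches several fields with one query. You may want to give one of those fields more weight. The example below boosts interests by a factor of 3, so matches in that field count more toward the score; in the author's data set this greatly raised the relevance of the document with _id=4: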

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "multi_match" : {
            "query" : "rock",
            "fields": ["about", "interests^3"]
        }
    }
}'

Boosting does not simply multiply the calculated score by the boost factor; the boost value that is finally applied goes through normalization and other internal optimizations.

4. Bool Query
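Query conditions can be combined with AND/OR/NOT; this is the bool query. A bool query accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR). For example, to find employees whose about field contains music or climb, whose first name is John, and whose last name is not Smith: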

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "bool": {
            "must": [
                {
                    "bool": {
                        "should": [
                            { "match": { "about": "music" }},
                            { "match": { "about": "climb" }}
                        ]
                    }
                },
                { "match": { "first_name": "John" }}
            ],
            "must_not": {
                "match": { "last_name": "Smith" }
            }
        }
    }
}'
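
One detail is worth checking in any bool query: a JSON object should not repeat a key such as must. Many parsers silently keep only the last value, so several required clauses belong in a single array. The sketch below uses Python's json module to show the effect; how a particular Elasticsearch version treats duplicate keys is not covered here.

```python
import json

# Two "must" keys in one object: Python's json module keeps only the last one.
duplicated = json.loads('{"must": {"a": 1}, "must": {"b": 2}}')
print(duplicated)

# Putting both clauses in one array keeps them both.
bool_query = {
    "bool": {
        "must": [
            {"bool": {"should": [
                {"match": {"about": "music"}},
                {"match": {"about": "climb"}},
            ]}},
            {"match": {"first_name": "John"}},
        ],
        "must_not": {"match": {"last_name": "Smith"}},
    }
}
print(json.dumps({"query": bool_query}, indent=2))
```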

5. Fuzzy Queries

Fuzzy matching can be enabled in match and multi_match queries to tolerate spelling mistakes. The degree of fuzziness is measured as the Levenshtein distance from the original word:

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "multi_match" : {
            "query" : "rock climb",
            "fields": ["about", "interests"],
            "fuzziness": "AUTO"
        }
    },
    "_source": ["about", "interests", "first_name"],
    "size":
}'

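Here fuzziness is set to AUTO, which allows two edits when a term is longer than 5 characters (one edit for terms of 3 to 5 characters, and none for shorter terms). However, about 80% of human misspellings have an edit distance of 1, so setting fuzziness to 1 may improve search performance.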
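
To make the edit-distance idea concrete, here is a minimal Python sketch of the plain Levenshtein distance together with the default AUTO length thresholds. It is an illustration, not Elasticsearch's implementation: by default Elasticsearch also counts a swap of two adjacent characters as a single edit, which plain Levenshtein counts as two. The function names are made up for the example.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Edits allowed by "AUTO" with the default length thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


print(levenshtein("rock", "rok"))
print(auto_fuzziness("climb"), auto_fuzziness("climbing"))
```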

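6. Wildcard Query

A wildcard query matches terms against a pattern instead of a complete term: ? matches any single character and * matches zero or more characters. For example, to find all records whose first name starts with J (the pattern is written in lowercase because wildcard patterns are not analyzed, while with the default mapping the indexed terms are lowercased):

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "wildcard" : {
            "first_name" : "j*"
        }
    },
    "_source": ["first_name", "last_name"],
    "highlight": {
        "fields" : {
            "first_name" : {}
        }
    }
}'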
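
The meaning of the two wildcard characters can be sketched by translating a pattern into a Python regular expression and testing it against whole terms. This only illustrates the matching rule; the helper names and the sample term list are assumptions for the example, and Elasticsearch does not evaluate wildcards this way internally.

```python
import re


def wildcard_to_regex(pattern):
    """Translate ? (one character) and * (zero or more) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, terms):
    """Return the terms that the wildcard pattern matches in full."""
    regex = wildcard_to_regex(pattern)
    return [t for t in terms if regex.fullmatch(t)]


first_name_terms = ["john", "jane", "douglas"]  # lowercased, as indexed
print(wildcard_match("j*", first_name_terms))
print(wildcard_match("j?hn", first_name_terms))
```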

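7. Regexp Query

Elasticsearch also supports regular expression queries, which allow more complex patterns than wildcard queries. For example, to find records whose first name starts with j, continues with any number of characters between a and z, and ends with n (as with wildcards, the pattern is matched against the lowercased indexed terms):

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "regexp" : {
            "first_name" : "j[a-z]*n"
        }
    },
    "_source": ["first_name", "age"],
    "highlight": {
        "fields" : {
            "first_name" : {}
        }
    }
}'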
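
A regexp query has to match the whole term, not just part of it. The difference can be sketched with Python's re module, whose syntax overlaps with Lucene's for a simple pattern like this one; the term list is made up for the example.

```python
import re

terms = ["john", "jon", "johnny", "jane"]
pattern = "j[a-z]*n"

# Whole-term matching, which is how a regexp query behaves.
anchored = [t for t in terms if re.fullmatch(pattern, t)]

# Substring matching, shown only for contrast.
unanchored = [t for t in terms if re.search(pattern, t)]

print(anchored)
print(unanchored)
```

Only john and jon end in n, so they are the only whole-term matches, while the substring search accepts every term in the list.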

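8. Match Phrase Query

A match phrase query requires all terms of the query string to appear in the document, in the same order as in the query. By default the terms must also be adjacent, otherwise the document is not found. The slop parameter relaxes this by setting how far apart the terms may be while still matching: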

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "multi_match": {
            "query": "climb rock",
            "fields": [
                "about",
                "interests"
            ],
            "type": "phrase",
            "slop": 3
        }
    },
    "_source": [
        "first_name",
        "about",
        "interests"
    ]
}'

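In the author's results, the document with id 4 is returned with a fairly high score (its about field contains the exact phrase climb rock), and the document with id 1 is returned as well: climb and rock are not adjacent in its about field, but the slop setting lets it match. If "slop": 3 is removed, the document with id 1 is no longer returned.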
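
How much slop a phrase needs can be estimated with a small model: each query term contributes the difference between its position in the document and its position in the query, and the required slop is the spread of those differences. The Python sketch below is a simplified reading of that rule under stated assumptions (whitespace tokenization, no stemming, no special handling of repeated terms); it is not Lucene's implementation.

```python
from itertools import product


def tokenize(text):
    return text.lower().split()


def required_slop(query, document):
    """Smallest slop under which the phrase matches, or None if a term is absent."""
    query_terms = tokenize(query)
    doc_terms = tokenize(document)
    positions = []
    for term in query_terms:
        hits = [i for i, t in enumerate(doc_terms) if t == term]
        if not hits:
            return None
        positions.append(hits)
    best = None
    # Try every combination of occurrences and keep the tightest one.
    for combo in product(*positions):
        offsets = [doc_pos - q_pos for q_pos, doc_pos in enumerate(combo)]
        spread = max(offsets) - min(offsets)
        if best is None or spread < best:
            best = spread
    return best


about = "I love to go rock climbing"
print(required_slop("rock climbing", about))   # adjacent and in order
print(required_slop("love rock", about))       # two words in between
print(required_slop("climbing rock", about))   # reversed order
```

In this model, terms in reversed order need a slop of 2 even when they are adjacent, because one term has to move past the other.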

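9. Match Phrase Prefix Query

A match phrase prefix query works like a match phrase query, except that the last term of the query string is treated as a prefix, so typing only the beginning of that word is enough. It accepts the slop parameter as well, plus a max_expansions parameter that limits how many terms the prefix may expand to, which reduces resource usage:

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "match_phrase_prefix": {
            "about": {
                "query": "rock cli",
                "slop": ,
                "max_expansions":
            }
        }
    },
    "_source": [
        "about",
        "interests",
        "first_name"
    ]
}'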
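
The effect of max_expansions can be sketched as follows: the prefix formed by the last query term is expanded into indexed terms that start with it, and the expansion stops once the limit is reached. The term list and the limit of 2 are hypothetical values chosen for the illustration, and taking the terms in sorted order is an assumption of this sketch.

```python
def expand_prefix(prefix, indexed_terms, max_expansions):
    """Return at most max_expansions indexed terms that start with prefix."""
    candidates = sorted(t for t in set(indexed_terms) if t.startswith(prefix))
    return candidates[:max_expansions]


# Hypothetical terms indexed for the about field.
about_terms = ["rock", "climbing", "climber", "cliff", "cabinets", "collect"]

print(expand_prefix("cli", about_terms, max_expansions=2))
print(expand_prefix("cli", about_terms, max_expansions=50))
```

With the small limit, climbing is cut off and would not be searched, which shows the trade-off: a low max_expansions saves resources but can miss matching documents.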

10. Query String

The query_string query offers a compact syntax for combining multi_match queries, bool queries, boosting, fuzzy matching, wildcards, regexp and range queries. The example below runs against the book documents indexed in section 16. It performs a fuzzy search for the terms search algorithm (search is deliberately misspelled as saerch), requires the authors to include grant ingersoll or tom morton, and searches all fields with the summary field boosted by 2:

curl -XGET 'localhost:9200/iteblog_book_index/book/_search' -d '
{
    "query": {
        "query_string" : {
            "query": "(saerch~1 algorithm~1) AND (grant ingersoll) OR (tom morton)",
            "fields": ["_all", "summary^2"]
        }
    },
    "_source": [ "title", "summary", "authors" ],
    "highlight": {
        "fields" : {
            "summary" : {}
        }
    }
}'

11. Simple Query String

simple_query_string is a variant of query_string that is better suited to a search box exposed to users: it uses +, | and - in place of AND, OR and NOT, and it ignores invalid parts of a query instead of throwing an exception. Like the previous example, this one targets the book documents indexed in section 16:

curl -XPOST 'localhost:9200/iteblog_book_index/book/_search' -d '
{
    "query": {
        "simple_query_string" : {
            "query": "(saerch~1 algorithm~1) + (grant ingersoll) | (tom morton)",
            "fields": ["_all", "summary^2"]
        }
    },
    "_source": [ "title", "summary", "authors" ],
    "highlight": {
        "fields" : {
            "summary" : {}
        }
    }
}'

12. Term/Terms Query

The examples so far are full-text searches. Sometimes a structured search that matches values exactly is more useful; the term and terms queries cover this case. The example below finds everyone whose interests include music:

curl -XPOST 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "term" : {
            "interests": "music"
        }
    },
    "_source" : ["first_name","last_name","interests"]
}'

The terms keyword accepts several terms at once (this request body uses the publisher field of the book documents):

{
    "query": {
        "terms" : {
            "publisher": ["oreilly", "packt"]
        }
    }
}

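13. Term Query - Sorted

Term query results can be sorted like any other query results, and several sort levels can be given. The sort fields have to exist in the mapping, so this example sorts the employees by age and then by score:

curl -XPOST 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "term" : {
            "interests": "music"
        }
    },
    "_source" : ["interests","first_name","about"],
    "sort": [
        { "age": { "order": "desc" }},
        { "_score": { "order": "desc" }}
    ]
}'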

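14. Range Query

The range query is another structured query. The example below finds workers whose birthday falls between 2017-02-01 and 2017-05-01 (it uses a separate person/worker index whose documents have a birthday field):

curl -XPOST 'localhost:9200/person/worker/_search?pretty' -d '
{
    "query": {
        "range" : {
            "birthday": {
                "gte": "2017-02-01",
                "lte": "2017-05-01"
            }
        }
    },
    "_source" : ["first_name","last_name","birthday"]
}'

Range queries work on date, numeric and string fields.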

15. Filtered Query

A filtered query narrows the results of a query. For example, to find employees whose about or interests contain music and keep only those whose birthday is on or after 2017-02-01 (assuming the documents carry a birthday field):

curl -XPOST 'localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query": {
        "filtered": {
            "query" : {
                "multi_match": {
                    "query": "music",
                    "fields": ["about","interests"]
                }
            },
            "filter": {
                "range" : {
                    "birthday": {
                        "gte": "2017-02-01"
                    }
                }
            }
        }
    },
    "_source" : ["first_name","last_name","about", "interests"]
}'

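Note: a filtered query does not require a query part. If none is given, match_all is run, which returns every document in the index, and the filter is then applied to it. In practice the filter should be executed first, which reduces the set of documents the query has to examine; a filter is also cached after its first use, which improves performance. Filtered queries were removed in Elasticsearch 5.0, and a bool query replaces them. The bool query below has the same effect as the filtered query above and returns the same results: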

curl -XPOST 'localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query": {
        "bool": {
            "must" : {
                "multi_match": {
                    "query": "music",
                    "fields": ["about","interests"]
                }
            },
            "filter": {
                "range" : {
                    "birthday": {
                        "gte": "2017-02-01"
                    }
                }
            }
        }
    },
    "_source" : ["first_name","last_name","about", "interests"]
}'
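
The rewrite from filtered to bool is mechanical, which the following Python sketch illustrates on plain dictionaries. The function name filtered_to_bool is hypothetical, and the sketch assumes the filtered query always has a filter part.

```python
import json


def filtered_to_bool(query):
    """Rewrite {"filtered": {...}} as an equivalent {"bool": {...}} query."""
    filtered = query["filtered"]
    return {
        "bool": {
            # A filtered query without a "query" part behaves like match_all.
            "must": filtered.get("query", {"match_all": {}}),
            "filter": filtered["filter"],
        }
    }


old_query = {
    "filtered": {
        "query": {"multi_match": {"query": "music",
                                  "fields": ["about", "interests"]}},
        "filter": {"range": {"birthday": {"gte": "2017-02-01"}}},
    }
}
new_query = filtered_to_bool(old_query)
print(json.dumps({"query": new_query}, indent=2))
```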

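16. Multiple Filters

Several filters can be combined with a bool filter. In the example below, the results must have at least 20 reviews, must not have been published before 2015, and should be published by O'Reilly. First create the index iteblog_book_index and insert some data: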

curl -XPOST 'localhost:9200/iteblog_book_index/book/1' -d '{ "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary" : "A distributed real-time search and analytics engine", "publish_date" : "2015-02-07","num_reviews": 20, "publisher": "oreilly" }'

curl -XPOST 'localhost:9200/iteblog_book_index/book/2' -d '{ "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date" : "2013-01-24", "num_reviews": 12, "publisher": "manning" }'

curl -XPOST 'localhost:9200/iteblog_book_index/book/3' -d '{ "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary" : "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date" : "2015-12-03", "num_reviews": 18, "publisher": "manning" }'

curl -XPOST 'localhost:9200/iteblog_book_index/book/4' -d '{ "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary" : "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date" : "2014-04-05", "num_reviews": 23, "publisher": "manning" }'

Then run the following query:

curl -XPOST 'localhost:9200/iteblog_book_index/book/_search?pretty' -d '
{
    "query": {
        "filtered": {
            "query" : {
                "multi_match": {
                    "query": "elasticsearch",
                    "fields": ["title","summary"]
                }
            },
            "filter": {
                "bool": {
                    "must": {
                        "range" : { "num_reviews": { "gte": 20 } }
                    },
                    "must_not": {
                        "range" : { "publish_date": { "lte": "2014-12-31" } }
                    },
                    "should": {
                        "term": { "publisher": "oreilly" }
                    }
                }
            }
        }
    },
    "_source" : ["title","summary","publisher", "num_reviews", "publish_date"]
}'
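
The logic of the combined filters can be checked in memory against the four sample books. The Python sketch below is only an illustration of the must and must_not conditions, with abbreviated summaries and a naive token match standing in for the multi_match query. The should clause is left out because its treatment alongside must in a filter has differed between Elasticsearch versions; the book that remains satisfies it in any case.

```python
import re

# The four sample books, with summaries shortened for the example.
books = [
    {"id": 1, "title": "Elasticsearch: The Definitive Guide",
     "summary": "A distributed real-time search and analytics engine",
     "publish_date": "2015-02-07", "num_reviews": 20, "publisher": "oreilly"},
    {"id": 2, "title": "Taming Text: How to Find, Organize, and Manipulate It",
     "summary": "organize text using approaches such as full-text search",
     "publish_date": "2013-01-24", "num_reviews": 12, "publisher": "manning"},
    {"id": 3, "title": "Elasticsearch in Action",
     "summary": "build scalable search applications using Elasticsearch",
     "publish_date": "2015-12-03", "num_reviews": 18, "publisher": "manning"},
    {"id": 4, "title": "Solr in Action",
     "summary": "Comprehensive guide to implementing a scalable search engine",
     "publish_date": "2014-04-05", "num_reviews": 23, "publisher": "manning"},
]


def tokens(text):
    return set(re.findall(r"\w+", text.lower()))


def passes(book):
    text_hit = any("elasticsearch" in tokens(book[f]) for f in ("title", "summary"))
    enough_reviews = book["num_reviews"] >= 20      # must
    too_old = book["publish_date"] <= "2014-12-31"  # must_not (ISO dates sort as text)
    return text_hit and enough_reviews and not too_old


result_ids = [b["id"] for b in books if passes(b)]
print(result_ids)
```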

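17. Function Score: Field Value Factor

In some scenarios you may want a particular field to act as a factor in a document's relevance score. This is a typical way to raise a document's relevance according to its importance. The example below favors more popular books (popularity is judged by the number of reviews) by using field_value_factor: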

curl -XPOST 'localhost:9200/iteblog_book_index/book/_search?pretty' -d '
{
    "query": {
        "function_score": {
            "query": {
                "multi_match" : {
                    "query" : "search engine",
                    "fields": ["title", "summary"]
                }
            },
            "field_value_factor": {
                "field" : "num_reviews",
                "modifier": "log1p",
                "factor" :
            }
        }
    },
    "_source": ["title", "summary", "publish_date", "num_reviews"]
}'
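
The arithmetic behind field_value_factor with the log1p modifier can be sketched in Python: the field value is multiplied by factor, 1 is added, and the common (base 10) logarithm is taken. By default the result multiplies the query score. The factor of 2 used here is a hypothetical value for the illustration, and the function name is made up.

```python
import math


def field_value_multiplier(num_reviews, factor):
    """log1p modifier: common logarithm of (1 + factor * field value)."""
    return math.log10(1 + factor * num_reviews)


# Review counts of the four sample books, with an illustrative factor of 2.
for reviews in (12, 18, 20, 23):
    print(reviews, round(field_value_multiplier(reviews, factor=2), 3))
```

The logarithm dampens the effect of large values: a book with 23 reviews gets only a slightly larger multiplier than one with 12.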
