Elasticsearch - Nested Mapping and Filters

Because nested objects are indexed as separate hidden documents, we can’t query them directly. Instead, we have to use the nested query to access them:

GET /my_index/blogpost/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "eggs" }},
        {
          "nested": {
            "path": "comments",
            "query": {
              "bool": {
                "must": [
                  { "match": { "comments.name": "john" }},
                  { "match": { "comments.age": 28 }}
                ]
              }
            }
          }
        }
      ]
    }
  }
}

① The title clause operates on the root document.
② The nested clause “steps down” into the nested comments field. It no longer has access to fields in the root document, nor fields in any other nested document.
③ The comments.name and comments.age clauses operate on the same nested document.

A nested field can contain other nested fields. Similarly, a nested query can contain other nested queries. The nesting hierarchy is applied as you would expect.

Of course, a nested query could match several nested documents. Each matching nested document would have its own relevance score, but these multiple scores need to be reduced to a single score that can be applied to the root document.

By default, the nested query averages the scores of the matching nested documents. This can be controlled by setting the score_mode parameter to avg, max, sum, or even none (in which case the root document gets a constant score of 1.0).
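The reduction step can be pictured with a small Python sketch. It is an illustration only, not Elasticsearch's scoring code: the function name reduce_nested_scores and the sample scores are made up for this example.

```python
from statistics import mean


def reduce_nested_scores(scores, score_mode="avg"):
    """Collapse the scores of matching nested documents into one root score.

    Illustrative model of the avg, max, sum and none modes described above;
    it is not how Lucene computes scores internally.
    """
    if not scores:
        raise ValueError("at least one nested document must match")
    if score_mode == "avg":
        return mean(scores)
    if score_mode == "max":
        return max(scores)
    if score_mode == "sum":
        return sum(scores)
    if score_mode == "none":
        return 1.0  # constant score for the root document
    raise ValueError(f"unknown score_mode: {score_mode}")


# Hypothetical scores of three matching comments on one blog post.
comment_scores = [0.5, 1.5, 1.0]
for mode in ("avg", "max", "sum", "none"):
    print(mode, reduce_nested_scores(comment_scores, mode))
```

Whatever the mode, the root document ends up with exactly one score, which is what lets it be ranked alongside other root documents.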

GET /my_index/blogpost/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "eggs" }},
        {
          "nested": {
            "path": "comments",
            "score_mode": "max",
            "query": {
              "bool": {
                "must": [
                  { "match": { "comments.name": "john" }},
                  { "match": { "comments.age": 28 }}
                ]
              }
            }
          }
        }
      ]
    }
  }
}

① Give the root document the _score from the best-matching nested document.

If placed inside the filter clause of a Boolean query, a nested query behaves much as it does in a scoring context, except that it doesn’t accept the score_mode parameter. Because it is being used as a non-scoring query (it includes or excludes, but doesn’t score), a score_mode doesn’t make sense since there is nothing to score.
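As a rough sketch of what that looks like, the request body from the blog post example can be built as a Python dictionary with the nested clause moved into the filter section. The variable names are arbitrary, and sending the body to a cluster is deliberately left out.

```python
import json

# Nested clause without score_mode: in filter context it only includes or
# excludes root documents, it does not contribute to their score.
nested_clause = {
    "nested": {
        "path": "comments",
        "query": {
            "bool": {
                "must": [
                    {"match": {"comments.name": "john"}},
                    {"match": {"comments.age": 28}},
                ]
            }
        },
    }
}

search_body = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "eggs"}}],  # scored clause
            "filter": [nested_clause],               # non-scoring clause
        }
    }
}

# Print the JSON that would be sent as the search request body.
print(json.dumps(search_body, indent=2))
```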

curl -XPOST "http://localhost:9200/index-1/movie/" -d'
{
   "title": "The Matrix",
   "cast": [
      {
         "firstName": "Keanu",
         "lastName": "Reeves"
      },
      {
         "firstName": "Laurence",
         "lastName": "Fishburne"
      }
   ]
}'

Given many such movies in our index we can find all movies with an actor named "Keanu" using a search request such as:

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "term": {
               "cast.firstName": "keanu"
            }
         }
      }
   }
}'

Running the above query indeed returns The Matrix. The same is true if we try to find movies that have an actor with the first name "Keanu" and last name "Reeves":

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "cast.firstName": "keanu"
                     }
                  },
                  {
                     "term": {
                        "cast.lastName": "reeves"
                     }
                  }
               ]
            }
         }
      }
   }
}'

Or at least so it seems. However, let's see what happens if we search for movies with an actor with "Keanu" as first name and "Fishburne" as last name.

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "cast.firstName": "keanu"
                     }
                  },
                  {
                     "term": {
                        "cast.lastName": "fishburne"
                     }
                  }
               ]
            }
         }
      }
   }
}'

At first glance this clearly should not match The Matrix, as there's no such actor amongst its cast. However, Elasticsearch will return The Matrix for the above query. After all, the movie does contain an actor with "Keanu" as first name and (albeit a different) actor with "Fishburne" as last name. Based on the above query it has no way of knowing that we want the two term filters to match the same unique object in the list of actors. And even if it did, the way the data is indexed it wouldn't be able to handle that requirement.
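The effect can be reproduced with a few lines of Python. This is a simplified model, not Elasticsearch internals: flatten_object_array and matches_all_terms are hypothetical helpers, and lowercasing stands in for what the default analyzer does to these simple names.

```python
def flatten_object_array(field, objects):
    """Flatten an array of objects into multi-valued 'field.key' entries.

    Rough model of how a non-nested object array ends up in the index.
    """
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value.lower())
    return flat


def matches_all_terms(flat_doc, terms):
    """True if every (field, term) pair occurs somewhere in the document."""
    return all(term in flat_doc.get(field, []) for field, term in terms.items())


cast = [
    {"firstName": "Keanu", "lastName": "Reeves"},
    {"firstName": "Laurence", "lastName": "Fishburne"},
]
flat_movie = flatten_object_array("cast", cast)
print(flat_movie)

# The link between "keanu" and "reeves" is lost in the flattened form, so
# the cross-object combination is reported as a match too.
print(matches_all_terms(
    flat_movie, {"cast.firstName": "keanu", "cast.lastName": "fishburne"}
))
```

Once the values are flattened into two independent lists, nothing records which first name belonged to which last name.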

Nested mapping and filter to the rescue

Luckily Elasticsearch provides a way for us to filter on multiple fields within the same objects in arrays: mapping such fields as nested. To try this out, let's create ourselves a new index with the "cast" field mapped as nested.

curl -XPUT "http://localhost:9200/index-2" -d'
{
   "mappings": {
      "movie": {
         "properties": {
            "cast": {
               "type": "nested"
            }
         }
      }
   }
}'

After indexing the same movie document into the new index we can now find movies based on multiple properties of each actor by using a nested filter. Here's how we would search for movies starring an actor named "Keanu Fishburne":

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "nested": {
               "path": "cast",
               "filter": {
                  "bool": {
                     "must": [
                        {
                           "term": {
                              "firstName": "keanu"
                           }
                        },
                        {
                           "term": {
                              "lastName": "fishburne"
                           }
                        }
                     ]
                  }
               }
            }
         }
      }
   }
}'

As you can see we've wrapped our initial bool filter in a nested filter. The nested filter contains a path property where we specify that the filter applies to the cast property of the searched document. It also contains a filter (or a query) which will be applied to each value within the nested property.
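Conceptually, the nested filter evaluates the inner filter against one cast member at a time. A minimal Python sketch of that idea follows; nested_match is a hypothetical helper written for illustration, and lowercasing again stands in for analysis.

```python
def nested_match(objects, terms):
    """True if a single object satisfies every (key, term) pair.

    Models the nested filter: the inner bool/must is checked per object,
    never across objects.
    """
    return any(
        all(str(obj.get(key, "")).lower() == term for key, term in terms.items())
        for obj in objects
    )


cast = [
    {"firstName": "Keanu", "lastName": "Reeves"},
    {"firstName": "Laurence", "lastName": "Fishburne"},
]

# No single cast member is called "Keanu Fishburne".
print(nested_match(cast, {"firstName": "keanu", "lastName": "fishburne"}))
# One cast member is called "Keanu Reeves".
print(nested_match(cast, {"firstName": "keanu", "lastName": "reeves"}))
```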

As intended, running the above query doesn't return The Matrix, while modifying it to instead match "Reeves" as last name will make it match The Matrix. However, there's one caveat.

Including nested values in parent documents

If we go back to our very first query, filtering only on actors' first names without using a nested filter, like the request below, we won't get any hits.

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "term": {
               "cast.firstName": "keanu"
            }
         }
      }
   }
}'

This happens because movie documents no longer have cast.firstName fields. Instead each element in the cast array is, internally in ElasticSearch, indexed as a separate document.

Obviously we can still search for movies based only on first names amongst the cast, by using nested filters though. Like this:

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "nested": {
               "path": "cast",
               "filter": {
                  "term": {
                     "firstName": "keanu"
                  }
               }
            }
         }
      }
   }
}'

The above request returns The Matrix. However, having to use nested filters or queries when all we want to do is filter on a single property is a bit tedious. To keep the power of nested filters for complex criteria while still being able to filter on values in arrays as if we hadn't mapped such properties as nested, we can modify our mappings so that the nested values are also included in the parent document. This is done using the include_in_parent property, like this:

curl -XPUT "http://localhost:9200/index-3" -d'
{
   "mappings": {
      "movie": {
         "properties": {
            "cast": {
               "type": "nested",
               "include_in_parent": true
            }
         }
      }
   }
}'

In an index such as the one created with the above request we can filter on combinations of values within the same complex objects in the cast array using nested filters, while still being able to filter on single fields without them. However, we now need to carefully consider where to use, and where not to use, nested filters in our queries, as a query for "Keanu Fishburne" will match The Matrix using a regular bool filter while it won't when wrapped in a nested filter. In other words, when using include_in_parent we may get unexpected results from queries matching documents that they shouldn't if we forget to use nested filters.
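The trade-off can be sketched in Python by modelling both representations side by side. This is a simplified picture under stated assumptions, not the real index layout: index_cast, root_bool_must and nested_bool_must are hypothetical helpers.

```python
def index_cast(cast, include_in_parent):
    """Return (root_fields, nested_docs) for a movie's cast array.

    Each cast member becomes a separate nested document; with
    include_in_parent the values are also copied, flattened, onto the root.
    """
    nested_docs = [{k: v.lower() for k, v in member.items()} for member in cast]
    root_fields = {}
    if include_in_parent:
        for member in nested_docs:
            for key, value in member.items():
                root_fields.setdefault(f"cast.{key}", []).append(value)
    return root_fields, nested_docs


def root_bool_must(root_fields, terms):
    """Regular bool filter: every term must occur somewhere on the root."""
    return all(term in root_fields.get(field, []) for field, term in terms.items())


def nested_bool_must(nested_docs, terms):
    """Nested filter: one nested document must satisfy every term."""
    return any(all(doc.get(k) == t for k, t in terms.items()) for doc in nested_docs)


cast = [
    {"firstName": "Keanu", "lastName": "Reeves"},
    {"firstName": "Laurence", "lastName": "Fishburne"},
]
root, nested = index_cast(cast, include_in_parent=True)

# A single-field filter works again without a nested filter.
print(root_bool_must(root, {"cast.firstName": "keanu"}))
# The regular bool filter matches the non-existent "Keanu Fishburne".
print(root_bool_must(root, {"cast.firstName": "keanu", "cast.lastName": "fishburne"}))
# The nested filter still rejects it.
print(nested_bool_must(nested, {"firstName": "keanu", "lastName": "fishburne"}))
```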


Array Type

Read the doc on elasticsearch.org

As its name suggests, it can be an array of native types (string, int, …) but also an array of objects (the basis used for “objects” and “nested”).

Here are some valid indexing examples:

{
  "Article": [
    {
      "id": 12,
      "title": "An article title",
      "categories": [1, 3, 5, 7],
      "tag": ["elasticsearch", "symfony", "Obtao"],
      "author": [
        {
          "firstname": "Francois",
          "surname": "francoisg",
          "id": 18
        },
        {
          "firstname": "Gregory",
          "surname": "gregquat",
          "id": "2"
        }
      ]
    },
    {
      "id": 13,
      "title": "A second article title",
      "categories": [1, 7],
      "tag": ["elasticsearch", "symfony", "Obtao"],
      "author": [
        {
          "firstname": "Gregory",
          "surname": "gregquat",
          "id": "2"
        }
      ]
    }
  ]
}

This example contains several kinds of arrays:

  • categories: array of integers
  • tag: array of strings
  • author: array of objects (inner objects or nested)

We explicitly point out the “simple” array types because it can be easier and more maintainable to store a flattened value rather than the complete object.
Using a non-relational structure should make you think about a specific model for your search engine:

  • To filter: if you just want to filter/search/aggregate on the textual value of an object, then flatten the value in the parent object.
  • To get the list of objects that are linked to a parent (and if you do not need to filter or index these objects), just store the list of ids and hydrate them with Doctrine and Symfony (article in French for the moment).

Inner objects

Inner objects are simply JSON objects embedded in a parent document, such as the “author” entries in the example above. The mapping for this example could be:

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog:
            types:
                article:
                    mappings:
                        title: ~
                        categories: ~
                        tag: ~
                        author:
                            type: object
                            properties:
                                firstname: ~
                                surname: ~
                                id:
                                    type: integer

You can filter or query on these “inner objects”. For example:

The query author.firstname=Francois will return the post with the id 12 (and not the one with the id 13).

You can read more on the Elasticsearch website

Inner objects are easy to configure. As Elasticsearch documents are “schema-less”, you can index them without specifying any mapping.

The limitation of this method lies in the way Elasticsearch stores your data. Reusing the above example, here is the internal representation of our objects:

[
  {
    "id": 12,
    "title": "An article title",
    "categories": [1, 3, 5, 7],
    "tag": ["elasticsearch", "symfony", "Obtao"],
    "author.firstname": ["Francois", "Gregory"],
    "author.surname": ["francoisg", "gregquat"],
    "author.id": [18, 2]
  },
  {
    "id": 13,
    "title": "A second article title",
    "categories": [1, 7],
    "tag": ["elasticsearch", "symfony", "Obtao"],
    "author.firstname": ["Gregory"],
    "author.surname": ["gregquat"],
    "author.id": [2]
  }
]

The consequence is that the query:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            { "term": { "author.firstname": "francois" } },
            { "term": { "author.surname": "gregquat" } }
          ]
        }
      }
    }
  }
}

(author.firstname=Francois AND author.surname=gregquat) will return document 12, even though no single author has both values. In the case of an inner object, this query can be translated as “Which documents have at least one author.surname = gregquat and at least one author.firstname = francois?”.

To fix this problem, you must use the nested type.

Nested

First important difference: nested must be specified in your mapping.

The mapping looks like the one for an object, only the type changes:

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog:
            types:
                article:
                    mappings:
                        title: ~
                        categories: ~
                        tag: ~
                        author:
                            type: nested
                            properties:
                                firstname: ~
                                surname: ~
                                id:
                                    type: integer

This time, the internal representation will be:

[
  {
    "id": 12,
    "title": "An article title",
    "categories": [1, 3, 5, 7],
    "tag": ["elasticsearch", "symfony", "Obtao"],
    "author": [
      {
        "firstname": "Francois",
        "surname": "francoisg",
        "id": 18
      },
      {
        "firstname": "Gregory",
        "surname": "gregquat",
        "id": 2
      }
    ]
  },
  {
    "id": 13,
    "title": "A second article title",
    "categories": [1, 7],
    "tag": ["elasticsearch", "symfony", "Obtao"],
    "author": [
      {
        "firstname": "Gregory",
        "surname": "gregquat",
        "id": 2
      }
    ]
  }
]

This time, we keep the object structure.

Nested objects have their own filters, which make it possible to filter per nested object. Continuing with the example that exposed the limitation of inner objects, we can write this query:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "nested": {
          "path": "author",
          "filter": {
            "bool": {
              "must": [
                {
                  "term": {
                    "author.firstname": "francois"
                  }
                },
                {
                  "term": {
                    "author.surname": "gregquat"
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}

We can translate it as “Who has an author object whose surname is equal to ‘gregquat’ and whose firstname is ‘francois’”. This query will return no result.

One problem remains, and it hurts when working with big objects: when you want to change a single value of a nested object, you have to reindex the whole parent document (including all of its nested objects).
If the objects are heavy and often updated, the performance impact can be significant.

To fix this problem, you can use the parent/child associations.

Parent/Child

Parent/child associations are very similar to OneToMany relationships (one parent, several children).
The relationship remains hierarchical : an object type is only associated to one parent, and it’s impossible to create a ManyToMany relationship.

We are going to link our article to a category :

fos_elastica:
    clients:
        default: { host: %elastic_host%, port: %elastic_port% }
    indexes:
        blog:
            types:
                category:
                    mappings:
                        id: ~
                        name: ~
                        description: ~
                article:
                    mappings:
                        title: ~
                        tag: ~
                        author: ~
                    _routing:
                        required: true
                        path: category
                    _parent:
                        type: "category"
                        identifier: "id" #optional as id is the default value
                        property: "category" #optional as the default value is the type value
When indexing an article, a reference to its category (category.id) is indexed as well.
We can therefore index categories and articles separately while keeping the references between them.

As with nested objects, there are filters and queries that search on parents or children:

  • Has Parent Filter / Has Parent Query: filters/queries on parent fields and returns the child objects. In our case, we could filter articles whose parent category contains “symfony” in its description.
  • Has Child Filter / Has Child Query: filters/queries on child fields and returns the parent object. In our case, we could filter categories for which “francoisg” has written an article.

{
  "query": {
    "has_child": {
      "type": "article",
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": {
            "term": { "tag": "symfony" }
          }
        }
      }
    }
  }
}

This query will return the Categories that have at least one article tagged with “symfony”.
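The direction of each lookup can be illustrated with plain Python lists. This is only a sketch of the semantics: the sample categories and articles are invented, and has_child and has_parent are hypothetical helpers, not Elastica or Elasticsearch calls.

```python
# Hypothetical sample data: each article stores the id of its parent category.
categories = [
    {"id": 1, "name": "PHP", "description": "Posts about Symfony and Elastica"},
    {"id": 2, "name": "Cooking", "description": "Recipes"},
]
articles = [
    {"id": 12, "category": 1, "tag": ["elasticsearch", "symfony"]},
    {"id": 13, "category": 1, "tag": ["elasticsearch"]},
    {"id": 14, "category": 2, "tag": ["eggs"]},
]


def has_child(parents, children, parent_key, child_matches):
    """Return the parents that have at least one matching child."""
    parent_ids = {child[parent_key] for child in children if child_matches(child)}
    return [parent for parent in parents if parent["id"] in parent_ids]


def has_parent(parents, children, parent_key, parent_matches):
    """Return the children whose parent matches."""
    parent_ids = {parent["id"] for parent in parents if parent_matches(parent)}
    return [child for child in children if child[parent_key] in parent_ids]


# Categories with at least one article tagged "symfony".
tagged = has_child(categories, articles, "category", lambda a: "symfony" in a["tag"])
print([c["name"] for c in tagged])

# Articles whose category description mentions "symfony".
in_symfony = has_parent(
    categories, articles, "category",
    lambda c: "symfony" in c["description"].lower(),
)
print([a["id"] for a in in_symfony])
```

In both cases the condition is evaluated on one side of the relationship and the documents from the other side are returned.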

The queries are here written in JSON, but are easily transformable into PHP with the Elastica library.
