Cross-type joins in Elasticsearch

http://rore.im/posts/elasticsearch-joins

December 31, 2014

When modeling data in Elasticsearch, a common question is how to design the data so that it captures relationships between entities and allows at least some level of “joins”.

Elasticsearch has a good guide on data modeling. One of the options it describes for expressing relationships is the parent-child model.

A parent-child relationship in Elasticsearch expresses a one-to-many relationship (a parent with many children). The parent and child are separate Elasticsearch types. They are linked only by specifying the parent type in the child mapping and by giving the parent ID with every child index operation. The parent ID is used to route the child to the same shard as its parent.
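The routing rule can be sketched in a few lines of bash. This is an illustration under stated assumptions. The djb2-style hash is a stand-in, because Elasticsearch's actual routing hash differs between versions. The value of 5 shards matches the Elasticsearch 1.x default. The point is only that a child is routed by its `_parent` value rather than by its own `_id`, so it lands on its parent's shard.

```bash
#!/bin/bash
# Illustrative routing sketch: shard = hash(routing) % number_of_shards.
# The djb2-style hash below is a stand-in, NOT Elasticsearch's exact function.

NUM_SHARDS=5   # assumption: the Elasticsearch 1.x default number_of_shards

shard_for() {
  local key=$1 h=5381 i c
  for (( i = 0; i < ${#key}; i++ )); do
    printf -v c '%d' "'${key:i:1}"      # character code of the i-th character
    h=$(( (h * 33 + c) & 0x7fffffff ))  # keep the hash non-negative
  done
  echo $(( h % NUM_SHARDS ))
}

# route <_id> [<_parent>]: the routing value is the _parent when one is given,
# otherwise the document's own _id.
route() {
  shard_for "${2:-$1}"
}

route 4     # person Jane (_id 4)                             -> 0
route 2 4   # pet Buffy (_id 2, _parent 4): same shard as Jane -> 0
route 2     # an _id of 2 without a parent would go to         -> 3
```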

It’s a useful model when a parent has many children and when the children are updated on a different schedule than their parent. Every child is a separate document, so updating a child does not require reindexing the parent.

But this model also provides an interesting (if limited) way to capture relationships between sibling types.

Let’s consider the following data:

Bill has two children, Adam and Eve, and a dog (Apple).
Bob has no children or pets (ah, freedom!).
Mary has a little newborn child called Lamb.
Jane has a boy named Xander, a cat (Buffy) and a dog (Willow).

Let’s create this data in Elasticsearch.
We will have one parent type, “person”, and two child types, “children” and “pets”.
First we’ll create the mapping for the child types.

    #!/bin/bash

    export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

    # Create the index with the parent-child mappings
    curl -XPUT "$ELASTICSEARCH_ENDPOINT/es-joins" -d '{
      "mappings": {
        "children": {
          "_parent": {
            "type": "person"
          }
        },
        "pets": {
          "_parent": {
            "type": "person"
          }
        }
      }
    }'

Next, index all the documents - parents, children and pets.

    # Index documents
    curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
    {"index":{"_index":"es-joins","_type":"person","_id":1}}
    {"name":"Bill","gender":"male"}
    {"index":{"_index":"es-joins","_type":"person","_id":2}}
    {"name":"Bob","gender":"male"}
    {"index":{"_index":"es-joins","_type":"person","_id":3}}
    {"name":"Mary","gender":"female"}
    {"index":{"_index":"es-joins","_type":"person","_id":4}}
    {"name":"Jane","gender":"female"}
    {"index":{"_index":"es-joins","_type":"children","_parent":1,"_id":1}}
    {"name":"Adam","gender":"male"}
    {"index":{"_index":"es-joins","_type":"children","_parent":1,"_id":2}}
    {"name":"Eve","gender":"female"}
    {"index":{"_index":"es-joins","_type":"children","_parent":3,"_id":3}}
    {"name":"Lamb","gender":"male"}
    {"index":{"_index":"es-joins","_type":"children","_parent":4,"_id":4}}
    {"name":"Xander","gender":"male"}
    {"index":{"_index":"es-joins","_type":"pets","_parent":1,"_id":1}}
    {"name":"Apple","type":"dog"}
    {"index":{"_index":"es-joins","_type":"pets","_parent":4,"_id":2}}
    {"name":"Buffy","type":"cat"}
    {"index":{"_index":"es-joins","_type":"pets","_parent":4,"_id":3}}
    {"name":"Willow","type":"dog"}
    '

Now we can run some searches on it.
The usual example is searching for parents by their children. Let’s find
all the parents that have a girl. We expect to get back only Bill.

    curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/person/_search?pretty" -d '
    {
      "query": {
        "filtered": {
          "filter": {
            "and": [
              {
                "has_child": {
                  "type": "children",
                  "query": {
                    "term": {
                      "gender": "female"
                    }
                  }
                }
              }
            ]
          }
        }
      }
    }
    '
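To make the semantics of has_child concrete, here is a small plain-bash model of the same data. It needs bash 4+ for associative arrays. It sketches only what the filter computes, not how Elasticsearch executes it, and the array and function names are hypothetical. A parent matches if at least one of its child documents matches the inner query.

```bash
#!/bin/bash
# Plain-bash model of has_child (a sketch, not Elasticsearch internals).
# Requires bash 4+ for associative arrays.

declare -A person_name=([1]=Bill [2]=Bob [3]=Mary [4]=Jane)

# "children" documents, keyed by _id: their _parent and their gender.
declare -A child_parent=([1]=1 [2]=1 [3]=3 [4]=4)
declare -A child_gender=([1]=male [2]=female [3]=male [4]=male)

# Print every person that has at least one child of the given gender,
# mirroring has_child wrapping a term query on "gender".
parents_with_child_gender() {
  local want=$1 pid cid
  for pid in $(printf '%s\n' "${!person_name[@]}" | sort -n); do
    for cid in "${!child_parent[@]}"; do
      if [[ ${child_parent[$cid]} == "$pid" && ${child_gender[$cid]} == "$want" ]]; then
        echo "${person_name[$pid]}"
        break   # one matching child is enough
      fi
    done
  done
}

parents_with_child_gender female   # prints: Bill
```

Asking for `male` instead returns Bill, Mary and Jane. Bill matches as well, because a single matching child is enough.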

We can also combine conditions on multiple child types.
Let’s find parents that have a boy and a dog. This time we expect to get back both Bill and Jane.

    curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/person/_search?pretty" -d '
    {
      "query": {
        "filtered": {
          "filter": {
            "and": [
              {
                "has_child": {
                  "type": "children",
                  "query": {
                    "term": {
                      "gender": "male"
                    }
                  }
                }
              },
              {
                "has_child": {
                  "type": "pets",
                  "query": {
                    "term": {
                      "type": "dog"
                    }
                  }
                }
              }
            ]
          }
        }
      }
    }
    '
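The "and" of two has_child filters can be modeled the same way in plain bash. This needs bash 4+, and the names are hypothetical. Each clause is evaluated independently against the same parent, so the boy and the dog are matched by different child documents of different types. This is a sketch of the result set only, not of how Elasticsearch evaluates the filters.

```bash
#!/bin/bash
# Plain-bash model of two has_child filters combined with "and" (bash 4+).

declare -A person_name=([1]=Bill [2]=Bob [3]=Mary [4]=Jane)
declare -A child_parent=([1]=1 [2]=1 [3]=3 [4]=4)
declare -A child_gender=([1]=male [2]=female [3]=male [4]=male)
declare -A pet_parent=([1]=1 [2]=4 [3]=4)
declare -A pet_type=([1]=dog [2]=cat [3]=dog)

# Succeeds if person $1 has at least one child of gender $2.
has_child_gender() {
  local cid
  for cid in "${!child_parent[@]}"; do
    if [[ ${child_parent[$cid]} == "$1" && ${child_gender[$cid]} == "$2" ]]; then
      return 0
    fi
  done
  return 1
}

# Succeeds if person $1 has at least one pet of type $2.
has_pet_type() {
  local pet
  for pet in "${!pet_parent[@]}"; do
    if [[ ${pet_parent[$pet]} == "$1" && ${pet_type[$pet]} == "$2" ]]; then
      return 0
    fi
  done
  return 1
}

# Both has_child clauses must hold for the same parent.
parents_with_boy_and_dog() {
  local pid
  for pid in $(printf '%s\n' "${!person_name[@]}" | sort -n); do
    if has_child_gender "$pid" male && has_pet_type "$pid" dog; then
      echo "${person_name[$pid]}"
    fi
  done
}

parents_with_boy_and_dog   # prints: Bill, then Jane
```

Mary is excluded even though she has a boy (Lamb), because she has no pets at all.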

Another commonly used option is finding children by their parents.
But a more interesting possibility is finding children by their siblings.
Let’s look up all boys that have a dog. To do that, we search the
“children” type with a has_parent filter that wraps a has_child
filter on the “pets” type.
This time we expect to get back two children: Adam and Xander.

    curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/children/_search?pretty" -d '
    {
      "query": {
        "filtered": {
          "filter": {
            "and": [
              {
                "has_parent": {
                  "parent_type": "person",
                  "filter": {
                    "has_child": {
                      "type": "pets",
                      "query": {
                        "term": {
                          "type": "dog"
                        }
                      }
                    }
                  }
                }
              },
              {
                "term": {
                  "gender": "male"
                }
              }
            ]
          }
        }
      }
    }
    '
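Here is a plain-bash model of what this sibling query computes. It needs bash 4+, and the array and function names are hypothetical. It makes the two hops explicit: from each boy up to his parent, then from that parent down to its pets. The siblings are linked only through the shared `_parent` value. Like any sketch of this kind, it describes the result, not Elasticsearch's execution.

```bash
#!/bin/bash
# Plain-bash model of the sibling join (bash 4+): search "children",
# keep boys whose parent has at least one dog (has_parent wrapping has_child).

declare -A child_name=([1]=Adam [2]=Eve [3]=Lamb [4]=Xander)
declare -A child_parent=([1]=1 [2]=1 [3]=3 [4]=4)
declare -A child_gender=([1]=male [2]=female [3]=male [4]=male)
declare -A pet_parent=([1]=1 [2]=4 [3]=4)
declare -A pet_type=([1]=dog [2]=cat [3]=dog)

# Inner has_child: succeeds if person $1 owns at least one pet of type $2.
parent_has_pet_type() {
  local pet
  for pet in "${!pet_parent[@]}"; do
    if [[ ${pet_parent[$pet]} == "$1" && ${pet_type[$pet]} == "$2" ]]; then
      return 0
    fi
  done
  return 1
}

boys_with_a_dog() {
  local cid
  for cid in $(printf '%s\n' "${!child_name[@]}" | sort -n); do
    # term filter on the child document itself
    [[ ${child_gender[$cid]} == male ]] || continue
    # has_parent: hop to the child's parent, then test that parent's pets
    if parent_has_pet_type "${child_parent[$cid]}" dog; then
      echo "${child_name[$cid]}"
    fi
  done
}

boys_with_a_dog   # prints: Adam, then Xander
```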

Of course, our data model here is simplified, because it allows each child only a single parent. If we were to extend it, we would create a “family” parent type with the child types “parents”, “children” and “pets”.

Currently, getting the details of the “joined” entity requires another query. Suppose we search for “all boys that have a dog” and also want the details of the dogs. Then we need a second search for “all dogs with parents that have children with _id=…”, using the _ids of the children returned by the first search.
This will change with the upcoming inner hits feature, which will allow retrieving the data of the inner entities in a single query.
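For illustration, the second search can be written with the same Elasticsearch 1.x filter syntax used throughout this post. The helper `second_search_body` is hypothetical and only builds the request body for a given list of child _ids. Using an `ids` query inside `has_child` is one way to express “children with _id=…”, and it is an assumption of this sketch. Against the sample data, the _ids 1 and 4 (Adam and Xander) should lead to Apple and Willow.

```bash
#!/bin/bash
# Sketch (hypothetical helper): build the body of the follow-up search for dogs
# whose parent has one of the given "children" _ids, in Elasticsearch 1.x syntax.
# Requires at least one _id argument.

second_search_body() {
  local ids
  ids=$(printf '"%s",' "$@")   # e.g. "1","4",
  ids=${ids%,}                 # drop the trailing comma
  cat <<EOF
{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          { "term": { "type": "dog" } },
          {
            "has_parent": {
              "parent_type": "person",
              "filter": {
                "has_child": {
                  "type": "children",
                  "query": { "ids": { "values": [$ids] } }
                }
              }
            }
          }
        ]
      }
    }
  }
}
EOF
}

# _ids 1 and 4 (Adam and Xander) come from the "boys that have a dog" search.
second_search_body 1 4

# To send it (assumes ELASTICSEARCH_ENDPOINT is set as in the setup script):
# curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/pets/_search?pretty" -d "$(second_search_body 1 4)"
```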

Note that Elasticsearch does not exactly recommend this technique.
Because of the memory requirements and the performance cost,
the official recommendation is: “Avoid using multiple parent-child joins in a single query”. So, as always, test, measure, and choose your data model wisely.
