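This post introduces the aggregation feature of Elasticsearch (ES). Aggregations are the main tool for turning indexed data into readable, useful summaries, for example for visualization. An aggregation is made up of two parts: buckets and metrics.

A bucket is the equivalent of GROUP BY in SQL, as in the following example: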

GET /cartxns/_search
{
  "size" : ,
  "aggs": {
    "color": {
      "terms": {"field": "color.keyword"}
    }
  }
}
...
"aggregations" : {
  "color" : {
    "doc_count_error_upper_bound" : ,
    "sum_other_doc_count" : ,
    "buckets" : [
      {
        "key" : "red",
        "doc_count" :
      },
      {
        "key" : "blue",
        "doc_count" :
      },
      {
        "key" : "green",
        "doc_count" :
      }
    ]
  }
}

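The example above buckets on color.keyword. In elastic4s the same kind of request is expressed as follows (here the aggregation is named colors and is restricted to the values red and green):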

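// Build the request: a terms aggregation on color.keyword, limited to "red" and "green"
val aggTerms = search("cartxns").aggregations(
  termsAgg("colors","color.keyword").includeExactValues("red","green")
).sourceInclude("color","make").size()
// Print the generated HTTP request
println(aggTerms.show)

// Execute the request, then print the hits followed by each bucket's key and document count
val termsResult = client.execute(aggTerms).await
termsResult.result.hits.hits.foreach(m => println(m.sourceAsMap))
termsResult.result.aggregations.terms("colors").buckets.foreach(b => println(s"${b.key},${b.docCount}"))

Output: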

POST:/cartxns/_search?
StringEntity({"size":,"_source":{"includes":["color","make"]},"aggs":{"colors":{"terms":{"field":"color.keyword","include":["red","green"]}}}},Some(application/json))
Map(color -> red, make -> honda)
Map(color -> red, make -> honda)
Map(color -> green, make -> ford)
red,
green,
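
To make the GROUP BY analogy concrete, the following is a minimal sketch of what a terms bucket computes, written with plain Scala collections rather than elastic4s. The CarTxn case class, the TermsBucketSketch object, and the sample records are hypothetical and are not taken from the cartxns index.

```scala
object TermsBucketSketch {
  // Hypothetical record type standing in for a document in the index.
  final case class CarTxn(color: String, make: String, price: Double)

  // Hypothetical sample data, for illustration only.
  val txns: Seq[CarTxn] = Seq(
    CarTxn("red", "honda", 10000.0),
    CarTxn("red", "bmw", 80000.0),
    CarTxn("blue", "ford", 25000.0),
    CarTxn("green", "toyota", 12000.0)
  )

  // A terms bucket on "color": one bucket per distinct value,
  // each carrying the number of documents that fall into it.
  def docCounts(docs: Seq[CarTxn]): Map[String, Int] =
    docs.groupBy(_.color).map { case (color, group) => color -> group.size }

  def main(args: Array[String]): Unit =
    docCounts(txns).foreach { case (key, count) => println(s"$key,$count") }
}
```

Each map entry plays the role of one element of the buckets array in the response: the map key corresponds to key and the count to doc_count.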

The avg_price below is a simple metric:

POST /cartxns/_search
{
  "aggs":{
    "colors":{
      "terms":{"field":"color.keyword"},
      "aggs":{
        "avg_price":{
          "avg":{"field":"price"}
        }
      }
    }
  }
}
...
"aggregations" : {
  "colors" : {
    "doc_count_error_upper_bound" : ,
    "sum_other_doc_count" : ,
    "buckets" : [
      {
        "key" : "red",
        "doc_count" : ,
        "avg_price" : {
          "value" : 32500.0
        }
      },
      {
        "key" : "blue",
        "doc_count" : ,
        "avg_price" : {
          "value" : 20000.0
        }
      },
      {
        "key" : "green",
        "doc_count" : ,
        "avg_price" : {
          "value" : 21000.0
        }
      }
    ]
  }
}

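Here terms defines the bucket. Nesting an aggs block with avg under terms computes avg_price, the average price of the documents that fall into each bucket. In elastic4s this is expressed as follows: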

val aggTermsAvg = search("cartxns").aggregations(
  termsAgg("colors","color.keyword").subAggregations(
    avgAgg("avg_price","price")
  )
).sourceInclude("color","make").size()
println(aggTermsAvg.show)

val avgResult = client.execute(aggTermsAvg).await
avgResult.result.hits.hits.foreach(m => println(m.sourceAsMap))
avgResult.result.aggregations.terms("colors").buckets
  .foreach(b => println(s"${b.key},${b.docCount},${b.avg("avg_price").value}"))
...
POST:/cartxns/_search?
StringEntity({"size":,"_source":{"includes":["color","make"]},"aggs":{"colors":{"terms":{"field":"color.keyword"},"aggs":{"avg_price":{"avg":{"field":"price"}}}}}},Some(application/json))
Map(color -> red, make -> honda)
Map(color -> red, make -> honda)
Map(color -> green, make -> ford)
red,,32500.0
blue,,20000.0
green,,21000.0
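
As a rough illustration of how a metric is evaluated once per bucket, here is a sketch using plain Scala collections instead of elastic4s. The AvgMetricSketch object, the CarTxn case class, and the sample prices are hypothetical and chosen only to keep the arithmetic easy to check.

```scala
object AvgMetricSketch {
  // Hypothetical record type; only the fields needed for this metric.
  final case class CarTxn(color: String, price: Double)

  // Hypothetical sample data, for illustration only.
  val txns: Seq[CarTxn] = Seq(
    CarTxn("red", 10000.0),
    CarTxn("red", 30000.0),
    CarTxn("blue", 15000.0)
  )

  // Bucket by color first, then compute the avg metric inside each bucket.
  // groupBy never produces an empty group, so the division is safe.
  def avgPriceByColor(docs: Seq[CarTxn]): Map[String, Double] =
    docs.groupBy(_.color).map { case (color, group) =>
      color -> group.map(_.price).sum / group.size
    }

  def main(args: Array[String]): Unit =
    avgPriceByColor(txns).foreach { case (key, avg) => println(s"$key,$avg") }
}
```

The metric never sees the whole index: it only sees the documents already assigned to its enclosing bucket, which is why avg_price appears inside each element of buckets in the response.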

Next, we can nest another bucket inside a bucket:

POST /cartxns/_search
{
  "aggs":{
    "colors":{
      "terms":{"field":"color.keyword"},
      "aggs":{
        "avg_price":{"avg":{"field":"price"}},
        "makes":{"terms":{"field":"make.keyword"}}
      }
    }
  }
}
...
"aggregations" : {
  "colors" : {
    "doc_count_error_upper_bound" : ,
    "sum_other_doc_count" : ,
    "buckets" : [
      {
        "key" : "red",
        "doc_count" : ,
        "makes" : {
          "doc_count_error_upper_bound" : ,
          "sum_other_doc_count" : ,
          "buckets" : [
            {
              "key" : "honda",
              "doc_count" :
            },
            {
              "key" : "bmw",
              "doc_count" :
            }
          ]
        },
        "avg_price" : {
          "value" : 32500.0
        }
      },
      {
        "key" : "blue",
        "doc_count" : ,
        "makes" : {
          "doc_count_error_upper_bound" : ,
          "sum_other_doc_count" : ,
          "buckets" : [
            {
              "key" : "ford",
              "doc_count" :
            },
            {
              "key" : "toyota",
              "doc_count" :
            }
          ]
        },
        "avg_price" : {
          "value" : 20000.0
        }
      },
      {
        "key" : "green",
        "doc_count" : ,
        "makes" : {
          "doc_count_error_upper_bound" : ,
          "sum_other_doc_count" : ,
          "buckets" : [
            {
              "key" : "ford",
              "doc_count" :
            },
            {
              "key" : "toyota",
              "doc_count" :
            }
          ]
        },
        "avg_price" : {
          "value" : 21000.0
        }
      }
    ]
  }
}

The elastic4s version:

val aggTAvgT = search("cartxns").aggregations(
  termsAgg("colors","color.keyword").subAggregations(
    avgAgg("avg_price","price"),
    termsAgg("makes","make.keyword")
  )
).size()
println(aggTAvgT.show)

val avgTTResult = client.execute(aggTAvgT).await
avgTTResult.result.hits.hits.foreach(m => println(m.sourceAsMap))
avgTTResult.result.aggregations.terms("colors").buckets
  .foreach { cb =>
    println(s"${cb.key},${cb.docCount},${cb.avg("avg_price").value}")
    cb.terms("makes").buckets.foreach(mb => println(s"${mb.key},${mb.docCount}"))
  }
...
POST:/cartxns/_search?
StringEntity({"size":,"aggs":{"colors":{"terms":{"field":"color.keyword"},"aggs":{"avg_price":{"avg":{"field":"price"}},"makes":{"terms":{"field":"make.keyword"}}}}}},Some(application/json))
Map(price -> , color -> red, make -> honda, sold -> --)
Map(price -> , color -> red, make -> honda, sold -> --)
Map(price -> , color -> green, make -> ford, sold -> --)
red,,32500.0
honda,
bmw,
blue,,20000.0
ford,
toyota,
green,,21000.0
ford,
toyota,

Finally, we add two metrics, min and max, to the innermost bucket:

POST /cartxns/_search
{
  "size":,
  "aggs":{
    "colors":{
      "terms":{"field":"color.keyword"},
      "aggs":{
        "avg_price":{"avg":{"field":"price"}},
        "makes":{"terms":{"field":"make.keyword"},
          "aggs":{
            "max_price":{"max":{"field":"price"}},
            "min_price":{"min":{"field":"price"}}
          }
        }
      }
    }
  }
}
...
"aggregations" : {
  "colors" : {
    "doc_count_error_upper_bound" : ,
    "sum_other_doc_count" : ,
    "buckets" : [
      {
        "key" : "red",
        "doc_count" : ,
        "makes" : {
          "doc_count_error_upper_bound" : ,
          "sum_other_doc_count" : ,
          "buckets" : [
            {
              "key" : "honda",
              "doc_count" : ,
              "max_price" : {
                "value" : 20000.0
              },
              "min_price" : {
                "value" : 10000.0
              }
            },
            {
              "key" : "bmw",
              "doc_count" : ,
              "max_price" : {
                "value" : 80000.0
              },
              "min_price" : {
                "value" : 80000.0
              }
            }
          ]
        },
        "avg_price" : {
          "value" : 32500.0
        }
      },
      {
        "key" : "blue",
        "doc_count" : ,
        "makes" : {
          "doc_count_error_upper_bound" : ,
          "sum_other_doc_count" : ,
          "buckets" : [
            {
              "key" : "ford",
              "doc_count" : ,
              "max_price" : {
                "value" : 25000.0
              },
              "min_price" : {
                "value" : 25000.0
              }
            },
            {
              "key" : "toyota",
              "doc_count" : ,
              "max_price" : {
                "value" : 15000.0
              },
              "min_price" : {
                "value" : 15000.0
              }
            }
          ]
        },
        "avg_price" : {
          "value" : 20000.0
        }
      },
      {
        "key" : "green",
        "doc_count" : ,
        "makes" : {
          "doc_count_error_upper_bound" : ,
          "sum_other_doc_count" : ,
          "buckets" : [
            {
              "key" : "ford",
              "doc_count" : ,
              "max_price" : {
                "value" : 30000.0
              },
              "min_price" : {
                "value" : 30000.0
              }
            },
            {
              "key" : "toyota",
              "doc_count" : ,
              "max_price" : {
                "value" : 12000.0
              },
              "min_price" : {
                "value" : 12000.0
              }
            }
          ]
        },
        "avg_price" : {
          "value" : 21000.0
        }
      }
    ]
  }
}

The elastic4s version:

val aggTAvgTMM = search("cartxns").aggregations(
  termsAgg("colors","color.keyword").subAggregations(
    avgAgg("avg_price","price"),
    termsAgg("makes","make.keyword").subAggregations(
      maxAgg("max_price","price"),
      minAgg("min_price","price")
    )
  )
).size()
println(aggTAvgTMM.show)

val avgTTMMResult = client.execute(aggTAvgTMM).await
avgTTMMResult.result.hits.hits.foreach(m => println(m.sourceAsMap))
avgTTMMResult.result.aggregations.terms("colors").buckets
  .foreach { cb =>
    println(s"${cb.key},${cb.docCount},${cb.avg("avg_price").value}")
    cb.terms("makes").buckets.foreach { mb =>
      // avg(...) is reused here to read the "value" field of the min_price and max_price metrics
      println(s"${mb.key},${mb.docCount},${mb.avg("min_price").value},${mb.avg("max_price").value}")
    }
  }
...
POST:/cartxns/_search?
StringEntity({"size":,"aggs":{"colors":{"terms":{"field":"color.keyword"},"aggs":{"avg_price":{"avg":{"field":"price"}},"makes":{"terms":{"field":"make.keyword"},"aggs":{"max_price":{"max":{"field":"price"}},"min_price":{"min":{"field":"price"}}}}}}}},Some(application/json))
Map(price -> , color -> red, make -> honda, sold -> --)
Map(price -> , color -> red, make -> honda, sold -> --)
Map(price -> , color -> green, make -> ford, sold -> --)
red,,32500.0
honda,,10000.0,20000.0
bmw,,80000.0,80000.0
blue,,20000.0
ford,,25000.0,25000.0
toyota,,15000.0,15000.0
green,,21000.0
ford,,30000.0,30000.0
toyota,,12000.0,12000.0
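
The nesting of buckets and the metrics on the innermost bucket can also be pictured with plain Scala collections. The sketch below is not elastic4s code. The NestedBucketSketch object, the CarTxn and PriceRange case classes, and the sample records are all hypothetical.

```scala
object NestedBucketSketch {
  // Hypothetical record type standing in for a document in the index.
  final case class CarTxn(color: String, make: String, price: Double)
  // Hypothetical holder for the two metrics of an innermost bucket.
  final case class PriceRange(min: Double, max: Double)

  // Hypothetical sample data, for illustration only.
  val txns: Seq[CarTxn] = Seq(
    CarTxn("red", "honda", 10000.0),
    CarTxn("red", "honda", 20000.0),
    CarTxn("red", "bmw", 80000.0),
    CarTxn("blue", "ford", 25000.0)
  )

  // Outer bucket: color. Inner bucket: make, within each color.
  // The min and max metrics are computed over the inner bucket only.
  def priceRangeByColorAndMake(docs: Seq[CarTxn]): Map[String, Map[String, PriceRange]] =
    docs.groupBy(_.color).map { case (color, colorGroup) =>
      color -> colorGroup.groupBy(_.make).map { case (make, makeGroup) =>
        val prices = makeGroup.map(_.price)
        make -> PriceRange(prices.min, prices.max)
      }
    }

  def main(args: Array[String]): Unit =
    priceRangeByColorAndMake(txns).foreach { case (color, makes) =>
      println(color)
      makes.foreach { case (make, range) => println(s"$make,${range.min},${range.max}") }
    }
}
```

The nested Map mirrors the shape of the response: an outer buckets array keyed by color, each element holding an inner buckets array keyed by make, with the metrics attached at the innermost level.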
