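Bucketed aggregations deal with two kinds of grouping field: continuous ones, such as time or numeric values, and discrete ones, such as location or product. Discrete values already identify distinct groups, whereas continuous values have to be split by hand into intervals of equal width. Here is an example that aggregates by price range: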

POST /cartxns/_search
{
  "size" : 0,
  "aggs": {
    "sales_per_pricerange": {
      "histogram": {
        "field": "price",
        "interval": 20000
      },
      "aggs": {
        "total sales": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

In the example above prices are split into ranges of 20000. This yields metrics for the price ranges 0-19999, 20000-39999, 40000-59999, and so on:

  "aggregations" : {
    "sales_per_pricerange" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 3,
          "total sales" : {
            "value" : 37000.0
          }
        },
        {
          "key" : 20000.0,
          "doc_count" : 4,
          "total sales" : {
            "value" : 95000.0
          }
        },
        {
          "key" : 40000.0,
          "doc_count" : 0,
          "total sales" : {
            "value" : 0.0
          }
        },
        {
          "key" : 60000.0,
          "doc_count" : 0,
          "total sales" : {
            "value" : 0.0
          }
        },
        {
          "key" : 80000.0,
          "doc_count" : 1,
          "total sales" : {
            "value" : 80000.0
          }
        }
      ]
    }
  }
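
The bucket boundaries follow a simple rule: each value is rounded down to the nearest multiple of the interval, and that multiple becomes the bucket key. The following is a minimal pure-Scala sketch of that rule. It is not Elasticsearch or elastic4s code; `HistogramSketch`, `bucketKey` and `histogram` are illustrative names, and the price list is an assumption reconstructed from the sums and averages reported in this article.

```scala
object HistogramSketch {
  // Assumed sample data: eight sale prices reconstructed from this article's results.
  val prices: Seq[Double] =
    Seq(10000.0, 20000.0, 20000.0, 25000.0, 30000.0, 15000.0, 12000.0, 80000.0)

  // Round a value down to the nearest multiple of the interval.
  def bucketKey(value: Double, interval: Double): Double =
    math.floor(value / interval) * interval

  // Returns (key, docCount, sum) per bucket, including the empty buckets
  // that lie between the smallest and the largest key.
  def histogram(values: Seq[Double], interval: Double): Seq[(Double, Int, Double)] =
    if (values.isEmpty) Seq.empty
    else {
      val grouped = values.groupBy(bucketKey(_, interval))
      val minKey = grouped.keys.min
      val maxKey = grouped.keys.max
      val steps = math.round((maxKey - minKey) / interval).toInt
      (0 to steps).map { i =>
        val key = minKey + i * interval
        val members = grouped.getOrElse(key, Seq.empty)
        (key, members.size, members.sum)
      }
    }

  def main(args: Array[String]): Unit =
    histogram(prices, 20000.0).foreach { case (k, n, s) => println(s"$k,$n:$s") }
}
```

Because a price of exactly 20000 is rounded down to the key 20000, each range is closed on the left and open on the right, which is why the ranges are written as 0-19999, 20000-39999, and so on.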

In elastic4s this is expressed as follows:

  // Assumes `client` is the ElasticClient set up earlier in this series.
  // Bucket prices into ranges of 20000 and sum the price within each bucket.
  val aggHist = search("cartxns").aggregations(
    histogramAggregation("sales_per_price")
      .field("price")
      .interval(20000).subAggregations(
        sumAggregation("total_sales").field("price")
      )
  )
  println(aggHist.show)

  val histResult = client.execute(aggHist).await

  if (histResult.isSuccess)
    histResult.result.aggregations.histogram("sales_per_price").buckets
      .foreach(hb => println(s"${hb.key},${hb.docCount}:${hb.sum("total_sales").value}"))
  else println(s"error: ${histResult.error.reason}")

....

POST:/cartxns/_search?
StringEntity({"aggs":{"sales_per_price":{"histogram":{"interval":20000.0,"field":"price"},"aggs":{"total_sales":{"sum":{"field":"price"}}}}}},Some(application/json))
0.0,3:37000.0
20000.0,4:95000.0
40000.0,0:0.0
60000.0,0:0.0
80000.0,1:80000.0

The next example groups by car make, which is an aggregation over a discrete field:

POST /cartxns/_search
{
  "size" : 0,
  "aggs": {
    "avage price per model" : {
      "terms": {"field" : "make.keyword"},
      "aggs": {
        "average price": {
          "avg": {"field": "price"}
        },
        "max price" : {
          "max": {
            "field": "price"
          }
        },
        "min price" : {
          "min": {
            "field": "price"
          }
        }
      }
    }
  }
}

This gives the average, minimum and maximum selling price for each make:

  "aggregations" : {
    "avage price per model" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "honda",
          "doc_count" : 3,
          "max price" : {
            "value" : 20000.0
          },
          "average price" : {
            "value" : 16666.666666666668
          },
          "min price" : {
            "value" : 10000.0
          }
        },
        {
          "key" : "ford",
          "doc_count" : 2,
          "max price" : {
            "value" : 30000.0
          },
          "average price" : {
            "value" : 27500.0
          },
          "min price" : {
            "value" : 25000.0
          }
        },
        {
          "key" : "toyota",
          "doc_count" : 2,
          "max price" : {
            "value" : 15000.0
          },
          "average price" : {
            "value" : 13500.0
          },
          "min price" : {
            "value" : 12000.0
          }
        },
        {
          "key" : "bmw",
          "doc_count" : 1,
          "max price" : {
            "value" : 80000.0
          },
          "average price" : {
            "value" : 80000.0
          },
          "min price" : {
            "value" : 80000.0
          }
        }
      ]
    }
  }

The elastic4s version:

  // One bucket per make, with average, minimum and maximum price as sub-aggregations.
  val aggDisc = search("cartxns").aggregations(
    termsAgg("prices_per_model","make.keyword").subAggregations(
      avgAgg("average_price","price"),
      minAgg("min_price","price"),
      maxAgg("max_price","price")
    )
  )
  println(aggDisc.show)

  val discResult = client.execute(aggDisc).await

  if (discResult.isSuccess)
    discResult.result.aggregations.terms("prices_per_model").buckets
      .foreach(mb =>
        println(s"${mb.key},${mb.docCount}:${mb.avg("average_price").value}," +
          s"${mb.min("min_price").value.getOrElse(0)}," +
          s"${mb.max("max_price").value.getOrElse(0)}"))
  else println(s"error: ${discResult.error.causedBy.getOrElse("unknown")}")

...

POST:/cartxns/_search?
StringEntity({"aggs":{"prices_per_model":{"terms":{"field":"make.keyword"},"aggs":{"average_price":{"avg":{"field":"price"}},"min_price":{"min":{"field":"price"}},"max_price":{"max":{"field":"price"}}}}}},Some(application/json))
honda,3:16666.666666666668,10000.0,20000.0
ford,2:27500.0,25000.0,30000.0
toyota,2:13500.0,12000.0,15000.0
bmw,1:80000.0,80000.0,80000.0
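
A terms aggregation needs no interval because every distinct value of the field forms its own bucket. The following is a minimal pure-Scala sketch of that grouping, using `groupBy` in place of Elasticsearch. `TermsSketch`, `Sale`, `Stats` and `statsPerMake` are illustrative names, and the sample records are an assumption reconstructed from the figures in this article.

```scala
object TermsSketch {
  // Hypothetical record type; make/price pairs reconstructed from this article's results.
  final case class Sale(make: String, price: Double)

  val sales: Seq[Sale] = Seq(
    Sale("honda", 10000.0), Sale("honda", 20000.0), Sale("honda", 20000.0),
    Sale("ford", 25000.0), Sale("ford", 30000.0),
    Sale("toyota", 15000.0), Sale("toyota", 12000.0),
    Sale("bmw", 80000.0)
  )

  final case class Stats(docCount: Int, avg: Double, min: Double, max: Double)

  // One bucket per distinct make, largest bucket first; ties are ordered by key.
  def statsPerMake(xs: Seq[Sale]): Seq[(String, Stats)] =
    xs.groupBy(_.make).toSeq
      .map { case (make, group) =>
        val prices = group.map(_.price)
        make -> Stats(prices.size, prices.sum / prices.size, prices.min, prices.max)
      }
      .sortBy { case (make, stats) => (-stats.docCount, make) }

  def main(args: Array[String]): Unit =
    statsPerMake(sales).foreach { case (make, s) =>
      println(s"$make,${s.docCount}:${s.avg},${s.min},${s.max}")
    }
}
```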

date_histogram aggregates by time interval. It is very useful for analysing data that changes over time:

POST /cartxns/_search
{
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "sold",
        "calendar_interval":"1M",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

...

  "aggregations" : {
    "sales_per_month" : {
      "buckets" : [
        {
          "key_as_string" : "2014-01-01",
          "key" : 1388534400000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2014-02-01",
          "key" : 1391212800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2014-03-01",
          "key" : 1393632000000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2014-04-01",
          "key" : 1396310400000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2014-05-01",
          "key" : 1398902400000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2014-06-01",
          "key" : 1401580800000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2014-07-01",
          "key" : 1404172800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2014-08-01",
          "key" : 1406851200000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2014-09-01",
          "key" : 1409529600000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2014-10-01",
          "key" : 1412121600000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2014-11-01",
          "key" : 1414800000000,
          "doc_count" : 2
        }
      ]
    }
  }

The example above produces one bucket per month. The elastic4s version:

  // One bucket per calendar month; minDocCount(1) drops months without any sales.
  val aggDateHist = search("cartxns").aggregations(
    dateHistogramAggregation("sales_per_month")
      .field("sold")
      .calendarInterval(DateHistogramInterval.Month)
      .format("yyyy-MM-dd")
      .minDocCount(1)
  )
  println(aggDateHist.show)

  val dtHistResult = client.execute(aggDateHist).await

  if (dtHistResult.isSuccess)
    dtHistResult.result.aggregations.dateHistogram("sales_per_month").buckets
      .foreach(db => println(s"${db.date},${db.docCount}"))
  else println(s"error: ${dtHistResult.error.causedBy.getOrElse("unknown")}")

...

POST:/cartxns/_search?
StringEntity({"aggs":{"sales_per_month":{"date_histogram":{"calendar_interval":"1M","min_doc_count":1,"format":"yyyy-MM-dd","field":"sold"}}}},Some(application/json))
2014-01-01,1
2014-02-01,1
2014-05-01,1
2014-07-01,1
2014-08-01,1
2014-10-01,1
2014-11-01,2
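
The REST response lists every month from 2014-01 to 2014-11, including months without sales, while the elastic4s run prints only seven lines because that request also sets a minimum document count. The following is a minimal pure-Scala sketch of how monthly bucketing and that filter interact. `DateHistogramSketch` and `monthly` are illustrative names. The months in the sample data are taken from this article, and the days of the month are an assumption.

```scala
import java.time.{LocalDate, YearMonth}

object DateHistogramSketch {
  // Assumed sale dates: the months match this article, the days are made up.
  val sold: Seq[LocalDate] = Seq(
    LocalDate.of(2014, 1, 1), LocalDate.of(2014, 2, 12), LocalDate.of(2014, 5, 18),
    LocalDate.of(2014, 7, 2), LocalDate.of(2014, 8, 19), LocalDate.of(2014, 10, 28),
    LocalDate.of(2014, 11, 5), LocalDate.of(2014, 11, 5)
  )

  // Monthly buckets keyed by the first day of the month. Gaps between the first
  // and the last month are filled with zero-count buckets, then buckets holding
  // fewer than minDocCount documents are dropped.
  def monthly(dates: Seq[LocalDate], minDocCount: Int): Seq[(LocalDate, Int)] =
    if (dates.isEmpty) Seq.empty
    else {
      val counts = dates.groupBy(d => YearMonth.from(d)).map { case (ym, ds) => ym -> ds.size }
      val first = counts.keys.reduce((a, b) => if (a.isBefore(b)) a else b)
      val last = counts.keys.reduce((a, b) => if (a.isAfter(b)) a else b)
      Iterator.iterate(first)(_.plusMonths(1))
        .takeWhile(ym => !ym.isAfter(last))
        .map(ym => ym.atDay(1) -> counts.getOrElse(ym, 0))
        .filter { case (_, n) => n >= minDocCount }
        .toList
    }

  def main(args: Array[String]): Unit = {
    println(monthly(sold, 0).size) // every month from the first to the last sale
    monthly(sold, 1).foreach { case (d, n) => println(s"$d,$n") }
  }
}
```

With `minDocCount = 0` the sketch keeps the empty months, as the REST response does. With `minDocCount = 1` only the months that contain at least one sale remain.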

Once the buckets are split by month, each month can be aggregated further:

POST /cartxns/_search
{
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "sold",
        "calendar_interval":"1M",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "per_make_sum": {
          "terms": {
            "field": "make.keyword",
            "size":
          },
          "aggs": {
            "sum_price": {
              "sum": {"field": "price"}
            }
          }
        },
        "total_sum": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

This gives the total sales for each month, as well as the sales of each make in each month:

  "aggregations" : {
    "sales_per_month" : {
      "buckets" : [
        {
          "key_as_string" : "2014-01-01",
          "key" : 1388534400000,
          "doc_count" : 1,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "bmw",
                "doc_count" : 1,
                "sum_price" : {
                  "value" : 80000.0
                }
              }
            ]
          },
          "total_sum" : {
            "value" : 80000.0
          }
        },
        {
          "key_as_string" : "2014-02-01",
          "key" : 1391212800000,
          "doc_count" : 1,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "ford",
                "doc_count" : 1,
                "sum_price" : {
                  "value" : 25000.0
                }
              }
            ]
          },
          "total_sum" : {
            "value" : 25000.0
          }
        },
        {
          "key_as_string" : "2014-03-01",
          "key" : 1393632000000,
          "doc_count" : 0,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ ]
          },
          "total_sum" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2014-04-01",
          "key" : 1396310400000,
          "doc_count" : 0,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ ]
          },
          "total_sum" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2014-05-01",
          "key" : 1398902400000,
          "doc_count" : 1,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "ford",
                "doc_count" : 1,
                "sum_price" : {
                  "value" : 30000.0
                }
              }
            ]
          },
          "total_sum" : {
            "value" : 30000.0
          }
        },
        {
          "key_as_string" : "2014-06-01",
          "key" : 1401580800000,
          "doc_count" : 0,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ ]
          },
          "total_sum" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2014-07-01",
          "key" : 1404172800000,
          "doc_count" : 1,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "toyota",
                "doc_count" : 1,
                "sum_price" : {
                  "value" : 15000.0
                }
              }
            ]
          },
          "total_sum" : {
            "value" : 15000.0
          }
        },
        {
          "key_as_string" : "2014-08-01",
          "key" : 1406851200000,
          "doc_count" : 1,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "toyota",
                "doc_count" : 1,
                "sum_price" : {
                  "value" : 12000.0
                }
              }
            ]
          },
          "total_sum" : {
            "value" : 12000.0
          }
        },
        {
          "key_as_string" : "2014-09-01",
          "key" : 1409529600000,
          "doc_count" : 0,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ ]
          },
          "total_sum" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2014-10-01",
          "key" : 1412121600000,
          "doc_count" : 1,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "honda",
                "doc_count" : 1,
                "sum_price" : {
                  "value" : 10000.0
                }
              }
            ]
          },
          "total_sum" : {
            "value" : 10000.0
          }
        },
        {
          "key_as_string" : "2014-11-01",
          "key" : 1414800000000,
          "doc_count" : 2,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "honda",
                "doc_count" : 2,
                "sum_price" : {
                  "value" : 40000.0
                }
              }
            ]
          },
          "total_sum" : {
            "value" : 40000.0
          }
        }
      ]
    }
  }

With elastic4s this can be written as:

  // Monthly buckets, each with a per-make breakdown and a monthly total.
  val aggMonthSales = search("cartxns").aggregations(
    dateHistogramAggregation("sales_per_month")
      .field("sold")
      .calendarInterval(DateHistogramInterval.Month)
      .format("yyyy-MM-dd")
      .minDocCount(1).subAggregations(
        termsAgg("month_make","make.keyword").subAggregations(
          sumAggregation("month_total_per_make").field("price")
        ),
        sumAggregation("monthly_total").field("price")
      )
  )
  println(aggMonthSales.show)

  val monthSalesResult = client.execute(aggMonthSales).await

  if (monthSalesResult.isSuccess)
    monthSalesResult.result.aggregations.dateHistogram("sales_per_month").buckets
      .foreach { sb =>
        println(s"${sb.date},${sb.docCount},${sb.sum("monthly_total").value}")
        sb.terms("month_make").buckets
          .foreach(mb =>
            println(s"${mb.key},${mb.docCount},${mb.sum("month_total_per_make").value}"))
      }
  else println(s"error: ${monthSalesResult.error.causedBy.getOrElse("unknown")}")

...

POST:/cartxns/_search?
StringEntity({"aggs":{"sales_per_month":{"date_histogram":{"calendar_interval":"1M","min_doc_count":1,"format":"yyyy-MM-dd","field":"sold"},"aggs":{"month_make":{"terms":{"field":"make.keyword"},"aggs":{"month_total_per_make":{"sum":{"field":"price"}}}},"monthly_total":{"sum":{"field":"price"}}}}}},Some(application/json))
2014-01-01,1,80000.0
bmw,1,80000.0
2014-02-01,1,25000.0
ford,1,25000.0
2014-05-01,1,30000.0
ford,1,30000.0
2014-07-01,1,15000.0
toyota,1,15000.0
2014-08-01,1,12000.0
toyota,1,12000.0
2014-10-01,1,10000.0
honda,1,10000.0
2014-11-01,2,40000.0
honda,2,40000.0
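
Nesting a terms aggregation inside a date histogram amounts to grouping twice: first by month, then by make within each month, with a sum computed at both levels. The following is a minimal pure-Scala sketch of that two-level grouping. `MonthlySalesSketch`, `Sale`, `MonthBucket` and `monthlySales` are illustrative names. Makes, prices and months in the sample data are taken from this article, and the days of the month are an assumption. Only non-empty months are produced, as in the elastic4s run above.

```scala
import java.time.{LocalDate, YearMonth}

object MonthlySalesSketch {
  // Hypothetical record type; the days of the month are made up.
  final case class Sale(make: String, price: Double, sold: LocalDate)

  val sales: Seq[Sale] = Seq(
    Sale("bmw", 80000.0, LocalDate.of(2014, 1, 1)),
    Sale("ford", 25000.0, LocalDate.of(2014, 2, 12)),
    Sale("ford", 30000.0, LocalDate.of(2014, 5, 18)),
    Sale("toyota", 15000.0, LocalDate.of(2014, 7, 2)),
    Sale("toyota", 12000.0, LocalDate.of(2014, 8, 19)),
    Sale("honda", 10000.0, LocalDate.of(2014, 10, 28)),
    Sale("honda", 20000.0, LocalDate.of(2014, 11, 5)),
    Sale("honda", 20000.0, LocalDate.of(2014, 11, 5))
  )

  // perMake holds (make, docCount, sum) for each make sold in the month.
  final case class MonthBucket(month: String, docCount: Int, total: Double,
                               perMake: Seq[(String, Int, Double)])

  def monthlySales(xs: Seq[Sale]): Seq[MonthBucket] =
    xs.groupBy(s => YearMonth.from(s.sold)).toSeq
      .sortBy { case (ym, _) => ym.toString } // "yyyy-MM" sorts chronologically as text
      .map { case (ym, group) =>
        val perMake = group.groupBy(_.make).toSeq
          .sortBy { case (make, _) => make }
          .map { case (make, g) => (make, g.size, g.map(_.price).sum) }
        MonthBucket(ym.atDay(1).toString, group.size, group.map(_.price).sum, perMake)
      }

  def main(args: Array[String]): Unit =
    monthlySales(sales).foreach { b =>
      println(s"${b.month},${b.docCount},${b.total}")
      b.perMake.foreach { case (make, n, sum) => println(s"$make,$n,$sum") }
    }
}
```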
