search (13) - elastic4s-histograms: histogram aggregations

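Bucketed aggregations deal with two kinds of grouping fields: continuous ones, such as time or numeric values, and discrete ones, such as location or product. Discrete values already denote distinct groups, whereas continuous values have to be split into equal-width intervals explicitly. The following example aggregates by price range: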

POST /cartxns/_search
{
  "size" : ,
  "aggs": {
    "sales_per_pricerange": {
      "histogram": {
        "field": "price",
        "interval": 20000
      },
      "aggs": {
        "total sales": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

In this example prices are split into segments of 20000, which yields metrics for the price ranges 0-19999, 20000-39999, 40000-59999, and so on:

  "aggregations" : {
    "sales_per_pricerange" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : ,
          "total sales" : {
            "value" : 37000.0
          }
        },
        {
          "key" : 20000.0,
          "doc_count" : ,
          "total sales" : {
            "value" : 95000.0
          }
        },
        {
          "key" : 40000.0,
          "doc_count" : ,
          "total sales" : {
            "value" : 0.0
          }
        },
        {
          "key" : 60000.0,
          "doc_count" : ,
          "total sales" : {
            "value" : 0.0
          }
        },
        {
          "key" : 80000.0,
          "doc_count" : ,
          "total sales" : {
            "value" : 80000.0
          }
        }
      ]
    }
  }
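To make the bucketing rule concrete, here is a minimal in-memory sketch in plain Scala, with no Elasticsearch involved. It assumes the default offset of 0, so a value lands in the bucket whose key is floor(value / interval) * interval. The names HistogramBuckets, bucketKey and salesPerRange are hypothetical, and the prices used in the usage note are illustrative values chosen to be consistent with the totals shown above, not the actual index contents.

```scala
object HistogramBuckets {
  // Lower bound (the bucket "key") of the interval that `value` falls into.
  // Assumes offset = 0.
  def bucketKey(value: Double, interval: Double): Double =
    math.floor(value / interval) * interval

  // For each non-empty bucket: (key, document count, sum of prices).
  // Unlike the histogram response above, empty buckets are not emitted.
  def salesPerRange(prices: Seq[Double], interval: Double): Seq[(Double, Int, Double)] =
    prices
      .groupBy(p => bucketKey(p, interval))
      .toSeq
      .sortBy(_._1)
      .map { case (key, ps) => (key, ps.size, ps.sum) }
}
```

For example, HistogramBuckets.salesPerRange(Seq(10000.0, 12000.0, 15000.0, 20000.0, 20000.0, 25000.0, 30000.0, 80000.0), 20000.0) groups the illustrative prices under the keys 0.0, 20000.0 and 80000.0. The 40000.0 and 60000.0 buckets appear in the Elasticsearch response only because the server fills gaps with empty buckets.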

In elastic4s the same query is expressed like this:

  // Histogram over "price" in steps of 20000, with a sum sub-aggregation per bucket
  val aggHist = search("cartxns").aggregations(
    histogramAggregation("sales_per_price")
      .field("price")
      .interval(20000)
      .subAggregations(
        sumAggregation("total_sales").field("price")
      )
  )
  println(aggHist.show)

  val histResult = client.execute(aggHist).await
  if (histResult.isSuccess)
    // Print key, document count and summed price for every bucket
    histResult.result.aggregations.histogram("sales_per_price").buckets
      .foreach(hb => println(s"${hb.key},${hb.docCount}:${hb.sum("total_sales").value}"))
  else println(s"error: ${histResult.error.reason}")

....

POST:/cartxns/_search?
StringEntity({"aggs":{"sales_per_price":{"histogram":{"interval":20000.0,"field":"price"},"aggs":{"total_sales":{"sum":{"field":"price"}}}}}},Some(application/json))
0.0,:37000.0
20000.0,:95000.0
40000.0,:0.0
60000.0,:0.0
80000.0,:80000.0

The next example groups by car make, which is an aggregation over a discrete field:

POST /cartxns/_search
{
  "size" : ,
  "aggs": {
    "avage price per model" : {
      "terms": {"field" : "make.keyword"},
      "aggs": {
        "average price": {
          "avg": {"field": "price"}
        },
        "max price" : {
          "max": {
            "field": "price"
          }
        },
        "min price" : {
          "min": {
            "field": "price"
          }
        }
      }
    }
  }
}

This gives the average, maximum and minimum selling price for each make:

  "aggregations" : {
    "avage price per model" : {
      "doc_count_error_upper_bound" : ,
      "sum_other_doc_count" : ,
      "buckets" : [
        {
          "key" : "honda",
          "doc_count" : ,
          "max price" : {
            "value" : 20000.0
          },
          "average price" : {
            "value" : 16666.666666666668
          },
          "min price" : {
            "value" : 10000.0
          }
        },
        {
          "key" : "ford",
          "doc_count" : ,
          "max price" : {
            "value" : 30000.0
          },
          "average price" : {
            "value" : 27500.0
          },
          "min price" : {
            "value" : 25000.0
          }
        },
        {
          "key" : "toyota",
          "doc_count" : ,
          "max price" : {
            "value" : 15000.0
          },
          "average price" : {
            "value" : 13500.0
          },
          "min price" : {
            "value" : 12000.0
          }
        },
        {
          "key" : "bmw",
          "doc_count" : ,
          "max price" : {
            "value" : 80000.0
          },
          "average price" : {
            "value" : 80000.0
          },
          "min price" : {
            "value" : 80000.0
          }
        }
      ]
    }
  }

The elastic4s version looks like this:

  // Terms aggregation on "make.keyword" with avg/min/max sub-aggregations on "price"
  val aggDisc = search("cartxns").aggregations(
    termsAgg("prices_per_model", "make.keyword").subAggregations(
      avgAgg("average_price", "price"),
      minAgg("min_price", "price"),
      maxAgg("max_price", "price")
    )
  )
  println(aggDisc.show)

  val discResult = client.execute(aggDisc).await
  if (discResult.isSuccess)
    // Print make, document count, then average, minimum and maximum price
    discResult.result.aggregations.terms("prices_per_model").buckets
      .foreach(mb =>
        println(s"${mb.key},${mb.docCount}:${mb.avg("average_price").value}," +
          s"${mb.min("min_price").value.getOrElse(0)}," +
          s"${mb.max("max_price").value.getOrElse(0)}"))
  else println(s"error: ${discResult.error.causedBy.getOrElse("unknown")}")

...

POST:/cartxns/_search?
StringEntity({"aggs":{"prices_per_model":{"terms":{"field":"make.keyword"},"aggs":{"average_price":{"avg":{"field":"price"}},"min_price":{"min":{"field":"price"}},"max_price":{"max":{"field":"price"}}}}}},Some(application/json))
honda,:16666.666666666668,10000.0,20000.0
ford,:27500.0,25000.0,30000.0
toyota,:13500.0,12000.0,15000.0
bmw,:80000.0,80000.0,80000.0
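What the terms aggregation with avg, min and max sub-aggregations computes can be sketched over an in-memory collection. This is only an illustration of the semantics in plain Scala, not how Elasticsearch evaluates the query. CarTxn, PriceStats and TermsStats are hypothetical names introduced for the sketch.

```scala
// Hypothetical, simplified view of one document in the cartxns index
final case class CarTxn(make: String, price: Double)

// Per-make metrics: document count plus average, minimum and maximum price
final case class PriceStats(count: Int, avg: Double, min: Double, max: Double)

object TermsStats {
  // One entry per distinct make, which is what a terms bucket represents
  def pricesPerMake(txns: Seq[CarTxn]): Map[String, PriceStats] =
    txns.groupBy(_.make).map { case (make, ts) =>
      val prices = ts.map(_.price)
      make -> PriceStats(prices.size, prices.sum / prices.size, prices.min, prices.max)
    }
}
```

Calling TermsStats.pricesPerMake(Seq(CarTxn("ford", 25000.0), CarTxn("ford", 30000.0))) yields a single "ford" entry whose average, minimum and maximum correspond to the ford line in the output above. The two input prices are illustrative values consistent with that line.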

date_histogram aggregates by time interval, which is very useful for analyzing how data changes over time:

POST /cartxns/_search
{
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "sold",
        "calendar_interval":"1M",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

...

  "aggregations" : {
    "sales_per_month" : {
      "buckets" : [
        {
          "key_as_string" : "2014-01-01",
          "key" : ,
          "doc_count" :
        },
        {
          "key_as_string" : "2014-02-01",
          "key" : ,
          "doc_count" :
        },
        {
          "key_as_string" : "2014-03-01",
          "key" : ,
          "doc_count" :
        },
        {
          "key_as_string" : "2014-04-01",
          "key" : ,
          "doc_count" :
        },
        {
          "key_as_string" : "2014-05-01",
          "key" : ,
          "doc_count" :
        },
        {
          "key_as_string" : "2014-06-01",
          "key" : ,
          "doc_count" :
        },
        {
          "key_as_string" : "2014-07-01",
          "key" : ,
          "doc_count" :
        },
        {
          "key_as_string" : "2014-08-01",
          "key" : ,
          "doc_count" :
        },
        {
          "key_as_string" : "2014-09-01",
          "key" : ,
          "doc_count" :
        },
        {
          "key_as_string" : "2014-10-01",
          "key" : ,
          "doc_count" :
        },
        {
          "key_as_string" : "2014-11-01",
          "key" : ,
          "doc_count" :
        }
      ]
    }
  }

The example above produces one bucket per month. The elastic4s version:

  // Monthly date histogram on "sold"; minDocCount(1) drops months without sales
  val aggDateHist = search("cartxns").aggregations(
    dateHistogramAggregation("sales_per_month")
      .field("sold")
      .calendarInterval(DateHistogramInterval.Month)
      .format("yyyy-MM-dd")
      .minDocCount(1)
  )
  println(aggDateHist.show)

  val dtHistResult = client.execute(aggDateHist).await
  if (dtHistResult.isSuccess)
    // Print the formatted bucket date and its document count
    dtHistResult.result.aggregations.dateHistogram("sales_per_month").buckets
      .foreach(db => println(s"${db.date},${db.docCount}"))
  else println(s"error: ${dtHistResult.error.causedBy.getOrElse("unknown")}")

...

POST:/cartxns/_search?
StringEntity({"aggs":{"sales_per_month":{"date_histogram":{"calendar_interval":"1M","min_doc_count":1,"format":"yyyy-MM-dd","field":"sold"}}}},Some(application/json))
2014-01-01,
2014-02-01,
2014-05-01,
2014-07-01,
2014-08-01,
2014-10-01,
2014-11-01,
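The effect of a monthly calendar interval combined with a minimum document count can be sketched with java.time: every date is truncated to the first day of its month, and months with too few documents are dropped. This is a simplified illustration that ignores time zones. MonthBuckets, monthKey and salesPerMonth are hypothetical names.

```scala
import java.time.LocalDate

object MonthBuckets {
  // Bucket key for a monthly interval: the first day of the date's month
  def monthKey(d: LocalDate): LocalDate = d.withDayOfMonth(1)

  // (month start, document count) for every month with at least
  // `minDocCount` documents, in chronological order
  def salesPerMonth(soldDates: Seq[LocalDate], minDocCount: Int = 1): Seq[(LocalDate, Int)] =
    soldDates
      .groupBy(d => monthKey(d))
      .toSeq
      .map { case (month, ds) => (month, ds.size) }
      .filter { case (_, count) => count >= minDocCount }
      .sortBy { case (month, _) => month.toEpochDay }
}
```

With a minimum count of 1, months without any sale never appear in the result. That is why the elastic4s output lists fewer buckets than the raw JSON response shown earlier, which includes empty months.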

Once the data is bucketed by month, each month can be aggregated further:

POST /cartxns/_search
{
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "sold",
        "calendar_interval":"1M",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "per_make_sum": {
          "terms": {
            "field": "make.keyword",
            "size":
          },
          "aggs": {
            "sum_price": {
              "sum": {"field": "price"}
            }
          }
        },
        "total_sum": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

This yields the total sales for each month as well as the sales of each make in each month:

  "aggregations" : {
    "sales_per_month" : {
      "buckets" : [
        {
          "key_as_string" : "2014-01-01",
          "key" : ,
          "doc_count" : ,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : ,
            "sum_other_doc_count" : ,
            "buckets" : [
              {
                "key" : "bmw",
                "doc_count" : ,
                "sum_price" : {
                  "value" : 80000.0
                }
              }
            ]
          },
          "total_sum" : {
            "value" : 80000.0
          }
        },
        {
          "key_as_string" : "2014-02-01",
          "key" : ,
          "doc_count" : ,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : ,
            "sum_other_doc_count" : ,
            "buckets" : [
              {
                "key" : "ford",
                "doc_count" : ,
                "sum_price" : {
                  "value" : 25000.0
                }
              }
            ]
          },
          "total_sum" : {
            "value" : 25000.0
          }
        },
        {
          "key_as_string" : "2014-03-01",
          "key" : ,
          "doc_count" : ,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : ,
            "sum_other_doc_count" : ,
            "buckets" : [ ]
          },
          "total_sum" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2014-04-01",
          "key" : ,
          "doc_count" : ,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : ,
            "sum_other_doc_count" : ,
            "buckets" : [ ]
          },
          "total_sum" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2014-05-01",
          "key" : ,
          "doc_count" : ,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : ,
            "sum_other_doc_count" : ,
            "buckets" : [
              {
                "key" : "ford",
                "doc_count" : ,
                "sum_price" : {
                  "value" : 30000.0
                }
              }
            ]
          },
          "total_sum" : {
            "value" : 30000.0
          }
        },
        {
          "key_as_string" : "2014-06-01",
          "key" : ,
          "doc_count" : ,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : ,
            "sum_other_doc_count" : ,
            "buckets" : [ ]
          },
          "total_sum" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2014-07-01",
          "key" : ,
          "doc_count" : ,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : ,
            "sum_other_doc_count" : ,
            "buckets" : [
              {
                "key" : "toyota",
                "doc_count" : ,
                "sum_price" : {
                  "value" : 15000.0
                }
              }
            ]
          },
          "total_sum" : {
            "value" : 15000.0
          }
        },
        {
          "key_as_string" : "2014-08-01",
          "key" : ,
          "doc_count" : ,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : ,
            "sum_other_doc_count" : ,
            "buckets" : [
              {
                "key" : "toyota",
                "doc_count" : ,
                "sum_price" : {
                  "value" : 12000.0
                }
              }
            ]
          },
          "total_sum" : {
            "value" : 12000.0
          }
        },
        {
          "key_as_string" : "2014-09-01",
          "key" : ,
          "doc_count" : ,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : ,
            "sum_other_doc_count" : ,
            "buckets" : [ ]
          },
          "total_sum" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2014-10-01",
          "key" : ,
          "doc_count" : ,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : ,
            "sum_other_doc_count" : ,
            "buckets" : [
              {
                "key" : "honda",
                "doc_count" : ,
                "sum_price" : {
                  "value" : 10000.0
                }
              }
            ]
          },
          "total_sum" : {
            "value" : 10000.0
          }
        },
        {
          "key_as_string" : "2014-11-01",
          "key" : ,
          "doc_count" : ,
          "per_make_sum" : {
            "doc_count_error_upper_bound" : ,
            "sum_other_doc_count" : ,
            "buckets" : [
              {
                "key" : "honda",
                "doc_count" : ,
                "sum_price" : {
                  "value" : 40000.0
                }
              }
            ]
          },
          "total_sum" : {
            "value" : 40000.0
          }
        }
      ]
    }
  }

With elastic4s it can be written like this:

  // Monthly buckets with two sub-aggregations: per-make sums and a monthly total
  val aggMonthSales = search("cartxns").aggregations(
    dateHistogramAggregation("sales_per_month")
      .field("sold")
      .calendarInterval(DateHistogramInterval.Month)
      .format("yyyy-MM-dd")
      .minDocCount(1)
      .subAggregations(
        termsAgg("month_make", "make.keyword").subAggregations(
          sumAggregation("month_total_per_make").field("price")
        ),
        sumAggregation("monthly_total").field("price")
      )
  )
  println(aggMonthSales.show)

  val monthSalesResult = client.execute(aggMonthSales).await
  if (monthSalesResult.isSuccess)
    monthSalesResult.result.aggregations.dateHistogram("sales_per_month").buckets
      .foreach { sb =>
        // One line per month, followed by one line per make sold in that month
        println(s"${sb.date},${sb.docCount},${sb.sum("monthly_total").value}")
        sb.terms("month_make").buckets
          .foreach(mb =>
            println(s"${mb.key},${mb.docCount},${mb.sum("month_total_per_make").value}"))
      }
  else println(s"error: ${monthSalesResult.error.causedBy.getOrElse("unknown")}")

...

POST:/cartxns/_search?
StringEntity({"aggs":{"sales_per_month":{"date_histogram":{"calendar_interval":"1M","min_doc_count":1,"format":"yyyy-MM-dd","field":"sold"},"aggs":{"month_make":{"terms":{"field":"make.keyword"},"aggs":{"month_total_per_make":{"sum":{"field":"price"}}}},"monthly_total":{"sum":{"field":"price"}}}}}},Some(application/json))
2014-01-01,,80000.0
bmw,,80000.0
2014-02-01,,25000.0
ford,,25000.0
2014-05-01,,30000.0
ford,,30000.0
2014-07-01,,15000.0
toyota,,15000.0
2014-08-01,,12000.0
toyota,,12000.0
2014-10-01,,10000.0
honda,,10000.0
2014-11-01,,40000.0
honda,,40000.0
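The nesting of a terms aggregation inside a date histogram amounts to a two-level grouping: first by month, then by make within each month, with a sum at both levels. The following plain-Scala sketch mirrors that structure over an in-memory list. Sale and MonthlySales are hypothetical names, and time zones are ignored.

```scala
import java.time.LocalDate

// Hypothetical, simplified view of one document in the cartxns index
final case class Sale(make: String, price: Double, sold: LocalDate)

object MonthlySales {
  // Per month, in chronological order: (month start, monthly total, total per make)
  def perMonth(sales: Seq[Sale]): Seq[(LocalDate, Double, Map[String, Double])] =
    sales
      .groupBy(_.sold.withDayOfMonth(1)) // outer bucket: calendar month
      .toSeq
      .sortBy { case (month, _) => month.toEpochDay }
      .map { case (month, inMonth) =>
        // inner bucket: make, with the summed price as its metric
        val perMake = inMonth.groupBy(_.make).map { case (make, xs) =>
          make -> xs.map(_.price).sum
        }
        (month, inMonth.map(_.price).sum, perMake)
      }
}
```

Each element of the result corresponds to one month line plus its make lines in the printed output above: the second tuple component plays the role of monthly_total and the map plays the role of month_make.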
