Hive函数:GROUPING SETS,GROUPING__ID,CUBE,ROLLUP
参考:lxw大数据田地:http://lxw1234.com/archives/2015/04/193.htm
数据准备:
CREATE EXTERNAL TABLE test_data (
month STRING,
day STRING,
cookieid STRING
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile location '/user/jc_rc_ftp/test_data'; select * from test_data l;
+----------+-------------+-------------+--+
| l.month | l.day | l.cookieid |
+----------+-------------+-------------+--+
| 2015-03 | 2015-03-10 | cookie1 |
| 2015-03 | 2015-03-10 | cookie5 |
| 2015-03 | 2015-03-12 | cookie7 |
| 2015-04 | 2015-04-12 | cookie3 |
| 2015-04 | 2015-04-13 | cookie2 |
| 2015-04 | 2015-04-13 | cookie4 |
| 2015-04 | 2015-04-16 | cookie4 |
| 2015-03 | 2015-03-10 | cookie2 |
| 2015-03 | 2015-03-10 | cookie3 |
| 2015-04 | 2015-04-12 | cookie5 |
| 2015-04 | 2015-04-13 | cookie6 |
| 2015-04 | 2015-04-15 | cookie3 |
| 2015-04 | 2015-04-15 | cookie2 |
| 2015-04 | 2015-04-16 | cookie1 |
+----------+-------------+-------------+--+
14 rows selected (0.249 seconds)
GROUPING SETS
在一个GROUP BY查询中,根据不同的维度组合进行聚合,等价于将不同维度的GROUP BY结果集进行UNION ALL
SELECT
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID
FROM test_data
GROUP BY month,day
GROUPING SETS (month,day)
ORDER BY GROUPING__ID; 等价于
SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM test_data GROUP BY month
UNION ALL
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM test_data GROUP BY day +----------+-------------+-----+---------------+--+
| month | day | uv | grouping__id |
+----------+-------------+-----+---------------+--+
| 2015-04 | NULL | 6 | 1 |
| 2015-03 | NULL | 5 | 1 |
| NULL | 2015-04-16 | 2 | 2 |
| NULL | 2015-04-15 | 2 | 2 |
| NULL | 2015-04-13 | 3 | 2 |
| NULL | 2015-04-12 | 2 | 2 |
| NULL | 2015-03-12 | 1 | 2 |
| NULL | 2015-03-10 | 4 | 2 |
+----------+-------------+-----+---------------+--+
8 rows selected (177.299 seconds) SELECT
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID
FROM test_data
GROUP BY month,day
GROUPING SETS (month,day,(month,day))
ORDER BY GROUPING__ID; 等价于
SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM test_data GROUP BY month
UNION ALL
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM test_data GROUP BY day
UNION ALL
SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM test_data GROUP BY month,day
+----------+-------------+-----+---------------+--+
| month | day | uv | grouping__id |
+----------+-------------+-----+---------------+--+
| 2015-04 | NULL | 6 | 1 |
| 2015-03 | NULL | 5 | 1 |
| NULL | 2015-03-10 | 4 | 2 |
| NULL | 2015-04-16 | 2 | 2 |
| NULL | 2015-04-15 | 2 | 2 |
| NULL | 2015-04-13 | 3 | 2 |
| NULL | 2015-04-12 | 2 | 2 |
| NULL | 2015-03-12 | 1 | 2 |
| 2015-04 | 2015-04-16 | 2 | 3 |
| 2015-04 | 2015-04-12 | 2 | 3 |
| 2015-04 | 2015-04-13 | 3 | 3 |
| 2015-03 | 2015-03-12 | 1 | 3 |
| 2015-03 | 2015-03-10 | 4 | 3 |
| 2015-04 | 2015-04-15 | 2 | 3 |
+----------+-------------+-----+---------------+--+
备注:其中的 GROUPING__ID,表示结果属于哪一个分组集合。
CUBE
根据GROUP BY的维度的所有组合进行聚合。
SELECT
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID
FROM test_data
GROUP BY month,day
WITH CUBE
ORDER BY GROUPING__ID; 等价于
SELECT NULL,NULL,COUNT(DISTINCT cookieid) AS uv,0 AS GROUPING__ID FROM test_data
UNION ALL
SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM test_data GROUP BY month
UNION ALL
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM test_data GROUP BY day
UNION ALL
SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM test_data GROUP BY month,day
+----------+-------------+-----+---------------+--+
| month | day | uv | grouping__id |
+----------+-------------+-----+---------------+--+
| NULL | NULL | 7 | 0 |
| 2015-03 | NULL | 5 | 1 |
| 2015-04 | NULL | 6 | 1 |
| NULL | 2015-04-16 | 2 | 2 |
| NULL | 2015-04-15 | 2 | 2 |
| NULL | 2015-04-13 | 3 | 2 |
| NULL | 2015-04-12 | 2 | 2 |
| NULL | 2015-03-12 | 1 | 2 |
| NULL | 2015-03-10 | 4 | 2 |
| 2015-04 | 2015-04-12 | 2 | 3 |
| 2015-04 | 2015-04-16 | 2 | 3 |
| 2015-03 | 2015-03-12 | 1 | 3 |
| 2015-03 | 2015-03-10 | 4 | 3 |
| 2015-04 | 2015-04-15 | 2 | 3 |
| 2015-04 | 2015-04-13 | 3 | 3 |
+----------+-------------+-----+---------------+--+
ROLLUP
是CUBE的子集,以最左侧的维度为主,从该维度进行层级聚合。
比如,以month维度进行层级聚合:
SELECT
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID
FROM test_data
GROUP BY month,day
WITH ROLLUP
ORDER BY GROUPING__ID;
可以实现这样的上钻过程:月天的UV->月的UV->总UV
+----------+-------------+-----+---------------+--+
| month | day | uv | grouping__id |
+----------+-------------+-----+---------------+--+
| NULL | NULL | 7 | 0 |
| 2015-04 | NULL | 6 | 1 |
| 2015-03 | NULL | 5 | 1 |
| 2015-04 | 2015-04-16 | 2 | 3 |
| 2015-04 | 2015-04-15 | 2 | 3 |
| 2015-04 | 2015-04-13 | 3 | 3 |
| 2015-04 | 2015-04-12 | 2 | 3 |
| 2015-03 | 2015-03-12 | 1 | 3 |
| 2015-03 | 2015-03-10 | 4 | 3 |
+----------+-------------+-----+---------------+--+ --把month和day调换顺序,则以day维度进行层级聚合:
SELECT
day,
month,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID
FROM test_data
GROUP BY day,month
WITH ROLLUP
ORDER BY GROUPING__ID;
+-------------+----------+-----+---------------+--+
| day | month | uv | grouping__id |
+-------------+----------+-----+---------------+--+
| NULL | NULL | 7 | 0 |
| 2015-04-12 | NULL | 2 | 1 |
| 2015-04-15 | NULL | 2 | 1 |
| 2015-03-12 | NULL | 1 | 1 |
| 2015-04-16 | NULL | 2 | 1 |
| 2015-03-10 | NULL | 4 | 1 |
| 2015-04-13 | NULL | 3 | 1 |
| 2015-04-16 | 2015-04 | 2 | 3 |
| 2015-04-15 | 2015-04 | 2 | 3 |
| 2015-04-13 | 2015-04 | 3 | 3 |
| 2015-03-12 | 2015-03 | 1 | 3 |
| 2015-03-10 | 2015-03 | 4 | 3 |
| 2015-04-12 | 2015-04 | 2 | 3 |
+-------------+----------+-----+---------------+--+
可以实现这样的上钻过程:
天月的UV->天的UV->总UV
(这里,根据天和月进行聚合,和根据天聚合结果一样,因为有父子关系,如果是其他维度组合的话,就会不一样)
Hive函数:GROUPING SETS,GROUPING__ID,CUBE,ROLLUP的更多相关文章
- Hive高阶聚合函数 GROUPING SETS、Cube、Rollup
-- GROUPING SETS作为GROUP BY的子句,允许开发人员在GROUP BY语句后面指定多个统计选项,可以简单理解为多条group by语句通过union all把查询结果聚合起来结合起 ...
- Hive SQL grouping sets 用法
概述 GROUPING SETS,GROUPING__ID,CUBE,ROLLUP 这几个分析函数通常用于OLAP中,不能累加,而且需要根据不同维度上钻和下钻的指标统计,比如,分小时.天.月的UV数. ...
- hive中grouping sets的使用
hive中grouping sets 数量较多时如何处理? 可以使用如下设置来 set hive.new.job.grouping.set.cardinality = 30; 这条设置的意义在于 ...
- GROUPING SETS、CUBE、ROLLUP
其实还是写一个Demo 比较好 USE tempdb IF OBJECT_ID( 'dbo.T1' , 'U' )IS NOT NULL BEGIN DROP TABLE dbo.T1; END; G ...
- Hive学习之路 (十七)Hive分析窗口函数(五) GROUPING SETS、GROUPING__ID、CUBE和ROLLUP
概述 GROUPING SETS,GROUPING__ID,CUBE,ROLLUP 这几个分析函数通常用于OLAP中,不能累加,而且需要根据不同维度上钻和下钻的指标统计,比如,分小时.天.月的UV数. ...
- 解析数仓OLAP函数:ROLLUP、CUBE、GROUPING SETS
摘要:GaussDB(DWS) ROLLUP,CUBE,GROUPING SETS等OLAP函数的原理解析. 本文分享自华为云社区<GaussDB(DWS) OLAP函数浅析>,作者: D ...
- Oracle的rollup、cube、grouping sets函数
转载自:https://blog.csdn.net/huang_xw/article/details/6402396 Oracle的group by除了基本用法以外,还有3种扩展用法,分别是rollu ...
- SQL Server2008 程序设计 汇总 GROUP BY,WITH ROLLUP,WITH CUBE,GROUPING SETS(..)
--SQL Server2008 程序设计 汇总 GROUP BY ,WITH ROLLUP WITH CUBE GROUPING SET(..) /*********************** ...
- TSQL 分组集(Grouping Sets)
分组集(Grouping Sets)是多个分组的并集,用于在一个查询中,按照不同的分组列对集合进行聚合运算,等价于对单个分组使用“union all”,计算多个结果集的并集.使用分组集的聚合查询,返回 ...
随机推荐
- 如何在IPFS里面上传一张图片
之前有好几人问过小编,想在IPFS里面上传一张图片.如何做? 今天小编就讲一下如何在IPFS里面上传.下载文件? 1 下载IPFS软件 下载地址:https://dist.ipfs.io/#go-ip ...
- Batch update returned unexpected row count from update [0] 异常处理
在one-to-many时遇到此异常,本以为是配置出错.在使用s标签开启debug模式,并在struts2主配置文件中添加异常映射,再次提交表单后得到以下异常详情. org.springframewo ...
- Sublime + Python3 + 虚拟环境 + 去除 中文输出乱码
MacBook Pro Retina 13 2013年底版 所用软件 1. Sublime Text 3安装 Virtualenv package 2. 用 iterm2 .或者终端安装zip:apt ...
- POJ-1004-Finanical Management
Description Larry graduated this year and finally has a job. He's making a lot of money, but somehow ...
- 企业必会技能 tomcat
企业必会技能 tomcat tomcat 一.什么是Tomcat? Tomcat是Apache 软件基金会(Apache Software Foundation)的Jakarta项目中的一个核心项 ...
- ReflectASM-invoke,高效率java反射机制原理
前言:前段时间在设计公司基于netty的易用框架时,很多地方都用到了反射机制.反射的性能一直是大家有目共睹的诟病,相比于直接调用速度上差了很多.但是在很多地方,作为未知通用判断的时候,不得不调用反射类 ...
- vue-过渡动画
本篇资料参考于官方文档: http://cn.vuejs.org/guide/transitions.html 概述: Vue 在跳转页面时,提供多种不同方式的动画过渡效果. ●in-out:新元素先 ...
- Entity Framework——并发策略
使用EF框架遇到并发时,一般采取乐观并发控制. 1支持并发检验 为支持并发检验,需要对实体进行额外的设置.默认情况下是不支持并发检验的.有以下两种方式: 方式名称 说明 时间戳注解/行版本 使用Tim ...
- C语言程序设计基础-第1周作业-初步
1.安装带有计算机术语的翻译软件 2.在自己电脑上安装C编译器,windows系统建议安装dev-c++,其他系统自行查找. 3.加入课程小组,有任何疑问可以在小组中提问:https://group. ...
- TRY