Hive SQL grouping sets 用法

概述

GROUPING SETS,GROUPING__ID,CUBE,ROLLUP

这几个分析函数通常用于OLAP中，不能累加，而且需要根据不同维度上钻和下钻的指标统计，比如，分小时、天、月的UV数。

GROUPING SETS和GROUPING__ID

说明

在一个GROUP BY查询中，根据不同的维度组合进行聚合，等价于将不同维度的GROUP BY结果集进行UNION ALL

GROUPING__ID，表示结果属于哪一个分组集合。

查询语句：

select

  month,

  day,

  count(distinct cookieid) as uv,

  GROUPING__ID

from cookie.cookie5

group by month,day

grouping sets (month,day)

order by GROUPING__ID;

等价于：

SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM cookie5 GROUP BY month

UNION ALL

SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM cookie5 GROUP BY day

查询结果

结果说明

第一列是按照month进行分组

第二列是按照day进行分组

第三列是按照month或day分组是，统计这一组有几个不同的cookieid

第四列grouping_id表示这一组结果属于哪个分组集合，根据grouping sets中的分组条件month，day，1是代表month，2是代表day

再比如：

SELECT  month, day,

COUNT(DISTINCT cookieid) AS uv,

GROUPING__ID

FROM cookie5

GROUP BY month,day

GROUPING SETS (month,day,(month,day))

ORDER BY GROUPING__ID;

等价于：

SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM cookie5 GROUP BY month

UNION ALL

SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM cookie5 GROUP BY day

UNION ALL

SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM cookie5 GROUP BY month,day

CUBE

说明

根据GROUP BY的维度的所有组合进行聚合

查询语句

SELECT  month, day,

COUNT(DISTINCT cookieid) AS uv,

GROUPING__ID

FROM cookie5

GROUP BY month,day

WITH CUBE

ORDER BY GROUPING__ID;

等价于

SELECT NULL,NULL,COUNT(DISTINCT cookieid) AS uv,0 AS GROUPING__ID FROM cookie5

UNION ALL

SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM cookie5 GROUP BY month

UNION ALL

SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM cookie5 GROUP BY day

UNION ALL

SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM cookie5 GROUP BY month,day

查询结果

ROLLUP

说明

是CUBE的子集，以最左侧的维度为主，从该维度进行层级聚合

查询语句

-- 比如，以month维度进行层级聚合

SELECT  month, day, COUNT(DISTINCT cookieid) AS uv, GROUPING__ID

FROM cookie5

GROUP BY month,day WITH ROLLUP  ORDER BY GROUPING__ID;

可以实现这样的上钻过程：
月天的UV->月的UV->总UV

--把month和day调换顺序，则以day维度进行层级聚合：

可以实现这样的上钻过程：
天月的UV->天的UV->总UV
（这里，根据天和月进行聚合，和根据天聚合结果一样，因为有父子关系，如果是其他维度组合的话，就会不一样）

https://www.cnblogs.com/qingyunzong/p/8798987.html

感谢博主！