Reposted from: http://lxw1234.com/archives/2015/04/176.htm (Hive Analytic Window Functions (1): SUM, AVG, MIN, MAX)

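I had previously read the 大数据田地 (lxw1234.com) article on max() over(partition by), and today I happened to need it at work. I ran into a problem, though: max(rsrp) over(partition by buildingid,height) as max_rsrp did not return the largest value in each group. I eventually found the cause: max_rsrp (and the rsrp column it was computed from) had type string rather than double, so the values were compared as text instead of as numbers.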
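The string-versus-double bug is easy to reproduce outside Hive. The sketch below is Python, not HiveQL, and uses made-up RSRP readings (hypothetical values, not from the original data). It assumes that MAX on a STRING column compares values lexicographically, the same way Python compares ASCII str objects. In Hive the usual remedy is to aggregate a numeric value, for example by declaring rsrp as DOUBLE or by writing MAX(CAST(rsrp AS DOUBLE)) OVER(PARTITION BY buildingid,height).

```python
# Hypothetical RSRP readings (dBm) stored as strings, as in the buggy table.
rsrp_as_string = ["-95.5", "-101.2", "-88.0", "-110.7"]

# max() on strings compares character by character (lexicographic order),
# which mirrors what MAX() does on a STRING column.
lexicographic_max = max(rsrp_as_string)

# Converting to float first compares the actual numeric values.
numeric_max = max(float(v) for v in rsrp_as_string)

print(lexicographic_max)  # -95.5  ('9' sorts after '8' and '1')
print(numeric_max)        # -88.0
```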

While working through it, I also reproduced the 大数据田地 demos of SUM, AVG, MAX, and MIN, which became this article, written with that post as a reference.

Data preparation:

# Create the sample data file (one record per line: cookieid,createtime,pv)
cat > data_file.txt <<'EOF'
cookie1,2015-04-10,1
cookie1,2015-04-11,5
cookie1,2015-04-12,7
cookie1,2015-04-13,3
cookie1,2015-04-14,2
cookie1,2015-04-15,4
cookie1,2015-04-16,4
cookie2,2015-04-10,6
cookie2,2015-04-11,5
cookie2,2015-04-12,7
cookie2,2015-04-13,4
cookie2,2015-04-14,3
cookie2,2015-04-15,5
cookie2,2015-04-16,5
EOF
# Upload it to HDFS, removing any previous copy first
hadoop fs -rm -r /user/jrf/test_data
hadoop fs -mkdir /user/jrf/test_data
hadoop fs -copyFromLocal data_file.txt /user/jrf/test_data/
-- Then, in the Hive CLI or beeline:
drop table if exists test_data;
create EXTERNAL TABLE test_data (
cookieid string,
createtime string, --day
pv INT
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile location '/user/jrf/test_data/';
select * from test_data;
+---------------------+-----------------------+---------------+--+
| test_data.cookieid | test_data.createtime | test_data.pv |
+---------------------+-----------------------+---------------+--+
| cookie1 | 2015-04-10 | 1 |
| cookie1 | 2015-04-11 | 5 |
| cookie1 | 2015-04-12 | 7 |
| cookie1 | 2015-04-13 | 3 |
| cookie1 | 2015-04-14 | 2 |
| cookie1 | 2015-04-15 | 4 |
| cookie1 | 2015-04-16 | 4 |
| cookie2 | 2015-04-10 | 6 |
| cookie2 | 2015-04-11 | 5 |
| cookie2 | 2015-04-12 | 7 |
| cookie2 | 2015-04-13 | 4 |
| cookie2 | 2015-04-14 | 3 |
| cookie2 | 2015-04-15 | 5 |
| cookie2 | 2015-04-16 | 5 |
+---------------------+-----------------------+---------------+--+

SUM (note: the result depends on ORDER BY, which sorts in ascending order by default)

SELECT cookieid,createtime,pv,
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default frame: partition start to current row
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- partition start to current row; same result as pv1
SUM(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row + 3 preceding rows
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row + 3 preceding rows + 1 following row
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row + all following rows
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 1 | 26 | 1 | 6 | 26 |
| cookie1 | 2015-04-11 | 5 | 6 | 6 | 26 | 6 | 13 | 25 |
| cookie1 | 2015-04-12 | 7 | 13 | 13 | 26 | 13 | 16 | 20 |
| cookie1 | 2015-04-13 | 3 | 16 | 16 | 26 | 16 | 18 | 13 |
| cookie1 | 2015-04-14 | 2 | 18 | 18 | 26 | 17 | 21 | 10 |
| cookie1 | 2015-04-15 | 4 | 22 | 22 | 26 | 16 | 20 | 8 |
| cookie1 | 2015-04-16 | 4 | 26 | 26 | 26 | 13 | 13 | 4 |
| cookie2 | 2015-04-10 | 6 | 6 | 6 | 35 | 6 | 11 | 35 |
| cookie2 | 2015-04-11 | 5 | 11 | 11 | 35 | 11 | 18 | 29 |
| cookie2 | 2015-04-12 | 7 | 18 | 18 | 35 | 18 | 22 | 24 |
| cookie2 | 2015-04-13 | 4 | 22 | 22 | 35 | 22 | 25 | 17 |
| cookie2 | 2015-04-14 | 3 | 25 | 25 | 35 | 19 | 24 | 13 |
| cookie2 | 2015-04-15 | 5 | 30 | 30 | 35 | 19 | 24 | 10 |
| cookie2 | 2015-04-16 | 5 | 35 | 35 | 35 | 17 | 17 | 5 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
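pv1: running total of pv within the partition, from the first row to the current row; e.g. pv1 for the 11th = pv of the 10th + pv of the 11th, and for the 12th = 10th + 11th + 12th
pv2: same as pv1
pv3: sum of all pv in the partition (e.g. all of cookie1)
pv4: current row + 3 preceding rows within the partition; e.g. 11th = 10th + 11th, 12th = 10th + 11th + 12th, 13th = 10th + 11th + 12th + 13th, 14th = 11th + 12th + 13th + 14th
pv5: current row + 3 preceding rows + 1 following row; e.g. 14th = 11th + 12th + 13th + 14th + 15th = 5+7+3+2+4 = 21
pv6: current row + all following rows; e.g. 13th = 13th + 14th + 15th + 16th = 3+2+4+4 = 13, 14th = 14th + 15th + 16th = 2+4+4 = 10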
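To make the frame arithmetic in the pv1 to pv6 list concrete, here is a minimal Python model of a ROWS frame over one partition that is already sorted by createtime. It is a sketch of the semantics, not of how Hive evaluates windows internally; window_agg is a hypothetical helper, and None stands in for UNBOUNDED.

```python
from typing import Callable, List, Optional


def window_agg(values: List[int], preceding: Optional[int], following: Optional[int],
               agg: Callable[[List[int]], int]) -> List[int]:
    """Apply agg to a ROWS frame around each row of one ordered partition.

    preceding/following: number of rows before/after the current row;
    None models UNBOUNDED, 0 models CURRENT ROW.
    """
    n = len(values)
    out = []
    for i in range(n):
        # Clip the frame to the partition boundaries.
        start = 0 if preceding is None else max(0, i - preceding)
        end = n - 1 if following is None else min(n - 1, i + following)
        out.append(agg(values[start:end + 1]))
    return out


# cookie1's pv values, already sorted by createtime (2015-04-10 to 2015-04-16)
cookie1 = [1, 5, 7, 3, 2, 4, 4]

pv2 = window_agg(cookie1, None, 0, sum)     # UNBOUNDED PRECEDING AND CURRENT ROW
pv3 = window_agg(cookie1, None, None, sum)  # every row of the partition
pv4 = window_agg(cookie1, 3, 0, sum)        # 3 PRECEDING AND CURRENT ROW
pv5 = window_agg(cookie1, 3, 1, sum)        # 3 PRECEDING AND 1 FOLLOWING
pv6 = window_agg(cookie1, 0, None, sum)     # CURRENT ROW AND UNBOUNDED FOLLOWING

print(pv4)  # [1, 6, 13, 16, 17, 16, 13]
print(pv5)  # [6, 13, 16, 18, 21, 20, 13]
```

Passing min or max as agg instead of sum reproduces the corresponding columns of the MIN and MAX queries.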

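If ORDER BY is given without a ROWS BETWEEN clause, the frame defaults to the start of the partition through the current row. (Strictly, Hive's default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which also includes rows that tie with the current row on the ORDER BY key; createtime is unique within each cookieid here, so pv1 and pv2 are identical.)
If ORDER BY is omitted, the aggregate covers every row in the partition.
The key is understanding ROWS BETWEEN, also called the WINDOW clause:
PRECEDING: rows before the current row
FOLLOWING: rows after the current row
CURRENT ROW: the current row
UNBOUNDED: no limit; UNBOUNDED PRECEDING means from the first row of the partition, UNBOUNDED FOLLOWING means through the last row of the partition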
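The difference between the default RANGE frame and an explicit ROWS frame only shows up when ORDER BY values tie. The Python sketch below uses a hypothetical partition in which two rows share the same createtime (invented data; test_data has no such ties) and assumes an ascending ORDER BY. The helper names are illustrative only.

```python
# Hypothetical partition, already sorted by createtime, with a tie on 2015-04-11.
# ISO date strings compare chronologically, so plain string comparison works here.
rows = [("2015-04-10", 1), ("2015-04-11", 5), ("2015-04-11", 7), ("2015-04-12", 3)]


def running_sum_rows(rows):
    """ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: stop at the current physical row."""
    out, total = [], 0
    for _, pv in rows:
        total += pv
        out.append(total)
    return out


def running_sum_range(rows):
    """RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (ascending ORDER BY):
    include every row whose key is <= the current key, so tied rows (peers) count together."""
    return [sum(pv for k, pv in rows if k <= key) for key, _ in rows]


print(running_sum_rows(rows))   # [1, 6, 13, 16]
print(running_sum_range(rows))  # [1, 13, 13, 16]
```

With ROWS and tied keys, the relative order of the tied rows is not guaranteed, so which of them gets 6 and which gets 13 may vary between runs; the RANGE result does not have that ambiguity.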

AVG, MIN, and MAX are used in exactly the same way as SUM.

--AVG
SELECT cookieid,createtime,pv,
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default frame: partition start to current row
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- partition start to current row; same result as pv1
AVG(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row + 3 preceding rows
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row + 3 preceding rows + 1 following row
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row + all following rows
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
| cookie1 | 2015-04-10 | 1 | 1.0 | 1.0 | 3.7142857142857144 | 1.0 | 3.0 | 3.7142857142857144 |
| cookie1 | 2015-04-11 | 5 | 3.0 | 3.0 | 3.7142857142857144 | 3.0 | 4.333333333333333 | 4.166666666666667 |
| cookie1 | 2015-04-12 | 7 | 4.333333333333333 | 4.333333333333333 | 3.7142857142857144 | 4.333333333333333 | 4.0 | 4.0 |
| cookie1 | 2015-04-13 | 3 | 4.0 | 4.0 | 3.7142857142857144 | 4.0 | 3.6 | 3.25 |
| cookie1 | 2015-04-14 | 2 | 3.6 | 3.6 | 3.7142857142857144 | 4.25 | 4.2 | 3.3333333333333335 |
| cookie1 | 2015-04-15 | 4 | 3.6666666666666665 | 3.6666666666666665 | 3.7142857142857144 | 4.0 | 4.0 | 4.0 |
| cookie1 | 2015-04-16 | 4 | 3.7142857142857144 | 3.7142857142857144 | 3.7142857142857144 | 3.25 | 3.25 | 4.0 |
| cookie2 | 2015-04-10 | 6 | 6.0 | 6.0 | 5.0 | 6.0 | 5.5 | 5.0 |
| cookie2 | 2015-04-11 | 5 | 5.5 | 5.5 | 5.0 | 5.5 | 6.0 | 4.833333333333333 |
| cookie2 | 2015-04-12 | 7 | 6.0 | 6.0 | 5.0 | 6.0 | 5.5 | 4.8 |
| cookie2 | 2015-04-13 | 4 | 5.5 | 5.5 | 5.0 | 5.5 | 5.0 | 4.25 |
| cookie2 | 2015-04-14 | 3 | 5.0 | 5.0 | 5.0 | 4.75 | 4.8 | 4.333333333333333 |
| cookie2 | 2015-04-15 | 5 | 5.0 | 5.0 | 5.0 | 4.75 | 4.8 | 5.0 |
| cookie2 | 2015-04-16 | 5 | 5.0 | 5.0 | 5.0 | 4.25 | 4.25 | 5.0 |
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
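One detail the AVG results show: when a ROWS frame runs past the edge of the partition it is simply clipped, and AVG divides by the number of rows actually inside the frame. That is why pv5 for cookie1 on 2015-04-10 is 3.0, i.e. (1+5)/2, rather than 6/5. The Python sketch below, with a hypothetical helper frame_avg, reproduces three pv5 values from the table above.

```python
def frame_avg(values, i, preceding, following):
    """AVG over ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING for row i.

    The frame is clipped to the partition, so the divisor is the number of rows
    actually in the frame, not preceding + following + 1.
    """
    start = max(0, i - preceding)
    end = min(len(values) - 1, i + following)
    frame = values[start:end + 1]
    return sum(frame) / len(frame)


cookie1 = [1, 5, 7, 3, 2, 4, 4]  # pv sorted by createtime

print(frame_avg(cookie1, 0, 3, 1))  # 3.0   (1 + 5) / 2, first row (2015-04-10)
print(frame_avg(cookie1, 4, 3, 1))  # 4.2   (5 + 7 + 3 + 2 + 4) / 5, row 2015-04-14
print(frame_avg(cookie1, 6, 3, 1))  # 3.25  (3 + 2 + 4 + 4) / 4, last row
```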
--MIN
SELECT cookieid,createtime,pv,
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default frame: partition start to current row
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- partition start to current row; same result as pv1
MIN(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row + 3 preceding rows
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row + 3 preceding rows + 1 following row
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row + all following rows
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| cookie1 | 2015-04-11 | 5 | 1 | 1 | 1 | 1 | 1 | 2 |
| cookie1 | 2015-04-12 | 7 | 1 | 1 | 1 | 1 | 1 | 2 |
| cookie1 | 2015-04-13 | 3 | 1 | 1 | 1 | 1 | 1 | 2 |
| cookie1 | 2015-04-14 | 2 | 1 | 1 | 1 | 2 | 2 | 2 |
| cookie1 | 2015-04-15 | 4 | 1 | 1 | 1 | 2 | 2 | 4 |
| cookie1 | 2015-04-16 | 4 | 1 | 1 | 1 | 2 | 2 | 4 |
| cookie2 | 2015-04-10 | 6 | 6 | 6 | 3 | 6 | 5 | 3 |
| cookie2 | 2015-04-11 | 5 | 5 | 5 | 3 | 5 | 5 | 3 |
| cookie2 | 2015-04-12 | 7 | 5 | 5 | 3 | 5 | 4 | 3 |
| cookie2 | 2015-04-13 | 4 | 4 | 4 | 3 | 4 | 3 | 3 |
| cookie2 | 2015-04-14 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| cookie2 | 2015-04-15 | 5 | 3 | 3 | 3 | 3 | 3 | 5 |
| cookie2 | 2015-04-16 | 5 | 3 | 3 | 3 | 3 | 3 | 5 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
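MIN over ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING is a suffix minimum. The Python sketch below, with a hypothetical helper suffix_min, computes it in a single backward pass and reproduces the pv6 column of the MIN table for both cookies. It illustrates the semantics and is not necessarily how Hive evaluates such a frame.

```python
def suffix_min(values):
    """Minimum of values[i:] for every i, computed in one backward pass."""
    out = []
    current = None
    for v in reversed(values):
        current = v if current is None else min(current, v)
        out.append(current)
    out.reverse()  # restore the original (createtime) order
    return out


print(suffix_min([1, 5, 7, 3, 2, 4, 4]))  # [1, 2, 2, 2, 2, 4, 4]  cookie1 pv6
print(suffix_min([6, 5, 7, 4, 3, 5, 5]))  # [3, 3, 3, 3, 3, 5, 5]  cookie2 pv6
```

Replacing min with max inside the loop gives the pv6 column of the MAX query.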
--MAX
SELECT cookieid,createtime,pv,
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default frame: partition start to current row
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- partition start to current row; same result as pv1
MAX(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row + 3 preceding rows
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row + 3 preceding rows + 1 following row
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row + all following rows
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 1 | 7 | 1 | 5 | 7 |
| cookie1 | 2015-04-11 | 5 | 5 | 5 | 7 | 5 | 7 | 7 |
| cookie1 | 2015-04-12 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| cookie1 | 2015-04-13 | 3 | 7 | 7 | 7 | 7 | 7 | 4 |
| cookie1 | 2015-04-14 | 2 | 7 | 7 | 7 | 7 | 7 | 4 |
| cookie1 | 2015-04-15 | 4 | 7 | 7 | 7 | 7 | 7 | 4 |
| cookie1 | 2015-04-16 | 4 | 7 | 7 | 7 | 4 | 4 | 4 |
| cookie2 | 2015-04-10 | 6 | 6 | 6 | 7 | 6 | 6 | 7 |
| cookie2 | 2015-04-11 | 5 | 6 | 6 | 7 | 6 | 7 | 7 |
| cookie2 | 2015-04-12 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| cookie2 | 2015-04-13 | 4 | 7 | 7 | 7 | 7 | 7 | 5 |
| cookie2 | 2015-04-14 | 3 | 7 | 7 | 7 | 7 | 7 | 5 |
| cookie2 | 2015-04-15 | 5 | 7 | 7 | 7 | 7 | 7 | 5 |
| cookie2 | 2015-04-16 | 5 | 7 | 7 | 7 | 5 | 5 | 5 |
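+-----------+-------------+-----+------+------+------+------+------+------+--+
--MIN and MAX over the whole partition (no ORDER BY in OVER)
SELECT cookieid,
createtime,
pv,
min(pv) OVER(PARTITION BY cookieid) AS min_pv,
max(pv) OVER(PARTITION BY cookieid) AS max_pv
FROM test_data; -- no outer ORDER BY, so the output row order is not guaranteed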
+-----------+-------------+-----+---------+---------+--+
| cookieid | createtime | pv | min_pv | max_pv |
+-----------+-------------+-----+---------+---------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 7 |
| cookie1 | 2015-04-16 | 4 | 1 | 7 |
| cookie1 | 2015-04-15 | 4 | 1 | 7 |
| cookie1 | 2015-04-14 | 2 | 1 | 7 |
| cookie1 | 2015-04-13 | 3 | 1 | 7 |
| cookie1 | 2015-04-12 | 7 | 1 | 7 |
| cookie1 | 2015-04-11 | 5 | 1 | 7 |
| cookie2 | 2015-04-16 | 5 | 3 | 7 |
| cookie2 | 2015-04-15 | 5 | 3 | 7 |
| cookie2 | 2015-04-14 | 3 | 3 | 7 |
| cookie2 | 2015-04-13 | 4 | 3 | 7 |
| cookie2 | 2015-04-12 | 7 | 3 | 7 |
| cookie2 | 2015-04-11 | 5 | 3 | 7 |
| cookie2 | 2015-04-10 | 6 | 3 | 7 |
+-----------+-------------+-----+---------+---------+--+
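With PARTITION BY and no ORDER BY, each row simply receives its partition's aggregate, which gives the same values as a GROUP BY followed by joining the result back onto the original rows. The Python sketch below models that with the test_data rows; it illustrates the result only, not Hive's execution plan.

```python
from collections import defaultdict

# The 14 rows of test_data as (cookieid, createtime, pv)
rows = [
    ("cookie1", "2015-04-10", 1), ("cookie1", "2015-04-11", 5),
    ("cookie1", "2015-04-12", 7), ("cookie1", "2015-04-13", 3),
    ("cookie1", "2015-04-14", 2), ("cookie1", "2015-04-15", 4),
    ("cookie1", "2015-04-16", 4), ("cookie2", "2015-04-10", 6),
    ("cookie2", "2015-04-11", 5), ("cookie2", "2015-04-12", 7),
    ("cookie2", "2015-04-13", 4), ("cookie2", "2015-04-14", 3),
    ("cookie2", "2015-04-15", 5), ("cookie2", "2015-04-16", 5),
]

# Step 1: aggregate once per partition key, like GROUP BY cookieid.
pv_by_cookie = defaultdict(list)
for cookieid, _, pv in rows:
    pv_by_cookie[cookieid].append(pv)
min_max = {cookieid: (min(pvs), max(pvs)) for cookieid, pvs in pv_by_cookie.items()}

# Step 2: attach the partition's aggregates to every original row, like joining back.
result = [(cookieid, day, pv, *min_max[cookieid]) for cookieid, day, pv in rows]

for row in result:
    print(row)  # (cookieid, createtime, pv, min_pv, max_pv)
```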