Adapted from: http://lxw1234.com/archives/2015/04/176.htm, Hive Analytic Window Functions (1): SUM, AVG, MIN, MAX

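I had previously read about max() over(partition by) on 大数据田地, and today I happened to need it at work. I ran into a problem, though: max(rsrp)over(partition by buildingid,height) as max_rsrp did not return the maximum value within each group. I eventually found the cause: the data type was string rather than double, which produced the incorrect result.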
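The symptom described above can be reproduced outside Hive. The following is a minimal Python sketch, not Hive code: it assumes that a maximum over string values is decided by character-by-character comparison, and the RSRP readings are invented sample values used only for illustration.

```python
# Hypothetical RSRP readings (dBm) stored as text, as they would be in a string column.
rsrp_as_text = ["-85", "-95", "-105"]

# Comparing as strings is lexicographic: "-9..." sorts after "-8..." and "-1...".
max_as_string = max(rsrp_as_text)

# Converting to a numeric type first gives the true maximum.
max_as_number = max(float(v) for v in rsrp_as_text)

print(max_as_string)  # -95
print(max_as_number)  # -85.0
```

In Hive, the analogous fix would be to cast before aggregating, for example max(cast(rsrp as double)) over(partition by buildingid,height), or to declare the column as double in the table definition.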

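While working through the problem, I also reran the SUM, AVG, MAX, and MIN examples from 大数据田地 as a demo, which is how this write-up, based on that article, came about.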

Data preparation:

echo ''>data_file.txt
vim data_file.txt
cookie1,2015-04-10,1
cookie1,2015-04-11,5
cookie1,2015-04-12,7
cookie1,2015-04-13,3
cookie1,2015-04-14,2
cookie1,2015-04-15,4
cookie1,2015-04-16,4
cookie2,2015-04-10,6
cookie2,2015-04-11,5
cookie2,2015-04-12,7
cookie2,2015-04-13,4
cookie2,2015-04-14,3
cookie2,2015-04-15,5
cookie2,2015-04-16,5
hadoop fs -rm -r /user/jrf/test_data
hadoop fs -mkdir /user/jrf/test_data
hadoop fs -copyFromLocal data_file.txt /user/jrf/test_data/
drop table if exists test_data;
create EXTERNAL TABLE test_data (
cookieid string,
createtime string, --day
pv INT
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile location '/user/jrf/test_data/';
select * from test_data;
+---------------------+-----------------------+---------------+--+
| test_data.cookieid | test_data.createtime | test_data.pv |
+---------------------+-----------------------+---------------+--+
| cookie1 | 2015-04-10 | 1 |
| cookie1 | 2015-04-11 | 5 |
| cookie1 | 2015-04-12 | 7 |
| cookie1 | 2015-04-13 | 3 |
| cookie1 | 2015-04-14 | 2 |
| cookie1 | 2015-04-15 | 4 |
| cookie1 | 2015-04-16 | 4 |
| cookie2 | 2015-04-10 | 6 |
| cookie2 | 2015-04-11 | 5 |
| cookie2 | 2015-04-12 | 7 |
| cookie2 | 2015-04-13 | 4 |
| cookie2 | 2015-04-14 | 3 |
| cookie2 | 2015-04-15 | 5 |
| cookie2 | 2015-04-16 | 5 |
+---------------------+-----------------------+---------------+--+

SUM: note that the result depends on ORDER BY, which defaults to ascending order

SELECT cookieid,createtime,pv,
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default frame: partition start through current row
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- partition start through current row; same result as pv1 here
SUM(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row plus the 3 preceding rows
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row plus 3 preceding rows and 1 following row
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row plus all following rows
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 1 | 26 | 1 | 6 | 26 |
| cookie1 | 2015-04-11 | 5 | 6 | 6 | 26 | 6 | 13 | 25 |
| cookie1 | 2015-04-12 | 7 | 13 | 13 | 26 | 13 | 16 | 20 |
| cookie1 | 2015-04-13 | 3 | 16 | 16 | 26 | 16 | 18 | 13 |
| cookie1 | 2015-04-14 | 2 | 18 | 18 | 26 | 17 | 21 | 10 |
| cookie1 | 2015-04-15 | 4 | 22 | 22 | 26 | 16 | 20 | 8 |
| cookie1 | 2015-04-16 | 4 | 26 | 26 | 26 | 13 | 13 | 4 |
| cookie2 | 2015-04-10 | 6 | 6 | 6 | 35 | 6 | 11 | 35 |
| cookie2 | 2015-04-11 | 5 | 11 | 11 | 35 | 11 | 18 | 29 |
| cookie2 | 2015-04-12 | 7 | 18 | 18 | 35 | 18 | 22 | 24 |
| cookie2 | 2015-04-13 | 4 | 22 | 22 | 35 | 22 | 25 | 17 |
| cookie2 | 2015-04-14 | 3 | 25 | 25 | 35 | 19 | 24 | 13 |
| cookie2 | 2015-04-15 | 5 | 30 | 30 | 35 | 19 | 24 | 10 |
| cookie2 | 2015-04-16 | 5 | 35 | 35 | 35 | 17 | 17 | 5 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
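pv1: running total of pv within the partition, from the first row through the current row; e.g. pv1 for the 11th = pv of the 10th + pv of the 11th, and for the 12th = 10th + 11th + 12th
pv2: same as pv1
pv3: sum of all pv values in the partition (e.g. cookie1)
pv4: current row plus the 3 preceding rows in the partition; e.g. 11th = 10th + 11th, 12th = 10th + 11th + 12th, 13th = 10th + 11th + 12th + 13th, 14th = 11th + 12th + 13th + 14th
pv5: current row plus the 3 preceding rows and 1 following row in the partition; e.g. 14th = 11th + 12th + 13th + 14th + 15th = 5+7+3+2+4 = 21
pv6: current row plus all following rows in the partition; e.g. 13th = 13th + 14th + 15th + 16th = 3+2+4+4 = 13, and 14th = 14th + 15th + 16th = 2+4+4 = 10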
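The frame arithmetic above can be checked with a small Python sketch. It is an emulation written for illustration, not how Hive evaluates windows; the helper name window_agg and its convention that None means UNBOUNDED are assumptions. The input is cookie1's pv column in createtime order.

```python
def window_agg(values, preceding, following, agg=sum):
    """Emulate ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING.

    values    -- one partition's values, already in ORDER BY order
    preceding -- rows before the current row; None means UNBOUNDED PRECEDING
    following -- rows after the current row; None means UNBOUNDED FOLLOWING
                 (0 means the frame ends at CURRENT ROW)
    agg       -- aggregate applied to each frame (sum, min, max, ...)
    """
    n = len(values)
    result = []
    for i in range(n):
        start = 0 if preceding is None else max(0, i - preceding)
        end = n if following is None else min(n, i + following + 1)
        result.append(agg(values[start:end]))
    return result


cookie1_pv = [1, 5, 7, 3, 2, 4, 4]  # 2015-04-10 .. 2015-04-16

pv2 = window_agg(cookie1_pv, None, 0)     # UNBOUNDED PRECEDING .. CURRENT ROW
pv3 = window_agg(cookie1_pv, None, None)  # whole partition
pv4 = window_agg(cookie1_pv, 3, 0)        # 3 PRECEDING .. CURRENT ROW
pv5 = window_agg(cookie1_pv, 3, 1)        # 3 PRECEDING .. 1 FOLLOWING
pv6 = window_agg(cookie1_pv, 0, None)     # CURRENT ROW .. UNBOUNDED FOLLOWING

print(pv2)  # [1, 6, 13, 16, 18, 22, 26]
print(pv4)  # [1, 6, 13, 16, 17, 16, 13]
print(pv5)  # [6, 13, 16, 18, 21, 20, 13]
print(pv6)  # [26, 25, 20, 13, 10, 8, 4]
```

Passing agg=min or agg=max gives the corresponding MIN and MAX columns; for example, window_agg(cookie1_pv, 3, 0, agg=max) yields [1, 5, 7, 7, 7, 7, 4].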

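If ORDER BY is given without a window clause, the frame defaults to partition start through current row (in Hive this is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie on the ORDER BY value are included together; with unique createtime values per cookie, as here, it matches the ROWS form used for pv2);
If neither ORDER BY nor a window clause is given, all values in the partition are aggregated;
The key is to understand ROWS BETWEEN, also called the WINDOW clause:
PRECEDING: rows before the current row
FOLLOWING: rows after the current row
CURRENT ROW: the current row
UNBOUNDED: the partition boundary; UNBOUNDED PRECEDING means from the start of the partition, UNBOUNDED FOLLOWING means to the end of the partition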
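One subtlety is worth sketching: when ORDER BY is given without a window clause, Hive's documented default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats rows with equal ORDER BY values as peers. The Python emulation below is an illustration only; the function names and the sample rows with a duplicated date are invented. The sample data in this article has no such ties, which is why pv1 and pv2 agree.

```python
def running_sum_rows(values):
    """ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: frame ends at the row itself."""
    total = 0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


def running_sum_range(keys, values):
    """RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: frame also includes
    every row whose ORDER BY key equals the current row's key (its peers)."""
    return [
        sum(v for k, v in zip(keys, values) if k <= current_key)
        for current_key in keys
    ]


# Hypothetical partition, sorted by date, with two rows on the same day.
dates = ["2015-04-10", "2015-04-10", "2015-04-11"]
pv = [1, 5, 7]

print(running_sum_rows(pv))          # [1, 6, 13]
print(running_sum_range(dates, pv))  # [6, 6, 13]
```

With ROWS, which of the two tied rows is counted first depends on the order in which the rows arrive, so writing the frame explicitly or ordering by a unique key keeps results predictable.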

-- AVG, MIN, and MAX are used the same way as SUM.

--AVG
SELECT cookieid,createtime,pv,
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default frame: partition start through current row
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- partition start through current row; same result as pv1 here
AVG(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row plus the 3 preceding rows
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row plus 3 preceding rows and 1 following row
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row plus all following rows
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
| cookie1 | 2015-04-10 | 1 | 1.0 | 1.0 | 3.7142857142857144 | 1.0 | 3.0 | 3.7142857142857144 |
| cookie1 | 2015-04-11 | 5 | 3.0 | 3.0 | 3.7142857142857144 | 3.0 | 4.333333333333333 | 4.166666666666667 |
| cookie1 | 2015-04-12 | 7 | 4.333333333333333 | 4.333333333333333 | 3.7142857142857144 | 4.333333333333333 | 4.0 | 4.0 |
| cookie1 | 2015-04-13 | 3 | 4.0 | 4.0 | 3.7142857142857144 | 4.0 | 3.6 | 3.25 |
| cookie1 | 2015-04-14 | 2 | 3.6 | 3.6 | 3.7142857142857144 | 4.25 | 4.2 | 3.3333333333333335 |
| cookie1 | 2015-04-15 | 4 | 3.6666666666666665 | 3.6666666666666665 | 3.7142857142857144 | 4.0 | 4.0 | 4.0 |
| cookie1 | 2015-04-16 | 4 | 3.7142857142857144 | 3.7142857142857144 | 3.7142857142857144 | 3.25 | 3.25 | 4.0 |
| cookie2 | 2015-04-10 | 6 | 6.0 | 6.0 | 5.0 | 6.0 | 5.5 | 5.0 |
| cookie2 | 2015-04-11 | 5 | 5.5 | 5.5 | 5.0 | 5.5 | 6.0 | 4.833333333333333 |
| cookie2 | 2015-04-12 | 7 | 6.0 | 6.0 | 5.0 | 6.0 | 5.5 | 4.8 |
| cookie2 | 2015-04-13 | 4 | 5.5 | 5.5 | 5.0 | 5.5 | 5.0 | 4.25 |
| cookie2 | 2015-04-14 | 3 | 5.0 | 5.0 | 5.0 | 4.75 | 4.8 | 4.333333333333333 |
| cookie2 | 2015-04-15 | 5 | 5.0 | 5.0 | 5.0 | 4.75 | 4.8 | 5.0 |
| cookie2 | 2015-04-16 | 5 | 5.0 | 5.0 | 5.0 | 4.25 | 4.25 | 5.0 |
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
--MIN
SELECT cookieid,createtime,pv,
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default frame: partition start through current row
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- partition start through current row; same result as pv1 here
MIN(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row plus the 3 preceding rows
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row plus 3 preceding rows and 1 following row
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row plus all following rows
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| cookie1 | 2015-04-11 | 5 | 1 | 1 | 1 | 1 | 1 | 2 |
| cookie1 | 2015-04-12 | 7 | 1 | 1 | 1 | 1 | 1 | 2 |
| cookie1 | 2015-04-13 | 3 | 1 | 1 | 1 | 1 | 1 | 2 |
| cookie1 | 2015-04-14 | 2 | 1 | 1 | 1 | 2 | 2 | 2 |
| cookie1 | 2015-04-15 | 4 | 1 | 1 | 1 | 2 | 2 | 4 |
| cookie1 | 2015-04-16 | 4 | 1 | 1 | 1 | 2 | 2 | 4 |
| cookie2 | 2015-04-10 | 6 | 6 | 6 | 3 | 6 | 5 | 3 |
| cookie2 | 2015-04-11 | 5 | 5 | 5 | 3 | 5 | 5 | 3 |
| cookie2 | 2015-04-12 | 7 | 5 | 5 | 3 | 5 | 4 | 3 |
| cookie2 | 2015-04-13 | 4 | 4 | 4 | 3 | 4 | 3 | 3 |
| cookie2 | 2015-04-14 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| cookie2 | 2015-04-15 | 5 | 3 | 3 | 3 | 3 | 3 | 5 |
| cookie2 | 2015-04-16 | 5 | 3 | 3 | 3 | 3 | 3 | 5 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
--MAX
SELECT cookieid,createtime,pv,
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default frame: partition start through current row
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- partition start through current row; same result as pv1 here
MAX(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row plus the 3 preceding rows
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row plus 3 preceding rows and 1 following row
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row plus all following rows
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 1 | 7 | 1 | 5 | 7 |
| cookie1 | 2015-04-11 | 5 | 5 | 5 | 7 | 5 | 7 | 7 |
| cookie1 | 2015-04-12 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| cookie1 | 2015-04-13 | 3 | 7 | 7 | 7 | 7 | 7 | 4 |
| cookie1 | 2015-04-14 | 2 | 7 | 7 | 7 | 7 | 7 | 4 |
| cookie1 | 2015-04-15 | 4 | 7 | 7 | 7 | 7 | 7 | 4 |
| cookie1 | 2015-04-16 | 4 | 7 | 7 | 7 | 4 | 4 | 4 |
| cookie2 | 2015-04-10 | 6 | 6 | 6 | 7 | 6 | 6 | 7 |
| cookie2 | 2015-04-11 | 5 | 6 | 6 | 7 | 6 | 7 | 7 |
| cookie2 | 2015-04-12 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| cookie2 | 2015-04-13 | 4 | 7 | 7 | 7 | 7 | 7 | 5 |
| cookie2 | 2015-04-14 | 3 | 7 | 7 | 7 | 7 | 7 | 5 |
| cookie2 | 2015-04-15 | 5 | 7 | 7 | 7 | 7 | 7 | 5 |
| cookie2 | 2015-04-16 | 5 | 7 | 7 | 7 | 5 | 5 | 5 |
+-----------+-------------+-----+------+------+------+------+------+------+--+

SELECT cookieid,
createtime,
pv,
min(pv) OVER(PARTITION BY cookieid) AS min_pv,
max(pv) OVER(PARTITION BY cookieid) AS max_pv
FROM test_data;
+-----------+-------------+-----+---------+---------+--+
| cookieid | createtime | pv | min_pv | max_pv |
+-----------+-------------+-----+---------+---------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 7 |
| cookie1 | 2015-04-16 | 4 | 1 | 7 |
| cookie1 | 2015-04-15 | 4 | 1 | 7 |
| cookie1 | 2015-04-14 | 2 | 1 | 7 |
| cookie1 | 2015-04-13 | 3 | 1 | 7 |
| cookie1 | 2015-04-12 | 7 | 1 | 7 |
| cookie1 | 2015-04-11 | 5 | 1 | 7 |
| cookie2 | 2015-04-16 | 5 | 3 | 7 |
| cookie2 | 2015-04-15 | 5 | 3 | 7 |
| cookie2 | 2015-04-14 | 3 | 3 | 7 |
| cookie2 | 2015-04-13 | 4 | 3 | 7 |
| cookie2 | 2015-04-12 | 7 | 3 | 7 |
| cookie2 | 2015-04-11 | 5 | 3 | 7 |
| cookie2 | 2015-04-10 | 6 | 3 | 7 |
+-----------+-------------+-----+---------+---------+--+
