hive 窗口分析函数

0: jdbc:hive2://localhost:10000> select * from t_access;
+----------------+---------------------------------+-----------------------+--------------+--+
| t_access.ip | t_access.url | t_access.access_time | t_access.dt |
+----------------+---------------------------------+-----------------------+--------------+--+
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 20170804 |
| 192.168.33.3 | http://www.xxx.ccc.aa/teach | 2017-08-04 15:35:20 | 20170804 |
| 192.168.33.4 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 20170804 |
| 192.168.33.4 | http://www.xxx.ccc.aa/job | 2017-08-04 16:30:20 | 20170804 |
| 192.168.33.5 | http://www.xxx.ccc.aa/job | 2017-08-04 15:40:20 | 20170804 |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 20170805 |
| 192.168.44.3 | http://www.xxx.ccc.aa/teach | 2017-08-05 15:35:20 | 20170805 |
| 192.168.33.44 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 20170805 |
| 192.168.33.46 | http://www.xxx.ccc.aa/job | 2017-08-05 16:30:20 | 20170805 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-05 15:40:20 | 20170805 |
| 192.168.133.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:30:20 | 20170806 |
| 192.168.111.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:35:20 | 20170806 |
| 192.168.34.44 | http://www.xxx.ccc.aa/pay | 2017-08-06 15:30:20 | 20170806 |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 20170806 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 20170806 |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 20170806 |
| 192.168.33.25 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 20170806 |
| 192.168.33.36 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 20170806 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 20170806 |
+----------------+---------------------------------+-----------------------+--------------+--+ ## LAG函数
select ip,url,access_time,
row_number() over(partition by ip order by access_time) as rn,
lag(access_time,1,0) over(partition by ip order by access_time)as last_access_time
from t_access; +----------------+---------------------------------+----------------------+-----+----------------------+--+
| ip | url | access_time | rn | last_access_time |
+----------------+---------------------------------+----------------------+-----+----------------------+--+
| 192.168.111.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:35:20 | 1 | 0 |
| 192.168.133.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:30:20 | 1 | 0 |
| 192.168.33.25 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 1 | 0 |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 1 | 0 |
| 192.168.33.3 | http://www.xxx.ccc.aa/teach | 2017-08-04 15:35:20 | 2 | 2017-08-04 15:30:20 |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 3 | 2017-08-04 15:35:20 |
| 192.168.33.36 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 1 | 0 |
| 192.168.33.4 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 1 | 0 |
| 192.168.33.4 | http://www.xxx.ccc.aa/job | 2017-08-04 16:30:20 | 2 | 2017-08-04 15:30:20 |
| 192.168.33.44 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 1 | 0 |
| 192.168.33.46 | http://www.xxx.ccc.aa/job | 2017-08-05 16:30:20 | 1 | 0 |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 2 | 2017-08-05 16:30:20 |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 3 | 2017-08-06 16:30:20 |
| 192.168.33.5 | http://www.xxx.ccc.aa/job | 2017-08-04 15:40:20 | 1 | 0 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-05 15:40:20 | 1 | 0 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 2 | 2017-08-05 15:40:20 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 3 | 2017-08-06 15:40:20 |
| 192.168.34.44 | http://www.xxx.ccc.aa/pay | 2017-08-06 15:30:20 | 1 | 0 |
| 192.168.44.3 | http://www.xxx.ccc.aa/teach | 2017-08-05 15:35:20 | 1 | 0 |
+----------------+---------------------------------+----------------------+-----+----------------------+--+ ## LEAD函数
select ip,url,access_time,
row_number() over(partition by ip order by access_time) as rn,
lead(access_time,1,0) over(partition by ip order by access_time)as last_access_time
from t_access;
+----------------+---------------------------------+----------------------+-----+----------------------+--+
| ip | url | access_time | rn | last_access_time |
+----------------+---------------------------------+----------------------+-----+----------------------+--+
| 192.168.111.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:35:20 | 1 | 0 |
| 192.168.133.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:30:20 | 1 | 0 |
| 192.168.33.25 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 1 | 0 |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 1 | 2017-08-04 15:35:20 |
| 192.168.33.3 | http://www.xxx.ccc.aa/teach | 2017-08-04 15:35:20 | 2 | 2017-08-05 15:30:20 |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 3 | 0 |
| 192.168.33.36 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 1 | 0 |
| 192.168.33.4 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 1 | 2017-08-04 16:30:20 |
| 192.168.33.4 | http://www.xxx.ccc.aa/job | 2017-08-04 16:30:20 | 2 | 0 |
| 192.168.33.44 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 1 | 0 |
| 192.168.33.46 | http://www.xxx.ccc.aa/job | 2017-08-05 16:30:20 | 1 | 2017-08-06 16:30:20 |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 2 | 2017-08-06 16:30:20 |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 3 | 0 |
| 192.168.33.5 | http://www.xxx.ccc.aa/job | 2017-08-04 15:40:20 | 1 | 0 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-05 15:40:20 | 1 | 2017-08-06 15:40:20 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 2 | 2017-08-06 15:40:20 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 3 | 0 |
| 192.168.34.44 | http://www.xxx.ccc.aa/pay | 2017-08-06 15:30:20 | 1 | 0 |
| 192.168.44.3 | http://www.xxx.ccc.aa/teach | 2017-08-05 15:35:20 | 1 | 0 |
+----------------+---------------------------------+----------------------+-----+----------------------+--+ ## FIRST_VALUE 函数
例:取每个用户访问的第一个页面
select ip,url,access_time,
row_number() over(partition by ip order by access_time) as rn,
first_value(url) over(partition by ip order by access_time rows between unbounded preceding and unbounded following)as last_access_time
from t_access;
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+
| ip | url | access_time | rn | last_access_time |
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+
| 192.168.111.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:35:20 | 1 | http://www.xxx.ccc.aa/register |
| 192.168.133.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:30:20 | 1 | http://www.xxx.ccc.aa/register |
| 192.168.33.25 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 1 | http://www.xxx.ccc.aa/job |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 1 | http://www.xxx.ccc.aa/stu |
| 192.168.33.3 | http://www.xxx.ccc.aa/teach | 2017-08-04 15:35:20 | 2 | http://www.xxx.ccc.aa/stu |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 3 | http://www.xxx.ccc.aa/stu |
| 192.168.33.36 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 1 | http://www.xxx.ccc.aa/excersize |
| 192.168.33.4 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 1 | http://www.xxx.ccc.aa/stu |
| 192.168.33.4 | http://www.xxx.ccc.aa/job | 2017-08-04 16:30:20 | 2 | http://www.xxx.ccc.aa/stu |
| 192.168.33.44 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 1 | http://www.xxx.ccc.aa/stu |
| 192.168.33.46 | http://www.xxx.ccc.aa/job | 2017-08-05 16:30:20 | 1 | http://www.xxx.ccc.aa/job |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 2 | http://www.xxx.ccc.aa/job |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 3 | http://www.xxx.ccc.aa/job |
| 192.168.33.5 | http://www.xxx.ccc.aa/job | 2017-08-04 15:40:20 | 1 | http://www.xxx.ccc.aa/job |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-05 15:40:20 | 1 | http://www.xxx.ccc.aa/job |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 2 | http://www.xxx.ccc.aa/job |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 3 | http://www.xxx.ccc.aa/job |
| 192.168.34.44 | http://www.xxx.ccc.aa/pay | 2017-08-06 15:30:20 | 1 | http://www.xxx.ccc.aa/pay |
| 192.168.44.3 | http://www.xxx.ccc.aa/teach | 2017-08-05 15:35:20 | 1 | http://www.xxx.ccc.aa/teach |
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+ ## LAST_VALUE 函数
例:取每个用户访问的最后一个页面
select ip,url,access_time,
row_number() over(partition by ip order by access_time) as rn,
last_value(url) over(partition by ip order by access_time rows between unbounded preceding and unbounded following)as last_access_time
from t_access;
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+
| ip | url | access_time | rn | last_access_time |
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+
| 192.168.111.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:35:20 | 1 | http://www.xxx.ccc.aa/register |
| 192.168.133.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:30:20 | 1 | http://www.xxx.ccc.aa/register |
| 192.168.33.25 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 1 | http://www.xxx.ccc.aa/job |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 1 | http://www.xxx.ccc.aa/stu |
| 192.168.33.3 | http://www.xxx.ccc.aa/teach | 2017-08-04 15:35:20 | 2 | http://www.xxx.ccc.aa/stu |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 3 | http://www.xxx.ccc.aa/stu |
| 192.168.33.36 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 1 | http://www.xxx.ccc.aa/excersize |
| 192.168.33.4 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 1 | http://www.xxx.ccc.aa/stu |
| 192.168.33.4 | http://www.xxx.ccc.aa/job | 2017-08-04 16:30:20 | 2 | http://www.xxx.ccc.aa/stu |
| 192.168.33.44 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 1 | http://www.xxx.ccc.aa/stu |
| 192.168.33.46 | http://www.xxx.ccc.aa/job | 2017-08-05 16:30:20 | 1 | http://www.xxx.ccc.aa/job |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 2 | http://www.xxx.ccc.aa/job |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 3 | http://www.xxx.ccc.aa/job |
| 192.168.33.5 | http://www.xxx.ccc.aa/job | 2017-08-04 15:40:20 | 1 | http://www.xxx.ccc.aa/job |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-05 15:40:20 | 1 | http://www.xxx.ccc.aa/job |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 2 | http://www.xxx.ccc.aa/job |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 3 | http://www.xxx.ccc.aa/job |
| 192.168.34.44 | http://www.xxx.ccc.aa/pay | 2017-08-06 15:30:20 | 1 | http://www.xxx.ccc.aa/pay |
| 192.168.44.3 | http://www.xxx.ccc.aa/teach | 2017-08-05 15:35:20 | 1 | http://www.xxx.ccc.aa/teach |
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+ /*
累计报表--分析函数实现版
*/
-- sum() over() 函数
select id
,month
,sum(amount) over(partition by id order by month rows between unbounded preceding and current row)
from
(select id,month,
sum(fee) as amount
from t_test
group by id,month) tmp;

Hive—简单窗口分析函数的更多相关文章

  1. hive中窗口分析函数

    分组统计 1. groups sets(field1,field2,field3, (field1,field2)) 样例如下: select dt,tenantCode,nvl(platform,' ...

  2. pyqt5之简单窗口的创建

    在学完tkinter后,发现tkinter在布局方面特别的不方便(Tkinter资料:http://effbot.org/tkinterbook/tkinter-index.htm),因此学习pyqt ...

  3. 雷林鹏分享:jQuery EasyUI 窗口 - 创建简单窗口

    jQuery EasyUI 窗口 - 创建简单窗口 创建一个窗口(window)非常简单,我们创建一个 DIV 标记: Some Content. 现在运行测试页面,您会看见一个窗口(window)显 ...

  4. OpenGL学习 (一) - 简单窗口绘制

    一.OpenGL 简介 OpenGL 本质: OpenGL(Open Graphics Library),通常可以认为是API,其包含了一系列可以操作图形.图像的函数.但深究下来,它是由Khronos ...

  5. Hive 窗口分析函数

    1.窗口函数 1.LAG(col,n,DEFAULT) 用于统计窗口内往上第n行值 第一个参数为列名,第二个参数为往上第n行(可选,默认为1),第三个参数为默认值(当往上第n行为NULL时候,取默认值 ...

  6. hive row_number等窗口分析函数

    一.排序&去重分析 row_number() over(partititon by col1 order by col2) as rn 结果:1,2,3,4 rank() over(parti ...

  7. Hive 窗口函数、分析函数

    1 分析函数:用于等级.百分点.n分片等 Ntile 是Hive很强大的一个分析函数. 可以看成是:它把有序的数据集合 平均分配 到 指定的数量(num)个桶中, 将桶号分配给每一行.如果不能平均分配 ...

  8. Windows程序设计笔记(二) 关于编写简单窗口程序中的几点疑惑

    在编写窗口程序时主要是5个步骤,创建窗口类.注册窗口类.创建窗口.显示窗口.消息环的编写.对于这5个步骤为何要这样写,当初我不是太理解,学习到现在有些问题我基本上已经找到了答案,同时对于Windows ...

  9. hive:排序分析函数

    基本排序函数 语法: rank()over([partition by col1] order by col2) dense_rank()over([partition by col1] order ...

随机推荐

  1. 【Hibernate学习笔记-5.2】使用@Temporal修饰日期类型的属性

    作者:ssslinppp       1. 摘要 关于日期类型,Java和数据库表示的方法不同: Java:只有java.util.Date和java.util.Calender两种: 数据库:dat ...

  2. 【XMLHttpRequest】获取XMLHttpRequest

    // 获取http请求 function getXMLHttpRequest() { req = false; //本地XMLHttpRequest对象 if (window.XMLHttpReque ...

  3. MySQL主从同步和半同步配置

    mysql主从配置: 1,安装maraidb,使用国内yum镜像站下载:[root@localhost mysql]# cat /etc/yum.repos.d/MairaDB.repo # Mari ...

  4. 浏览器缩放导致的样式bug

    缩放75% 这种问题修改的话 要兼顾多种浏览器,并且有些地方样式是要求写死的,修改成本会比较大,所以一般是不会去处理的

  5. java中使HttpDelete可以发送body信息

    java中使HttpDelete可以发送body信息RESTful api中用到了DELETE方法,android开发的同事遇到了问题,使用HttpDelete执行DELETE操作的时候,不能携带bo ...

  6. nsenter工具进入docker容器

    对于运行在后台的Docker容器,我们经常需要做的事情是进入到容器中,docker为我们提供了docker exec .docker attach 命令,并且还提供了nsenter工具,外部工具供我们 ...

  7. 当vcenter是linux版本的时候Sysprep存放路径

    为 VMware vCenter Server Appliance 安装 Microsoft Sysprep 工具在从 Microsoft 网站下载并安装 Microsoft Sysprep 工具之后 ...

  8. python + docker, 实现天气数据 从FTP获取以及持久化(五)-- 利用 Docker 容器化 Python 程序

    背景 不知不觉中,我们已经完成了所有的编程工作.接下来,我们需要把 Python 程序 做 容器化 (Docker)部署. 思考 考虑到项目的实际情况,“持久化天气”的功能将会是一个独立的功能模块发布 ...

  9. 新手搭建 nginx + php (LNMP)

    配置源 纯净的Centos 6.5系统 配置163yum源 (这个比较简单,百度能解决很多问题) 开始 安装  开发软件包:yum  -y groupinstall  "Developmen ...

  10. 【洛谷】P1341 无序字母对(欧拉回路)

    题目 传送门:QWQ 分析 快把欧拉回路忘光了. 欧拉回路大概就是一笔画的问题,可不可以一笔画完全图. 全图有欧拉回路当且仅当全图的奇数度数的点有0或2个. 2个时是一个点是起点,另一个是终点. 本题 ...