Hive Window Analytic Functions

0: jdbc:hive2://localhost:10000> select * from t_access;
+----------------+---------------------------------+-----------------------+--------------+--+
| t_access.ip | t_access.url | t_access.access_time | t_access.dt |
+----------------+---------------------------------+-----------------------+--------------+--+
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 20170804 |
| 192.168.33.3 | http://www.xxx.ccc.aa/teach | 2017-08-04 15:35:20 | 20170804 |
| 192.168.33.4 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 20170804 |
| 192.168.33.4 | http://www.xxx.ccc.aa/job | 2017-08-04 16:30:20 | 20170804 |
| 192.168.33.5 | http://www.xxx.ccc.aa/job | 2017-08-04 15:40:20 | 20170804 |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 20170805 |
| 192.168.44.3 | http://www.xxx.ccc.aa/teach | 2017-08-05 15:35:20 | 20170805 |
| 192.168.33.44 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 20170805 |
| 192.168.33.46 | http://www.xxx.ccc.aa/job | 2017-08-05 16:30:20 | 20170805 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-05 15:40:20 | 20170805 |
| 192.168.133.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:30:20 | 20170806 |
| 192.168.111.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:35:20 | 20170806 |
| 192.168.34.44 | http://www.xxx.ccc.aa/pay | 2017-08-06 15:30:20 | 20170806 |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 20170806 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 20170806 |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 20170806 |
| 192.168.33.25 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 20170806 |
| 192.168.33.36 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 20170806 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 20170806 |
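+----------------+---------------------------------+-----------------------+--------------+--+

## LAG function

LAG(col, n, default) returns the value of col from the row n positions before the current row in the same partition, following the ORDER BY of the window. n is optional and defaults to 1; when no such row exists (for example on the first row of a partition), default is returned, or NULL if it is omitted. The query below uses it to fetch each IP's previous access time, with 0 as the default.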
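To make the semantics concrete, here is a minimal Python sketch (an in-memory emulation, not Hive code) of what the LAG query below computes: rows are sorted by ip and access_time, split into one partition per ip, and each row picks up the access_time n rows earlier or falls back to a default. The rows are copied from t_access; the function name lag and the string default "0" (mirroring the 0 in the Hive output) are illustrative assumptions.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows copied from t_access: (ip, url, access_time).
# access_time strings in 'YYYY-MM-DD HH:MM:SS' form sort chronologically as text.
ROWS = [
    ("192.168.33.3", "http://www.xxx.ccc.aa/stu", "2017-08-04 15:30:20"),
    ("192.168.33.4", "http://www.xxx.ccc.aa/job", "2017-08-04 16:30:20"),
    ("192.168.33.3", "http://www.xxx.ccc.aa/teach", "2017-08-04 15:35:20"),
    ("192.168.33.4", "http://www.xxx.ccc.aa/stu", "2017-08-04 15:30:20"),
    ("192.168.33.3", "http://www.xxx.ccc.aa/stu", "2017-08-05 15:30:20"),
]


def lag(rows, n=1, default=None):
    """Emulate ROW_NUMBER() and LAG(access_time, n, default)
    OVER (PARTITION BY ip ORDER BY access_time) for n >= 1.

    Returns (ip, url, access_time, rn, previous_access_time) tuples.
    """
    out = []
    # Sorting by (ip, access_time) makes each ip partition one contiguous, ordered run.
    ordered = sorted(rows, key=itemgetter(0, 2))
    for _ip, group in groupby(ordered, key=itemgetter(0)):
        part = list(group)
        for i, (ip, url, ts) in enumerate(part):
            # Look n rows back inside the partition; before its start, use the default.
            prev = part[i - n][2] if i - n >= 0 else default
            out.append((ip, url, ts, i + 1, prev))
    return out


for row in lag(ROWS, 1, "0"):
    print(row)
```

For these two IPs the sketch yields the same rn and last_access_time values as the Hive output below.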
select ip,url,access_time,
row_number() over(partition by ip order by access_time) as rn,
lag(access_time,1,0) over(partition by ip order by access_time) as last_access_time
from t_access;
+----------------+---------------------------------+----------------------+-----+----------------------+--+
| ip | url | access_time | rn | last_access_time |
+----------------+---------------------------------+----------------------+-----+----------------------+--+
| 192.168.111.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:35:20 | 1 | 0 |
| 192.168.133.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:30:20 | 1 | 0 |
| 192.168.33.25 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 1 | 0 |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 1 | 0 |
| 192.168.33.3 | http://www.xxx.ccc.aa/teach | 2017-08-04 15:35:20 | 2 | 2017-08-04 15:30:20 |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 3 | 2017-08-04 15:35:20 |
| 192.168.33.36 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 1 | 0 |
| 192.168.33.4 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 1 | 0 |
| 192.168.33.4 | http://www.xxx.ccc.aa/job | 2017-08-04 16:30:20 | 2 | 2017-08-04 15:30:20 |
| 192.168.33.44 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 1 | 0 |
| 192.168.33.46 | http://www.xxx.ccc.aa/job | 2017-08-05 16:30:20 | 1 | 0 |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 2 | 2017-08-05 16:30:20 |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 3 | 2017-08-06 16:30:20 |
| 192.168.33.5 | http://www.xxx.ccc.aa/job | 2017-08-04 15:40:20 | 1 | 0 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-05 15:40:20 | 1 | 0 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 2 | 2017-08-05 15:40:20 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 3 | 2017-08-06 15:40:20 |
| 192.168.34.44 | http://www.xxx.ccc.aa/pay | 2017-08-06 15:30:20 | 1 | 0 |
| 192.168.44.3 | http://www.xxx.ccc.aa/teach | 2017-08-05 15:35:20 | 1 | 0 |
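+----------------+---------------------------------+----------------------+-----+----------------------+--+

## LEAD function

LEAD(col, n, default) works the other way round from LAG: it returns the value of col from the row n positions after the current row in the same partition, or default (NULL if omitted) when there is no such row. The query below fetches each IP's next access time, with 0 as the default for the last visit.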
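As a minimal Python sketch (an in-memory emulation, not Hive code) of the LEAD query below: rows are sorted by ip and access_time, grouped per ip, and each row takes the access_time n rows ahead, falling back to a default past the end of its partition. The rows are copied from t_access; the function name lead and the string default "0" are illustrative assumptions.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows copied from t_access: (ip, url, access_time).
# access_time strings in 'YYYY-MM-DD HH:MM:SS' form sort chronologically as text.
ROWS = [
    ("192.168.33.5", "http://www.xxx.ccc.aa/job", "2017-08-04 15:40:20"),
    ("192.168.33.46", "http://www.xxx.ccc.aa/excersize", "2017-08-06 16:30:20"),
    ("192.168.33.46", "http://www.xxx.ccc.aa/job", "2017-08-05 16:30:20"),
    ("192.168.33.46", "http://www.xxx.ccc.aa/excersize", "2017-08-06 16:30:20"),
]


def lead(rows, n=1, default=None):
    """Emulate ROW_NUMBER() and LEAD(access_time, n, default)
    OVER (PARTITION BY ip ORDER BY access_time) for n >= 1.

    Returns (ip, url, access_time, rn, next_access_time) tuples.
    """
    out = []
    # Sorting by (ip, access_time) makes each ip partition one contiguous, ordered run.
    ordered = sorted(rows, key=itemgetter(0, 2))
    for _ip, group in groupby(ordered, key=itemgetter(0)):
        part = list(group)
        for i, (ip, url, ts) in enumerate(part):
            # Look n rows ahead inside the partition; past its end, use the default.
            nxt = part[i + n][2] if i + n < len(part) else default
            out.append((ip, url, ts, i + 1, nxt))
    return out


for row in lead(ROWS, 1, "0"):
    print(row)
```

The two identical 192.168.33.46 rows show that LEAD simply returns the next row's value, even when it equals the current one.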
select ip,url,access_time,
row_number() over(partition by ip order by access_time) as rn,
lead(access_time,1,0) over(partition by ip order by access_time) as next_access_time
from t_access;
+----------------+---------------------------------+----------------------+-----+----------------------+--+
| ip | url | access_time | rn | next_access_time |
+----------------+---------------------------------+----------------------+-----+----------------------+--+
| 192.168.111.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:35:20 | 1 | 0 |
| 192.168.133.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:30:20 | 1 | 0 |
| 192.168.33.25 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 1 | 0 |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 1 | 2017-08-04 15:35:20 |
| 192.168.33.3 | http://www.xxx.ccc.aa/teach | 2017-08-04 15:35:20 | 2 | 2017-08-05 15:30:20 |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 3 | 0 |
| 192.168.33.36 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 1 | 0 |
| 192.168.33.4 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 1 | 2017-08-04 16:30:20 |
| 192.168.33.4 | http://www.xxx.ccc.aa/job | 2017-08-04 16:30:20 | 2 | 0 |
| 192.168.33.44 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 1 | 0 |
| 192.168.33.46 | http://www.xxx.ccc.aa/job | 2017-08-05 16:30:20 | 1 | 2017-08-06 16:30:20 |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 2 | 2017-08-06 16:30:20 |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 3 | 0 |
| 192.168.33.5 | http://www.xxx.ccc.aa/job | 2017-08-04 15:40:20 | 1 | 0 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-05 15:40:20 | 1 | 2017-08-06 15:40:20 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 2 | 2017-08-06 15:40:20 |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 3 | 0 |
| 192.168.34.44 | http://www.xxx.ccc.aa/pay | 2017-08-06 15:30:20 | 1 | 0 |
| 192.168.44.3 | http://www.xxx.ccc.aa/teach | 2017-08-05 15:35:20 | 1 | 0 |
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+

## FIRST_VALUE function

FIRST_VALUE(col) returns the value of col at the first row of the window frame.
Example: get the first page each user (IP) visited.
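Before the Hive query, a minimal Python sketch (an in-memory emulation, not Hive code) of the same computation: after sorting each ip partition by access_time, the frame's first row is simply the partition's first row, and its url is attached to every row. The rows are copied from t_access and the function name first_value_url is hypothetical. Per the Hive windowing documentation, the default frame with ORDER BY (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) also starts at the partition's first row, so for FIRST_VALUE the explicit frame in the query does not change the result.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows copied from t_access: (ip, url, access_time), deliberately unordered.
ROWS = [
    ("192.168.33.46", "http://www.xxx.ccc.aa/excersize", "2017-08-06 16:30:20"),
    ("192.168.33.4", "http://www.xxx.ccc.aa/job", "2017-08-04 16:30:20"),
    ("192.168.33.46", "http://www.xxx.ccc.aa/job", "2017-08-05 16:30:20"),
    ("192.168.33.4", "http://www.xxx.ccc.aa/stu", "2017-08-04 15:30:20"),
    ("192.168.33.46", "http://www.xxx.ccc.aa/excersize", "2017-08-06 16:30:20"),
]


def first_value_url(rows):
    """Emulate ROW_NUMBER() and FIRST_VALUE(url) OVER (PARTITION BY ip
    ORDER BY access_time ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).

    Returns (ip, url, access_time, rn, first_url) tuples.
    """
    out = []
    # Sorting by (ip, access_time) makes each ip partition one contiguous, ordered run.
    ordered = sorted(rows, key=itemgetter(0, 2))
    for _ip, group in groupby(ordered, key=itemgetter(0)):
        part = list(group)
        first_url = part[0][1]  # the frame starts at the partition's first row
        for i, (ip, url, ts) in enumerate(part):
            out.append((ip, url, ts, i + 1, first_url))
    return out


for row in first_value_url(ROWS):
    print(row)
```

Keeping only the rows with rn = 1 would reduce this to one first page per IP.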
select ip,url,access_time,
row_number() over(partition by ip order by access_time) as rn,
first_value(url) over(partition by ip order by access_time rows between unbounded preceding and unbounded following) as first_url
from t_access;
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+
| ip | url | access_time | rn | first_url |
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+
| 192.168.111.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:35:20 | 1 | http://www.xxx.ccc.aa/register |
| 192.168.133.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:30:20 | 1 | http://www.xxx.ccc.aa/register |
| 192.168.33.25 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 1 | http://www.xxx.ccc.aa/job |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 1 | http://www.xxx.ccc.aa/stu |
| 192.168.33.3 | http://www.xxx.ccc.aa/teach | 2017-08-04 15:35:20 | 2 | http://www.xxx.ccc.aa/stu |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 3 | http://www.xxx.ccc.aa/stu |
| 192.168.33.36 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 1 | http://www.xxx.ccc.aa/excersize |
| 192.168.33.4 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 1 | http://www.xxx.ccc.aa/stu |
| 192.168.33.4 | http://www.xxx.ccc.aa/job | 2017-08-04 16:30:20 | 2 | http://www.xxx.ccc.aa/stu |
| 192.168.33.44 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 1 | http://www.xxx.ccc.aa/stu |
| 192.168.33.46 | http://www.xxx.ccc.aa/job | 2017-08-05 16:30:20 | 1 | http://www.xxx.ccc.aa/job |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 2 | http://www.xxx.ccc.aa/job |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 3 | http://www.xxx.ccc.aa/job |
| 192.168.33.5 | http://www.xxx.ccc.aa/job | 2017-08-04 15:40:20 | 1 | http://www.xxx.ccc.aa/job |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-05 15:40:20 | 1 | http://www.xxx.ccc.aa/job |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 2 | http://www.xxx.ccc.aa/job |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 3 | http://www.xxx.ccc.aa/job |
| 192.168.34.44 | http://www.xxx.ccc.aa/pay | 2017-08-06 15:30:20 | 1 | http://www.xxx.ccc.aa/pay |
| 192.168.44.3 | http://www.xxx.ccc.aa/teach | 2017-08-05 15:35:20 | 1 | http://www.xxx.ccc.aa/teach |
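+----------------+---------------------------------+----------------------+-----+---------------------------------+--+

## LAST_VALUE function

LAST_VALUE(col) returns the value of col at the last row of the window frame. When ORDER BY is given without a frame clause, Hive's default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the frame ends at the current row (or its last tie) and LAST_VALUE would just echo the current url. The query below therefore widens the frame to the whole partition with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.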
Example: get the last page each user (IP) visited.
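A minimal Python sketch (an in-memory emulation, not Hive code) that evaluates LAST_VALUE(url) on rows copied from t_access under both frames: the explicit whole-partition frame used in the query below, and the default frame Hive applies when ORDER BY is present (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). The function name last_value_url and its whole_partition flag are hypothetical. In the RANGE frame, rows with the same access_time are peers, so the frame ends at the last of them.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows copied from t_access: (ip, url, access_time).
ROWS = [
    ("192.168.33.4", "http://www.xxx.ccc.aa/stu", "2017-08-04 15:30:20"),
    ("192.168.33.4", "http://www.xxx.ccc.aa/job", "2017-08-04 16:30:20"),
    ("192.168.33.46", "http://www.xxx.ccc.aa/job", "2017-08-05 16:30:20"),
    ("192.168.33.46", "http://www.xxx.ccc.aa/excersize", "2017-08-06 16:30:20"),
    ("192.168.33.46", "http://www.xxx.ccc.aa/excersize", "2017-08-06 16:30:20"),
]


def last_value_url(rows, whole_partition=True):
    """Emulate LAST_VALUE(url) OVER (PARTITION BY ip ORDER BY access_time <frame>).

    whole_partition=True  -> ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    whole_partition=False -> default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    Returns (ip, url, access_time, rn, last_url) tuples.
    """
    out = []
    ordered = sorted(rows, key=itemgetter(0, 2))
    for _ip, group in groupby(ordered, key=itemgetter(0)):
        part = list(group)
        for i, (ip, url, ts) in enumerate(part):
            if whole_partition:
                end = len(part) - 1  # frame ends at the partition's last row
            else:
                # RANGE frame ends at the last peer sharing the current access_time.
                end = i
                while end + 1 < len(part) and part[end + 1][2] == ts:
                    end += 1
            out.append((ip, url, ts, i + 1, part[end][1]))
    return out


for row in last_value_url(ROWS, whole_partition=True):
    print(row)
```

With whole_partition=True both 192.168.33.4 rows get the job page; with the default frame the first row gets its own url (stu) instead.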
select ip,url,access_time,
row_number() over(partition by ip order by access_time) as rn,
last_value(url) over(partition by ip order by access_time rows between unbounded preceding and unbounded following) as last_url
from t_access;
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+
| ip | url | access_time | rn | last_url |
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+
| 192.168.111.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:35:20 | 1 | http://www.xxx.ccc.aa/register |
| 192.168.133.3 | http://www.xxx.ccc.aa/register | 2017-08-06 15:30:20 | 1 | http://www.xxx.ccc.aa/register |
| 192.168.33.25 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 1 | http://www.xxx.ccc.aa/job |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 1 | http://www.xxx.ccc.aa/stu |
| 192.168.33.3 | http://www.xxx.ccc.aa/teach | 2017-08-04 15:35:20 | 2 | http://www.xxx.ccc.aa/stu |
| 192.168.33.3 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 3 | http://www.xxx.ccc.aa/stu |
| 192.168.33.36 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 1 | http://www.xxx.ccc.aa/excersize |
| 192.168.33.4 | http://www.xxx.ccc.aa/stu | 2017-08-04 15:30:20 | 1 | http://www.xxx.ccc.aa/job |
| 192.168.33.4 | http://www.xxx.ccc.aa/job | 2017-08-04 16:30:20 | 2 | http://www.xxx.ccc.aa/job |
| 192.168.33.44 | http://www.xxx.ccc.aa/stu | 2017-08-05 15:30:20 | 1 | http://www.xxx.ccc.aa/stu |
| 192.168.33.46 | http://www.xxx.ccc.aa/job | 2017-08-05 16:30:20 | 1 | http://www.xxx.ccc.aa/excersize |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 2 | http://www.xxx.ccc.aa/excersize |
| 192.168.33.46 | http://www.xxx.ccc.aa/excersize | 2017-08-06 16:30:20 | 3 | http://www.xxx.ccc.aa/excersize |
| 192.168.33.5 | http://www.xxx.ccc.aa/job | 2017-08-04 15:40:20 | 1 | http://www.xxx.ccc.aa/job |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-05 15:40:20 | 1 | http://www.xxx.ccc.aa/job |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 2 | http://www.xxx.ccc.aa/job |
| 192.168.33.55 | http://www.xxx.ccc.aa/job | 2017-08-06 15:40:20 | 3 | http://www.xxx.ccc.aa/job |
| 192.168.34.44 | http://www.xxx.ccc.aa/pay | 2017-08-06 15:30:20 | 1 | http://www.xxx.ccc.aa/pay |
| 192.168.44.3 | http://www.xxx.ccc.aa/teach | 2017-08-05 15:35:20 | 1 | http://www.xxx.ccc.aa/teach |
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+

/*
Cumulative report: analytic-function version
*/
-- sum() over(): running total of the monthly amount per id
select id
,month
,sum(amount) over(partition by id order by month rows between unbounded preceding and current row) as cumulative_amount
from
(select id,month,
sum(fee) as amount
from t_test
group by id,month) tmp;
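The two-step query above (monthly totals in the subquery, a running total in the outer SELECT) can be sketched in Python as follows. This is an in-memory emulation, not Hive code; the t_test rows are hypothetical because the article does not show that table's contents, month values are assumed to be 'YYYY-MM' strings so that text order equals calendar order, and the function name cumulative_report is illustrative. The monthly amount is kept in the output alongside the running total for readability.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical t_test rows (id, month, fee); the article does not show this table's data.
T_TEST = [
    ("A", "2015-01", 5),
    ("A", "2015-01", 15),
    ("B", "2015-01", 5),
    ("A", "2015-02", 4),
    ("B", "2015-02", 2),
    ("A", "2015-03", 6),
]


def cumulative_report(rows):
    """Return (id, month, amount, cumulative_amount) tuples."""
    # Step 1 (inner query): SUM(fee) ... GROUP BY id, month
    monthly = defaultdict(int)
    for id_, month, fee in rows:
        monthly[(id_, month)] += fee
    # Step 2 (outer query): SUM(amount) OVER (PARTITION BY id ORDER BY month
    #                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    by_id = defaultdict(list)
    for (id_, month), amount in sorted(monthly.items()):
        by_id[id_].append((month, amount))
    out = []
    for id_, months in by_id.items():
        running = accumulate(amount for _month, amount in months)
        for (month, amount), total in zip(months, running):
            out.append((id_, month, amount, total))
    return out


for row in cumulative_report(T_TEST):
    print(row)
```

Each id's total restarts at its first month, which is what PARTITION BY id provides in the Hive query.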
