[Hive - LanguageManual ] Windowing and Analytics Functions (待)
- Added by Lefty Leverenz, last edited by Lefty Leverenz on Aug 01, 2014 (view change)
- show comment
Windowing and Analytics Functions
- Windowing and Analytics Functions
- Enhancements to Hive QL
- Examples
- PARTITION BY with one partitioning column, no ORDER BY or window specification
- PARTITION BY with two partitioning columns, no ORDER BY or window specification
- PARTITION BY with one partitioning column, one ORDER BY column, and no window specification
- PARTITION BY with two partitioning columns, two ORDER BY columns, and no window specification
- PARTITION BY with partitioning, ORDER BY, and window specification
- WINDOW clause
- LEAD using default 1 row lead and not specifying default value
- LAG specifying a lag of 3 rows and default value of 0
Enhancements to Hive QL
Version
Icon
Introduced in Hive version 0.11.
This section introduces the Hive QL enhancements for windowing and analytics functions. See "Windowing Specifications in HQL" (attached to HIVE-4197) for details. HIVE-896 has more information, including links to earlier documentation in the initial comments.
All of the windowing and analytics functions operate as per the SQL standard.
The current release supports the following functions for windowing and analytics:
- Windowing functions
- LEAD
- The number of rows to lead can optionally be specified. If the number of rows to lead is not specified, the lead is one row.
- Returns null when the lead for the current row extends beyond the end of the window.
- LAG
- The number of rows to lag can optionally be specified. If the number of rows to lag is not specified, the lag is one row.
- Returns null when the lag for the current row extends before the beginning of the window.
- FIRST_VALUE
- LAST_VALUE
- LEAD
- The OVER clause
- OVER with standard aggregates:
- COUNT
- SUM
- MIN
- MAX
- AVG
- OVER with a PARTITION BY statement with one or more partitioning columns of any primitive datatype.
- OVER with PARTITION BY and ORDER BY with one or more partitioning and/or ordering columns of any datatype.
OVER with a window specification. Windows can be defined separately in a WINDOW clause. Window specifications support these standard options:
ROWS ((CURRENT ROW) | (UNBOUNDED | [num]) PRECEDING) AND (UNBOUNDED | [num]) FOLLOWING
IconThe OVER clause supports the following functions, but it does not support a window with them (see HIVE-4797):
Ranking functions: Rank, NTile, DenseRank, CumeDist, PercentRank.
Lead and Lag functions.
- OVER with standard aggregates:
- Analytics functions
- RANK
- ROW_NUMBER
- DENSE_RANK
- CUME_DIST
- PERCENT_RANK
- NTILE
Examples
This section provides examples of how to use the Hive QL windowing and analytics functions in SELECT statements. See HIVE-896 for additional examples.
PARTITION BY with one partitioning column, no ORDER BY or window specification
SELECT a, COUNT(b) OVER (PARTITION BY c)FROM T; |
PARTITION BY with two partitioning columns, no ORDER BY or window specification
SELECT a, COUNT(b) OVER (PARTITION BY c, d)FROM T; |
PARTITION BY with one partitioning column, one ORDER BY column, and no window specification
SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d)FROM T; |
PARTITION BY with two partitioning columns, two ORDER BY columns, and no window specification
SELECT a, SUM(b) OVER (PARTITION BY c, d ORDER BY e, f)FROM T; |
PARTITION BY with partitioning, ORDER BY, and window specification
SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)FROM T; |
SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)FROM T; |
SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)FROM T; |
SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)FROM T; |
There can be multiple OVER clauses in a single query. A single OVER clause only applies to the immediately preceding function call. In this example, the first OVER clause applies to COUNT(b) and the second OVER clause applies to SUM(b):
SELECT a, COUNT(b) OVER (PARTITION BY c), SUM(b) OVER (PARTITION BY c)FROM T; |
Aliases can be used as well, with or without the keyword AS:
SELECT a, COUNT(b) OVER (PARTITION BY c) AS b_count, SUM(b) OVER (PARTITION BY c) b_sumFROM T; |
WINDOW clause
SELECT a, SUM(b) OVER wFROM T;WINDOW w AS (PARTITION BY c ORDER BY d ROWS UNBOUNDED PRECEDING) |
LEAD using default 1 row lead and not specifying default value
SELECT a, LEAD(a) OVER (PARTITION BY b ORDER BY C ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)FROM T; |
LAG specifying a lag of 3 rows and default value of 0
SELECT a, LAG(a, 3, 0) OVER (PARTITION BY b ORDER BY C ROWS 3 PRECEDING)FROM T; |
[Hive - LanguageManual ] Windowing and Analytics Functions (待)的更多相关文章
- [Hive - LanguageManual] Create/Drop/Alter -View、 Index 、 Function
Create/Drop/Alter View Create View Drop View Alter View Properties Alter View As Select Version info ...
- [HIve - LanguageManual] Hive Operators and User-Defined Functions (UDFs)
Hive Operators and User-Defined Functions (UDFs) Hive Operators and User-Defined Functions (UDFs) Bu ...
- [Hive - LanguageManual] Select base use
Select Syntax WHERE Clause ALL and DISTINCT Clauses Partition Based Queries HAVING Clause LIMIT Clau ...
- [Hive - LanguageManual ] ]SQL Standard Based Hive Authorization
Status of Hive Authorization before Hive 0.13 SQL Standards Based Hive Authorization (New in Hive 0. ...
- [HIve - LanguageManual] LateralView
Lateral View Syntax Description Example Multiple Lateral Views Outer Lateral Views Lateral View Synt ...
- [HIve - LanguageManual] XPathUDF
Documentation for Built-In User-Defined Functions Related To XPath UDFs xpath, xpath_short, xpath_in ...
- [Hive - LanguageManual] GroupBy
Group By Syntax Simple Examples Select statement and group by clause Advanced Features Multi-Group-B ...
- [Hive - LanguageManual] Import/Export
LanguageManual ImportExport Skip to end of metadata Added by Carl Steinbach, last edited by Le ...
- [Hive - LanguageManual] DML: Load, Insert, Update, Delete
LanguageManual DML Hive Data Manipulation Language Hive Data Manipulation Language Loading files int ...
随机推荐
- Eclipse groovy in action
Eclipse :Version: Juno Service Release 2GrEclipse plugins:http://dist.springsource.org/release/GRECL ...
- linux驱动学习之tasklet分析
tasklet是中断处理下半部分最常用的一种方法,驱动程序一般先申请中断,在中断处理函数内完成中断上半部分的工作后调用tasklet.tasklet有如下特点: 1.tasklet只可以在一个CPU上 ...
- .NET 框架 (转载)
转载:http://www.tracefact.net/CLR-and-Framework/DotNet-Framework.aspx .NET框架 三年前写的<.NET之美>的第六章,现 ...
- hibernate--lazy(懒加载)属性
关联映射文件中<class>标签中的lazy(懒加载)属性 Lazy(懒加载):只有在正真使用该对象时,才会创建这个对象 Hibernate中的lazy(懒加载):只有我们在正真使用时,它 ...
- Android开发之三种动画
转载:http://www.cnblogs.com/angeldevil/archive/2011/12/02/2271096.html http://www.lightskystreet.com/2 ...
- Windows下搭建Mysql集群
Mysql集群的基本架构如下: 基本原理参考:[转]MySQL Cluster (集群)基本原理 这里采用最小配置,用两台机器来分别部署一个management 节点,2个data node, 2个s ...
- gulp 使用mailgun服务器发送邮件
1.首先你需要创建一个 mailgun 账户,没有请去注册一个. 注册之后会有 mailgun 会给你一个默认的子域名,你就可以使用这个子域名去发送邮件了,如下图: 2.gulp创建任务: var s ...
- 【转载】React入门-Todolist制作学习
我直接看的这个React TodoList的例子(非常好!): http://www.reqianduan.com/2297.html 文中示例的代码访问路径:http://127.0.0.1:708 ...
- HDU 2594 (简单KMP) Simpsons’ Hidden Talents
题意: 有两个字符串,找一个最长子串是的该串既是第一个字的前缀,又是第二个串的后缀. 分析: 把两个串并起来然后在中间加一个无关字符,求next数组即可. #include <cstdio> ...
- [源代码] - C#代码搜索器 - 续
在前文 [源代码] - C#代码搜索器 中我开发了一个代码搜索器. 我对其做的最后改动是将索引保存到磁盘中, 以备今后使用. 如今, 我在工作中又接到一项新任务: 有一个大项目, 其中10个负责数据访 ...