Skip to end of metadata

 

Go to start of metadata

 

Windowing and Analytics Functions

Enhancements to Hive QL

Version

Icon

Introduced in Hive version 0.11.

This section introduces the Hive QL enhancements for windowing and analytics functions. See "Windowing Specifications in HQL" (attached to HIVE-4197) for details. HIVE-896 has more information, including links to earlier documentation in the initial comments.

All of the windowing and analytics functions operate as per the SQL standard.

The current release supports the following functions for windowing and analytics:

  1. Windowing functions

    • LEAD

      • The number of rows to lead can optionally be specified. If the number of rows to lead is not specified, the lead is one row.
      • Returns null when the lead for the current row extends beyond the end of the window.
    • LAG
      • The number of rows to lag can optionally be specified. If the number of rows to lag is not specified, the lag is one row.
      • Returns null when the lag for the current row extends before the beginning of the window.
    • FIRST_VALUE
    • LAST_VALUE
  2. The OVER clause
    • OVER with standard aggregates:

      • COUNT
      • SUM
      • MIN
      • MAX
      • AVG
    • OVER with a PARTITION BY statement with one or more partitioning columns of any primitive datatype.
    • OVER with PARTITION BY and ORDER BY with one or more partitioning and/or ordering columns of any datatype.
      • OVER with a window specification. Windows can be defined separately in a WINDOW clause. Window specifications support these standard options:

        ROWS ((CURRENT ROW) | (UNBOUNDED | [num]) PRECEDING) AND (UNBOUNDED | [num]) FOLLOWING
        
        Icon

        The OVER clause supports the following functions, but it does not support a window with them (see HIVE-4797):

        Ranking functions: Rank, NTile, DenseRank, CumeDist, PercentRank.

        Lead and Lag functions.

  3. Analytics functions
    • RANK
    • ROW_NUMBER
    • DENSE_RANK
    • CUME_DIST
    • PERCENT_RANK
    • NTILE

Examples

This section provides examples of how to use the Hive QL windowing and analytics functions in SELECT statements. See HIVE-896 for additional examples.

PARTITION BY with one partitioning column, no ORDER BY or window specification

SELECT a, COUNT(b) OVER (PARTITION BY c)
FROM T;

PARTITION BY with two partitioning columns, no ORDER BY or window specification

SELECT a, COUNT(b) OVER (PARTITION BY c, d)
FROM T;

PARTITION BY with one partitioning column, one ORDER BY column, and no window specification

SELECT a, SUM(b) OVER (PARTITION BY ORDER BY d)
FROM T;

PARTITION BY with two partitioning columns, two ORDER BY columns, and no window specification

SELECT a, SUM(b) OVER (PARTITION BY c, d ORDER BY e, f)
FROM T;

PARTITION BY with partitioning, ORDER BY, and window specification

SELECT a, SUM(b) OVER (PARTITION BY ORDER BY ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM T;
SELECT a, AVG(b) OVER (PARTITION BY ORDER BY ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
FROM T;
SELECT a, AVG(b) OVER (PARTITION BY ORDER BY ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)
FROM T;
SELECT a, AVG(b) OVER (PARTITION BY ORDER BY ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM T;

There can be multiple OVER clauses in a single query. A single OVER clause only applies to the immediately preceding function call. In this example, the first OVER clause applies to COUNT(b) and the second OVER clause applies to SUM(b):

SELECT 
 a,
 COUNT(b) OVER (PARTITION BY c),
 SUM(b) OVER (PARTITION BY c)
FROM T;

Aliases can be used as well, with or without the keyword AS:

SELECT 
 a,
 COUNT(b) OVER (PARTITION BY c) AS b_count,
 SUM(b) OVER (PARTITION BY c) b_sum
FROM T;

WINDOW clause

SELECT a, SUM(b) OVER w
FROM T;
WINDOW w AS (PARTITION BY ORDER BY ROWS UNBOUNDED PRECEDING)

LEAD using default 1 row lead and not specifying default value

SELECT a, LEAD(a) OVER (PARTITION BY ORDER BY ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)
FROM T;

LAG specifying a lag of 3 rows and default value of 0

SELECT a, LAG(a, 3, 0) OVER (PARTITION BY ORDER BY ROWS 3 PRECEDING)
FROM T;
 

[Hive - LanguageManual ] Windowing and Analytics Functions (待)的更多相关文章

  1. [Hive - LanguageManual] Create/Drop/Alter -View、 Index 、 Function

    Create/Drop/Alter View Create View Drop View Alter View Properties Alter View As Select Version info ...

  2. [HIve - LanguageManual] Hive Operators and User-Defined Functions (UDFs)

    Hive Operators and User-Defined Functions (UDFs) Hive Operators and User-Defined Functions (UDFs) Bu ...

  3. [Hive - LanguageManual] Select base use

    Select Syntax WHERE Clause ALL and DISTINCT Clauses Partition Based Queries HAVING Clause LIMIT Clau ...

  4. [Hive - LanguageManual ] ]SQL Standard Based Hive Authorization

    Status of Hive Authorization before Hive 0.13 SQL Standards Based Hive Authorization (New in Hive 0. ...

  5. [HIve - LanguageManual] LateralView

    Lateral View Syntax Description Example Multiple Lateral Views Outer Lateral Views Lateral View Synt ...

  6. [HIve - LanguageManual] XPathUDF

    Documentation for Built-In User-Defined Functions Related To XPath UDFs xpath, xpath_short, xpath_in ...

  7. [Hive - LanguageManual] GroupBy

    Group By Syntax Simple Examples Select statement and group by clause Advanced Features Multi-Group-B ...

  8. [Hive - LanguageManual] Import/Export

    LanguageManual ImportExport     Skip to end of metadata   Added by Carl Steinbach, last edited by Le ...

  9. [Hive - LanguageManual] DML: Load, Insert, Update, Delete

    LanguageManual DML Hive Data Manipulation Language Hive Data Manipulation Language Loading files int ...

随机推荐

  1. System.Windows.Forms.AxHost.InvalidActiveXStateException”类型的异常在 ESRI.ArcGIS.AxControls.dll 中发生,但未在用户代码中进行处理

    private void CopyAndOverwriteMap() { //IObjectCopy接口变量申明 IObjectCopy objectCopy = new ObjectCopyClas ...

  2. Android PullToRefreshExpandableListView的点击事件

    这几天做项目的时候用到了一个开源的下拉刷新的框架(需要的朋友可以加我Q:359222347).其中我使用PullToRefreshExpandableListView的时候发现这个东西没有setOnC ...

  3. [Mongo] error inserting documents: BSONObj size is invalid (mongoimport mongorestore 数据备份恢复)

    解决办法如下, ./mongoimport -port 6066 -d xxx -c xxx --batchSize=10 /root/mong_data/test/xxx 原因转自 http://b ...

  4. Android开发之assets文件夹中资源的获取

    assets中的文件都是保持原始的文件格式,需要使用AssetManager以字节流的形式读取出来 步骤: 1. 先在Activity里面调用getAssets() 来获取AssetManager引用 ...

  5. [Codeforces673C]Bear and Colors(枚举,暴力)

    题目链接:http://codeforces.com/contest/673/problem/C 题意:给一串数,不同大小的区间内出现次数最多的那个数在计数的时候会+1,问所有区间都这样计一次数,所有 ...

  6. CSS+DIV问题!DIV的最小高度问题!

    DIV层的最小高度问题!就是一个DIV有个最小高度,但是如果DIV层中的内容很多,DIV的高度会根据内容而进行拉长!要求IE6.IE7还有firefox都要兼容!我试了很多网上的方法都不好用!请测试后 ...

  7. trackr: An AngularJS app with a Java 8 backend – Part IV 实践篇

    REST API对于前后端或后端与后端之间通讯是一个好的接口,而单页应用Single Page Applications (SPA)非常流行. 我们依然以trackr为案例,这是一个跟踪工作时间 请假 ...

  8. Careerdesign@foxmail.com

    Careerdesign@foxmail.com 相关文章 32岁了,我还有没有机会转行做程序员吗? 如何有效渡过充满迷茫的大学生活 毕业了,我是先择业,还是先就业? 程序员创业,不要把风险带给家人! ...

  9. Oracle学习之集合运算

    一.集合运算操作符  UNION:(并集)返回两个集合去掉重复值的所有的记录  UNION ALL:(并集)返回两个集合去掉重复值的所有的记录 INTERSECT:(交集)返回两个集合的所有记录,重复 ...

  10. hdu 4861 Couple doubi (找规律 )

    题目链接 可以瞎搞一下,找找规律 题意:两个人进行游戏,桌上有k个球,第i个球的值为1i+2i+⋯+(p−1)i%p,两个人轮流取,如果DouBiNan的值大的话就输出YES,否则输出NO. 分析:解 ...