Skip to end of metadata

 

Go to start of metadata

 

Windowing and Analytics Functions

Enhancements to Hive QL

Version

Icon

Introduced in Hive version 0.11.

This section introduces the Hive QL enhancements for windowing and analytics functions. See "Windowing Specifications in HQL" (attached to HIVE-4197) for details. HIVE-896 has more information, including links to earlier documentation in the initial comments.

All of the windowing and analytics functions operate as per the SQL standard.

The current release supports the following functions for windowing and analytics:

  1. Windowing functions

    • LEAD

      • The number of rows to lead can optionally be specified. If the number of rows to lead is not specified, the lead is one row.
      • Returns null when the lead for the current row extends beyond the end of the window.
    • LAG
      • The number of rows to lag can optionally be specified. If the number of rows to lag is not specified, the lag is one row.
      • Returns null when the lag for the current row extends before the beginning of the window.
    • FIRST_VALUE
    • LAST_VALUE
  2. The OVER clause
    • OVER with standard aggregates:

      • COUNT
      • SUM
      • MIN
      • MAX
      • AVG
    • OVER with a PARTITION BY statement with one or more partitioning columns of any primitive datatype.
    • OVER with PARTITION BY and ORDER BY with one or more partitioning and/or ordering columns of any datatype.
      • OVER with a window specification. Windows can be defined separately in a WINDOW clause. Window specifications support these standard options:

        ROWS ((CURRENT ROW) | (UNBOUNDED | [num]) PRECEDING) AND (UNBOUNDED | [num]) FOLLOWING
        
        Icon

        The OVER clause supports the following functions, but it does not support a window with them (see HIVE-4797):

        Ranking functions: Rank, NTile, DenseRank, CumeDist, PercentRank.

        Lead and Lag functions.

  3. Analytics functions
    • RANK
    • ROW_NUMBER
    • DENSE_RANK
    • CUME_DIST
    • PERCENT_RANK
    • NTILE

Examples

This section provides examples of how to use the Hive QL windowing and analytics functions in SELECT statements. See HIVE-896 for additional examples.

PARTITION BY with one partitioning column, no ORDER BY or window specification

SELECT a, COUNT(b) OVER (PARTITION BY c)
FROM T;

PARTITION BY with two partitioning columns, no ORDER BY or window specification

SELECT a, COUNT(b) OVER (PARTITION BY c, d)
FROM T;

PARTITION BY with one partitioning column, one ORDER BY column, and no window specification

SELECT a, SUM(b) OVER (PARTITION BY ORDER BY d)
FROM T;

PARTITION BY with two partitioning columns, two ORDER BY columns, and no window specification

SELECT a, SUM(b) OVER (PARTITION BY c, d ORDER BY e, f)
FROM T;

PARTITION BY with partitioning, ORDER BY, and window specification

SELECT a, SUM(b) OVER (PARTITION BY ORDER BY ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM T;
SELECT a, AVG(b) OVER (PARTITION BY ORDER BY ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
FROM T;
SELECT a, AVG(b) OVER (PARTITION BY ORDER BY ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)
FROM T;
SELECT a, AVG(b) OVER (PARTITION BY ORDER BY ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM T;

There can be multiple OVER clauses in a single query. A single OVER clause only applies to the immediately preceding function call. In this example, the first OVER clause applies to COUNT(b) and the second OVER clause applies to SUM(b):

SELECT 
 a,
 COUNT(b) OVER (PARTITION BY c),
 SUM(b) OVER (PARTITION BY c)
FROM T;

Aliases can be used as well, with or without the keyword AS:

SELECT 
 a,
 COUNT(b) OVER (PARTITION BY c) AS b_count,
 SUM(b) OVER (PARTITION BY c) b_sum
FROM T;

WINDOW clause

SELECT a, SUM(b) OVER w
FROM T;
WINDOW w AS (PARTITION BY ORDER BY ROWS UNBOUNDED PRECEDING)

LEAD using default 1 row lead and not specifying default value

SELECT a, LEAD(a) OVER (PARTITION BY ORDER BY ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)
FROM T;

LAG specifying a lag of 3 rows and default value of 0

SELECT a, LAG(a, 3, 0) OVER (PARTITION BY ORDER BY ROWS 3 PRECEDING)
FROM T;
 

[Hive - LanguageManual ] Windowing and Analytics Functions (待)的更多相关文章

  1. [Hive - LanguageManual] Create/Drop/Alter -View、 Index 、 Function

    Create/Drop/Alter View Create View Drop View Alter View Properties Alter View As Select Version info ...

  2. [HIve - LanguageManual] Hive Operators and User-Defined Functions (UDFs)

    Hive Operators and User-Defined Functions (UDFs) Hive Operators and User-Defined Functions (UDFs) Bu ...

  3. [Hive - LanguageManual] Select base use

    Select Syntax WHERE Clause ALL and DISTINCT Clauses Partition Based Queries HAVING Clause LIMIT Clau ...

  4. [Hive - LanguageManual ] ]SQL Standard Based Hive Authorization

    Status of Hive Authorization before Hive 0.13 SQL Standards Based Hive Authorization (New in Hive 0. ...

  5. [HIve - LanguageManual] LateralView

    Lateral View Syntax Description Example Multiple Lateral Views Outer Lateral Views Lateral View Synt ...

  6. [HIve - LanguageManual] XPathUDF

    Documentation for Built-In User-Defined Functions Related To XPath UDFs xpath, xpath_short, xpath_in ...

  7. [Hive - LanguageManual] GroupBy

    Group By Syntax Simple Examples Select statement and group by clause Advanced Features Multi-Group-B ...

  8. [Hive - LanguageManual] Import/Export

    LanguageManual ImportExport     Skip to end of metadata   Added by Carl Steinbach, last edited by Le ...

  9. [Hive - LanguageManual] DML: Load, Insert, Update, Delete

    LanguageManual DML Hive Data Manipulation Language Hive Data Manipulation Language Loading files int ...

随机推荐

  1. Quartz 并发/单线程

    Quartz 并发/单线程 Quartz定时任务默认都是并发执行的,不会等待上一次任务执行完毕,只要间隔时间到就会执行, 如果定时任执行太长,会长时间占用资源,导致其它任务堵塞.1.在Spring中这 ...

  2. servlet会话技术:Session

    问题的引出 1.在网上购物时,张三和李四购买的商品不一样,他们的购物车中显示的商品也不一样,这是怎么实现的呢? 2.不同的用户登录网站后,不管该用户浏览该网站的那个页面,都可以显示登录人的名字,同时可 ...

  3. swift:高级运算符(位运算符、溢出运算符、优先级和结合性、运算符重载函数)

    swift:高级运算符 http://www.cocoachina.com/ios/20140612/8794.html 除了基本操作符中所讲的运算符,Swift还有许多复杂的高级运算符,包括了C语和 ...

  4. 【Android多屏适配】动态改变Listview item高度

    在ListView的Adapter中去直接获取传入View的LayoutParams是会报空指针异常的,唯一的方法是在xml中嵌套布局一层LinearLayout <?xml version=& ...

  5. Intellij IDEA 新建一个EJB工程(三)

    之前都是用IDEA启动JBoss服务器,并在启动的同时将EJB项目部署上去.在构建 artifacts 时遇到很多问题,明明是EJB项目却不能用EJB导出,真是奇怪~~ 后来用Web Applicat ...

  6. WPF中的Drawing

    以前在用WinForm的时候,可以通过GDI+接口在窗体上动态绘制自定义的图形.在WPF中有没有对应的API呢,最近项目中用到了这个,在这里总结一下. WPF中的Drawing主要提供了几类API: ...

  7. Apache Struts2 s2-020补丁安全绕过漏洞

    CNVD-ID CNVD-2014-01552 发布时间 2014-03-11 危害级别 高 影响产品 Apache struts 2.0.0-2.3.16 BUGTRAQ ID 65999 CVE ...

  8. BZOJ2337: [HNOI2011]XOR和路径

    题解: 异或操作是每一位独立的,所以我们可以考虑每一位分开做. 假设当前正在处理第k位 那令f[i]表示从i到n 为1的概率.因为不是有向无环图(绿豆蛙的归宿),所以我们要用到高斯消元. 若有边i-& ...

  9. POSIX 可移植操作系统接口

    在一些较老的c语言资料,经常会出现“POSIX标准”. 它的专业解释是: 可移植操作系统接口(英语:Portable Operating System Interface,缩写为POSIX),是IEE ...

  10. 【C#学习笔记】载入图片并居中

    using System; using System.Collections.Generic; using System.ComponentModel; using System.Data; usin ...