Documentation for Built-In User-Defined Functions Related To XPath

UDFs

xpath, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_number, xpath_string, xpath_boolean

  • Functions for parsing XML data using XPath expressions.
  • Since version: 0.6.0

Overview

The UDFs in the xpath family are wrappers around javax.xml.xpath, the Java XPath library provided by the JDK. That library is based on the XPath 1.0 specification. See http://java.sun.com/javase/6/docs/api/javax/xml/xpath/package-summary.html for detailed information on the Java XPath library.

All functions follow the form xpath_*(xml_string, xpath_expression_string). The XPath expression string is compiled and cached. If the expression in the next input row matches the previous one, the compiled expression is reused; otherwise, it is recompiled. As a result, the XML string is parsed for every input row, but in the vast majority of use cases (a constant expression across rows) the XPath expression is compiled once and reused.
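The caching strategy described above can be sketched directly against javax.xml.xpath. This is a minimal illustration under stated assumptions, not Hive's source: the class XPathCacheSketch, its method evalString, and the compileCount counter are hypothetical names. It keeps a single cache entry (the expression from the previous row), recompiles only when the incoming expression string differs, and parses the XML text on every call.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch of a single-entry expression cache; not Hive's implementation.
class XPathCacheSketch {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private String cachedPath;          // expression string seen on the previous row
    private XPathExpression cachedExpr; // compiled form of cachedPath
    int compileCount = 0;               // how many times an expression was compiled

    String evalString(String xml, String path) {
        try {
            if (!path.equals(cachedPath)) {
                // Expression differs from the previous row: compile it and replace the cache entry.
                cachedExpr = xpath.compile(path);
                cachedPath = path;
                compileCount++;
            }
            // A new InputSource per call: the XML text is parsed again for every row.
            return (String) cachedExpr.evaluate(
                    new InputSource(new StringReader(xml)), XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        XPathCacheSketch udf = new XPathCacheSketch();
        System.out.println(udf.evalString("<a><b>b1</b></a>", "a/b")); // compiles "a/b"
        System.out.println(udf.evalString("<a><b>b2</b></a>", "a/b")); // reuses the compiled expression
        System.out.println(udf.evalString("<a><c>c1</c></a>", "a/c")); // expression changed: recompiles
        System.out.println("compiled " + udf.compileCount + " times");
    }
}
```

A single entry is enough for the common case of a constant expression per query; a query whose expression alternates between rows would recompile on every row under this scheme.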

Backward axes are supported. For example:

> select xpath ('<a><b id="1"><c/></b><b id="2"><c/></b></a>','/descendant::c/ancestor::b/@id') from t1 limit 1 ;
["1","2"]

Each function returns a specific Hive type given the XPath expression:

  • xpath returns a Hive array of strings.
  • xpath_string returns a string.
  • xpath_boolean returns a boolean.
  • xpath_short returns a short integer.
  • xpath_int returns an integer.
  • xpath_long returns a long integer.
  • xpath_float returns a floating point number.
  • xpath_double and xpath_number return a double-precision floating point number (xpath_number is an alias for xpath_double).
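JAXP itself offers only four result types (node-set, string, boolean, number), so the eight Hive return types above have to be derived from those. The mapping below is an assumption about how a wrapper could request them, not Hive's code; XPathReturnTypeSketch, jaxpTypeFor, and evaluate are hypothetical names. All numeric variants request NUMBER, which JAXP returns as a Double, and would be narrowed afterwards.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical mapping from each UDF name to the JAXP result type a wrapper could request.
class XPathReturnTypeSketch {
    static QName jaxpTypeFor(String udf) {
        switch (udf) {
            case "xpath":         return XPathConstants.NODESET; // NodeList -> array of strings
            case "xpath_string":  return XPathConstants.STRING;  // String
            case "xpath_boolean": return XPathConstants.BOOLEAN; // Boolean
            case "xpath_short":
            case "xpath_int":
            case "xpath_long":
            case "xpath_float":
            case "xpath_double":
            case "xpath_number":  return XPathConstants.NUMBER;  // Double, narrowed afterwards
            default: throw new IllegalArgumentException("unknown UDF: " + udf);
        }
    }

    static Object evaluate(String udf, String xml, String path) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(
                    path, new InputSource(new StringReader(xml)), jaxpTypeFor(udf));
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b>7</b></a>";
        System.out.println(evaluate("xpath", xml, "a/b/text()") instanceof NodeList); // true
        System.out.println(evaluate("xpath_string", xml, "a/b"));                    // 7
        System.out.println(evaluate("xpath_boolean", xml, "a/b"));                   // true
        System.out.println(evaluate("xpath_int", xml, "a/b"));                       // 7.0
    }
}
```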

The UDFs are schema-agnostic: no XML validation is performed. However, malformed XML (e.g., <a><b>1</b></aa>) causes a runtime exception to be thrown.
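The distinction between "not validated" and "must be well-formed" can be illustrated with the JDK's DOM parser. This is a sketch under assumptions: MalformedXmlSketch and parse are hypothetical names, and the exception message is illustrative rather than Hive's. A non-validating parser still rejects a mismatched end tag, and the sketch rethrows that failure as an unchecked RuntimeException.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: non-validating parse that surfaces well-formedness errors as RuntimeException.
class MalformedXmlSketch {
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // no DTD/schema validation, only well-formedness checks
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of letting the parser print them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new RuntimeException("cannot parse XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<a><b>1</b></a>").getDocumentElement().getTagName()); // a
        try {
            parse("<a><b>1</b></aa>"); // mismatched end tag
        } catch (RuntimeException e) {
            System.out.println("runtime exception: " + e.getMessage());
        }
    }
}
```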

Following are specifics on each xpath UDF variant.

xpath

The xpath() function always returns a Hive array of strings. If the expression results in a non-text value (e.g., another XML node), the function returns an empty array. There are two primary uses for this function: to get a list of node text values or to get a list of attribute values.

Examples:

Expression that selects only element nodes, so no text values are returned:

> select xpath('<a><b>b1</b><b>b2</b></a>','a/*') from src limit 1 ;
[]

Get a list of node text values:

> select xpath('<a><b>b1</b><b>b2</b></a>','a/*/text()') from src limit 1 ;
["b1","b2"]

Get a list of values for attribute 'id':

> select xpath('<a><b id="foo">b1</b><b id="bar">b2</b></a>','//@id') from src limit 1 ;
["foo","bar"]

Get a list of node texts for nodes where the 'class' attribute equals 'bb':

> SELECT xpath ('<a><b class="bb">b1</b><b>b2</b><b>b3</b><c class="bb">c1</c><c>c2</c></a>','a/*[@class="bb"]/text()') FROM src LIMIT 1 ;
["b1","c1"]
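The array behavior described for xpath() can be sketched with JAXP by requesting a node-set and keeping only nodes that carry a value. This is an assumption about how such a wrapper could work, not Hive's source; XPathListSketch and xpath are hypothetical names. In the DOM API, getNodeValue() returns the content of text and attribute nodes and null for element nodes, which reproduces the empty result for 'a/*'. The backward-axis example from the overview is included in the usage.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of xpath(): evaluate as a node-set, keep only nodes that have a value.
class XPathListSketch {
    static List<String> xpath(String xml, String path) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // Text and attribute nodes return their content; element nodes return null.
                String value = nodes.item(i).getNodeValue();
                if (value != null) {
                    values.add(value);
                }
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b>b1</b><b>b2</b></a>";
        System.out.println(xpath(xml, "a/*"));        // []
        System.out.println(xpath(xml, "a/*/text()")); // [b1, b2]
        System.out.println(xpath("<a><b id=\"1\"><c/></b><b id=\"2\"><c/></b></a>",
                "/descendant::c/ancestor::b/@id"));  // [1, 2]
    }
}
```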

xpath_string

The xpath_string() function returns the text of the first matching node.

Get the text for node 'a/b':

> SELECT xpath_string ('<a><b>bb</b><c>cc</c></a>','a/b') FROM src LIMIT 1 ;
bb

Get the text for node 'a'. Because 'a' has child nodes with text, the result is the concatenation of the text of its children.

> SELECT xpath_string ('<a><b>bb</b><c>cc</c></a>','a') FROM src LIMIT 1 ;
bbcc

Non-matching expression returns an empty string:

> SELECT xpath_string ('<a><b>bb</b><c>cc</c></a>','a/d') FROM src LIMIT 1 ;

Get the text of the first node that matches '//b':

> SELECT xpath_string ('<a><b>b1</b><b>b2</b></a>','//b') FROM src LIMIT 1 ;
b1

Get the text of the second matching node:

> SELECT xpath_string ('<a><b>b1</b><b>b2</b></a>','a/b[2]') FROM src LIMIT 1 ;
b2

Get the text of the first node that has an attribute 'id' with value 'b_2':

> SELECT xpath_string ('<a><b>b1</b><b id="b_2">b2</b></a>','a/b[@id="b_2"]') FROM src LIMIT 1 ;
b2
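The results above follow XPath 1.0's string() conversion, which JAXP applies when a STRING result is requested: for a node-set it yields the string-value of the first node in document order, or an empty string if the set is empty, and the string-value of an element is the concatenation of all its descendant text. A minimal sketch, assuming a direct call to javax.xml.xpath (XPathStringSketch and xpathString are hypothetical names, not Hive's code):

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch of xpath_string() on top of JAXP's STRING result type.
class XPathStringSketch {
    static String xpathString(String xml, String path) {
        try {
            // STRING applies XPath's string() conversion: string-value of the first
            // node in document order, or "" when the node-set is empty.
            return (String) XPathFactory.newInstance().newXPath().evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b>bb</b><c>cc</c></a>";
        System.out.println(xpathString(xml, "a/b"));                              // bb
        System.out.println(xpathString(xml, "a"));                                // bbcc
        System.out.println("[" + xpathString(xml, "a/d") + "]");                  // []
        System.out.println(xpathString("<a><b>b1</b><b>b2</b></a>", "a/b[2]"));  // b2
    }
}
```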

xpath_boolean

Returns true if the XPath expression evaluates to true, or if a matching node is found.

Match found:

> SELECT xpath_boolean ('<a><b>b</b></a>','a/b') FROM src LIMIT 1 ;
true

No match found:

> SELECT xpath_boolean ('<a><b>b</b></a>','a/c') FROM src LIMIT 1 ;
false

Expression evaluates to true:

> SELECT xpath_boolean ('<a><b>b</b></a>','a/b = "b"') FROM src LIMIT 1 ;
true

Expression evaluates to false (10 is not less than 10):

> SELECT xpath_boolean ('<a><b>10</b></a>','a/b < 10') FROM src LIMIT 1 ;
false
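Both behaviors above (a matching node yields true, a comparison yields its own truth value) correspond to XPath 1.0's boolean() conversion, which JAXP applies for a BOOLEAN result: a node-set converts to true exactly when it is non-empty. A minimal sketch, assuming a direct JAXP call; XPathBooleanSketch and xpathBoolean are hypothetical names, not Hive's code.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch of xpath_boolean() on top of JAXP's BOOLEAN result type.
class XPathBooleanSketch {
    static boolean xpathBoolean(String xml, String path) {
        try {
            // BOOLEAN applies XPath's boolean() conversion: a node-set is true when
            // non-empty; a comparison already produces a boolean.
            return (Boolean) XPathFactory.newInstance().newXPath().evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(xpathBoolean("<a><b>b</b></a>", "a/b"));       // true
        System.out.println(xpathBoolean("<a><b>b</b></a>", "a/c"));       // false
        System.out.println(xpathBoolean("<a><b>b</b></a>", "a/b = \"b\"")); // true
        System.out.println(xpathBoolean("<a><b>10</b></a>", "a/b < 10")); // false
    }
}
```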

xpath_short, xpath_int, xpath_long

These functions return an integer value, or zero if no match is found or if the match is non-numeric.
Mathematical operations are supported. If the value overflows the return type, the maximum value for the type is returned.

No match (the comparison evaluates to false, which converts to zero):

> SELECT xpath_int ('<a>b</a>','a = 10') FROM src LIMIT 1 ;
0

Non-numeric match:

> SELECT xpath_int ('<a>this is not a number</a>','a') FROM src LIMIT 1 ;
0
> SELECT xpath_int ('<a>this 2 is not a number</a>','a') FROM src LIMIT 1 ;
0

Adding values:

> SELECT xpath_int ('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>','sum(a/*)') FROM src LIMIT 1 ;
15
> SELECT xpath_int ('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>','sum(a/b)') FROM src LIMIT 1 ;
7
> SELECT xpath_int ('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>','sum(a/b[@class="odd"])') FROM src LIMIT 1 ;
5

Overflow:

> SELECT xpath_int ('<a><b>2000000000</b><c>40000000000</c></a>','a/b * a/c') FROM src LIMIT 1 ;
2147483647
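One way to obtain the integer results above is to evaluate the expression as an XPath number (a Java double) and narrow it with a cast. This is a sketch under that assumption, not Hive's code; XPathIntegerSketch, evalNumber, xpathInt, and xpathLong are hypothetical names. Java's double-to-int and double-to-long narrowing maps NaN to 0, truncates toward zero, and clamps out-of-range values to MIN_VALUE or MAX_VALUE, which matches the zero and overflow examples. Short is left out on purpose: in Java, a direct (short) cast of a double narrows through int first and then wraps, so a short variant would need its own clamping.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch of xpath_int()/xpath_long(): evaluate as a number, then narrow.
class XPathIntegerSketch {
    static double evalNumber(String xml, String path) {
        try {
            // NUMBER applies XPath's number() conversion: NaN for non-numeric text,
            // 0 or 1 for a boolean result.
            return (Double) XPathFactory.newInstance().newXPath().evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }

    // Java narrowing: NaN -> 0, truncation toward zero, saturation at MIN_VALUE/MAX_VALUE.
    static int xpathInt(String xml, String path) { return (int) evalNumber(xml, path); }
    static long xpathLong(String xml, String path) { return (long) evalNumber(xml, path); }

    public static void main(String[] args) {
        String nums = "<a><b class=\"odd\">1</b><b class=\"even\">2</b><b class=\"odd\">4</b><c>8</c></a>";
        String big = "<a><b>2000000000</b><c>40000000000</c></a>";
        System.out.println(xpathInt("<a>this is not a number</a>", "a")); // 0
        System.out.println(xpathInt(nums, "sum(a/b[@class=\"odd\"])"));   // 5
        System.out.println(xpathInt(big, "a/b * a/c"));                   // 2147483647
        System.out.println(xpathLong(big, "a/b * a/c"));                  // 9223372036854775807
    }
}
```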

xpath_float, xpath_double, xpath_number

Similar to xpath_short, xpath_int, and xpath_long, but with floating point semantics. Non-matches result in zero; however, non-numeric matches result in NaN. Note that xpath_number() is an alias for xpath_double().

No match (the comparison evaluates to false, which converts to zero):

> SELECT xpath_double ('<a>b</a>','a = 10') FROM src LIMIT 1 ;
0.0

Non-numeric match:

> SELECT xpath_double ('<a>this is not a number</a>','a') FROM src LIMIT 1 ;
NaN

A very large number:

> SELECT xpath_double ('<a><b>2000000000</b><c>40000000000</c></a>','a/b * a/c') FROM src LIMIT 1 ;
8.0E19
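With floating point results no narrowing to an integer happens, so NaN and large products survive unchanged. A minimal sketch, assuming the value comes straight from JAXP's NUMBER result (a Double) and that a float variant simply casts it; XPathDoubleSketch, xpathDouble, and xpathFloat are hypothetical names, not Hive's code. The product 2000000000 × 40000000000 is exactly representable as a double, so it prints as 8.0E19.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch of xpath_double()/xpath_float() on top of JAXP's NUMBER result type.
class XPathDoubleSketch {
    static double xpathDouble(String xml, String path) {
        try {
            // NUMBER applies XPath's number() conversion and returns a Double.
            return (Double) XPathFactory.newInstance().newXPath().evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }

    // Assumed float variant: round the double to the nearest float.
    static float xpathFloat(String xml, String path) { return (float) xpathDouble(xml, path); }

    public static void main(String[] args) {
        System.out.println(xpathDouble("<a>b</a>", "a = 10"));                     // 0.0
        System.out.println(xpathDouble("<a>this is not a number</a>", "a"));       // NaN
        System.out.println(xpathDouble("<a><b>2000000000</b><c>40000000000</c></a>",
                "a/b * a/c"));                                                     // 8.0E19
    }
}
```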
