[HIve - LanguageManual] XPathUDF
Documentation for Built-In User-Defined Functions Related To XPath
UDFs
xpath, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_number, xpath_string
- Functions for parsing XML data using XPath expressions.
- Since version: 0.6.0
Overview
The xpath family of UDFs are wrappers around the Java XPath library javax.xml.xpath provided by the JDK. The library is based on the XPath 1.0 specification. Please refer to http://java.sun.com/javase/6/docs/api/javax/xml/xpath/package-summary.html for detailed information on the Java XPath library.
All functions follow the form: xpath_*(xml_string, xpath_expression_string). The XPath expression string is compiled and cached. It is reused if the expression in the next input row matches the previous. Otherwise, it is recompiled. So, the xml string is always parsed for every input row, but the xpath expression is precompiled and reused for the vast majority of use cases.
Backward axes are supported. For example:
> select xpath ('<a><b id="1"><c/></b><b id="2"><c/></b></a>','/descendant::c/ancestor::b/@id') from t1 limit 1 ;[1","2] |
Each function returns a specific Hive type given the XPath expression:
xpathreturns a Hive array of strings.xpath_stringreturns a string.xpath_booleanreturns a boolean.xpath_shortreturns a short integer.xpath_intreturns an integer.xpath_longreturns a long integer.xpath_floatreturns a floating point number.xpath_double,xpath_numberreturns a double-precision floating point number (xpath_numberis an alias forxpath_double).
The UDFs are schema agnostic - no XML validation is performed. However, malformed xml (e.g., <a><b>1</b></aa>) will result in a runtime exception being thrown.
Following are specifics on each xpath UDF variant.
xpath
The xpath() function always returns a hive array of strings. If the expression results in a non-text value (e.g., another xml node) the function will return an empty array. There are 2 primary uses for this function: to get a list of node text values or to get a list of attribute values.
Examples:
Non-matching XPath expression:
> select xpath('<a><b>b1</b><b>b2</b></a>','a/*') from src limit 1 ;[] |
Get a list of node text values:
> select xpath('<a><b>b1</b><b>b2</b></a>','a/*/text()') from src limit 1 ;[b1","b2] |
Get a list of values for attribute 'id':
> select xpath('<a><b id="foo">b1</b><b id="bar">b2</b></a>','//@id') from src limit 1 ;[foo","bar] |
Get a list of node texts for nodes where the 'class' attribute equals 'bb':
> SELECT xpath ('<a><b class="bb">b1</b><b>b2</b><b>b3</b><c class="bb">c1</c><c>c2</c></a>', 'a/*[@class="bb"]/text()') FROM src LIMIT 1 ;[b1","c1] |
xpath_string
The xpath_string() function returns the text of the first matching node.
Get the text for node 'a/b':
> SELECT xpath_string ('<a><b>bb</b><c>cc</c></a>', 'a/b') FROM src LIMIT 1 ;bb |
Get the text for node 'a'. Because 'a' has children nodes with text, the result is a composite of text from the children.
> SELECT xpath_string ('<a><b>bb</b><c>cc</c></a>', 'a') FROM src LIMIT 1 ;bbcc |
Non-matching expression returns an empty string:
> SELECT xpath_string ('<a><b>bb</b><c>cc</c></a>', 'a/d') FROM src LIMIT 1 ; |
Gets the text of the first node that matches '//b':
> SELECT xpath_string ('<a><b>b1</b><b>b2</b></a>', '//b') FROM src LIMIT 1 ;b1 |
Gets the second matching node:
> SELECT xpath_string ('<a><b>b1</b><b>b2</b></a>', 'a/b[2]') FROM src LIMIT 1 ;b2 |
Gets the text from the first node that has an attribute 'id' with value 'b_2':
> SELECT xpath_string ('<a><b>b1</b><b id="b_2">b2</b></a>', 'a/b[@id="b_2"]') FROM src LIMIT 1 ;b2 |
xpath_boolean
Returns true if the XPath expression evaluates to true, or if a matching node is found.
Match found:
> SELECT xpath_boolean ('<a><b>b</b></a>', 'a/b') FROM src LIMIT 1 ;true |
No match found:
> SELECT xpath_boolean ('<a><b>b</b></a>', 'a/c') FROM src LIMIT 1 ;false |
Match found:
> SELECT xpath_boolean ('<a><b>b</b></a>', 'a/b = "b"') FROM src LIMIT 1 ;true |
No match found:
> SELECT xpath_boolean ('<a><b>10</b></a>', 'a/b < 10') FROM src LIMIT 1 ;false |
xpath_short, xpath_int, xpath_long
These functions return an integer numeric value, or the value zero if no match is found, or a match is found but the value is non-numeric.
Mathematical operations are supported. In cases where the value overflows the return type, then the maximum value for the type is returned.
No match:
> SELECT xpath_int ('<a>b</a>', 'a = 10') FROM src LIMIT 1 ;0 |
Non-numeric match:
> SELECT xpath_int ('<a>this is not a number</a>', 'a') FROM src LIMIT 1 ;0> SELECT xpath_int ('<a>this 2 is not a number</a>', 'a') FROM src LIMIT 1 ;0 |
Adding values:
> SELECT xpath_int ('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>', 'sum(a/*)') FROM src LIMIT 1 ;15> SELECT xpath_int ('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>', 'sum(a/b)') FROM src LIMIT 1 ;7> SELECT xpath_int ('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>', 'sum(a/b[@class="odd"])') FROM src LIMIT 1 ;5 |
Overflow:
> SELECT xpath_int ('<a><b>2000000000</b><c>40000000000</c></a>', 'a/b * a/c') FROM src LIMIT 1 ;2147483647 |
xpath_float, xpath_double, xpath_number
Similar to xpath_short, xpath_int and xpath_long but with floating point semantics. Non-matches result in zero. However,
non-numeric matches result in NaN. Note that xpath_number() is an alias for xpath_double().
No match:
> SELECT xpath_double ('<a>b</a>', 'a = 10') FROM src LIMIT 1 ;0.0 |
Non-numeric match:
> SELECT xpath_double ('<a>this is not a number</a>', 'a') FROM src LIMIT 1 ;NaN |
A very large number:
SELECT xpath_double ('<a><b>2000000000</b><c>40000000000</c></a>', 'a/b * a/c') FROM src LIMIT 1 ;8.0E19 |
[HIve - LanguageManual] XPathUDF的更多相关文章
- [HIve - LanguageManual] Hive Operators and User-Defined Functions (UDFs)
Hive Operators and User-Defined Functions (UDFs) Hive Operators and User-Defined Functions (UDFs) Bu ...
- [Hive - LanguageManual ] Windowing and Analytics Functions (待)
LanguageManual WindowingAndAnalytics Skip to end of metadata Added by Lefty Leverenz, last edi ...
- [Hive - LanguageManual] Import/Export
LanguageManual ImportExport Skip to end of metadata Added by Carl Steinbach, last edited by Le ...
- [Hive - LanguageManual] DML: Load, Insert, Update, Delete
LanguageManual DML Hive Data Manipulation Language Hive Data Manipulation Language Loading files int ...
- [Hive - LanguageManual] Alter Table/Partition/Column
Alter Table/Partition/Column Alter Table Rename Table Alter Table Properties Alter Table Comment Add ...
- Hive LanguageManual DDL
hive语法规则LanguageManual DDL SQL DML 和 DDL 数据操作语言 (DML) 和 数据定义语言 (DDL) 一.数据库 增删改都在文档里说得也很明白,不重复造车轮 二.表 ...
- [Hive - LanguageManual ] ]SQL Standard Based Hive Authorization
Status of Hive Authorization before Hive 0.13 SQL Standards Based Hive Authorization (New in Hive 0. ...
- [Hive - LanguageManual] Hive Concurrency Model (待)
Hive Concurrency Model Hive Concurrency Model Use Cases Turn Off Concurrency Debugging Configuration ...
- [Hive - LanguageManual ] Explain (待)
EXPLAIN Syntax EXPLAIN Syntax Hive provides an EXPLAIN command that shows the execution plan for a q ...
随机推荐
- 在C#中怎么调用Resources文件中的图片
譬如资源中有名为myPic的图片,在代码中可以这么使用: this.BackgroundImage = Properties.Resources.myPic; 如有疑问,继续追问.
- AC题目简解-线段树
线段树: http://www.notonlysuccess.com/index.php/segment-tree-complete/鉴于notonlysuccess大牛的博客对于题目的思路写的很简陋 ...
- Windows 7更改SVN账户密码
首先说明下我的系统是Windows7 今天更改了SVN账号和密码,然后想要更改一下Eclipse的SVN登录用户名和密码 但是网上找了一大推说什么客户端的,靠净扯淡. 本人亲测最有效的方法是删除C盘下 ...
- 高难度(3)RenderScript
RenderScript RenderScript is a framework for running computationally intensive tasks at high perform ...
- Android Monkey测试
Monkey测试1——Monkey的使用 原文地址: http://www.douban.com/note/257029872/ (转自豆瓣,版权属于豆瓣及豆瓣网友,如有冒犯请见谅并联系我们) Mon ...
- poj 3258 River Hopscotch(二分+贪心)
题目:http://poj.org/problem?id=3258 题意: 一条河长度为 L,河的起点(Start)和终点(End)分别有2块石头,S到E的距离就是L. 河中有n块石头,每块石头到S都 ...
- sdut 2819 比赛排名(边表 拓扑排序)
题目:http://acm.sdut.edu.cn/sdutoj/problem.php?action=showproblem&problemid=2819 #include <iost ...
- crontab无法调用java的问题解决
本来想将写的代码挂在crontab下运行,谁知道无法运行,没有任何输出,试着用ls -al >> 1.log试了一下,确定crontab是正常运行的. 从网站上找了下问题,原因出在cron ...
- bzoj3242
如果是树,那么一定选择树的直径的中点 套了个环?裸的想法显然是断开环,然后求所有树的直径的最小值 环套树dp的一般思路,先把环放到根,把环上点下面的子树dp出来,然后再处理环上的情况 设f[i]表示以 ...
- UVa 1347 (双线程DP) Tour
题意: 平面上有n个坐标均为正数的点,按照x坐标从小到大一次给出.求一条最短路线,从最左边的点出发到最右边的点,再回到最左边的点.除了第一个和最右一个点其他点恰好只经过一次. 分析: 可以等效为两个人 ...