Documentation for Built-In User-Defined Functions Related To XPath

UDFs

xpath, xpath_string, xpath_boolean, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_number

  • Functions for parsing XML data using XPath expressions.
  • Since version: 0.6.0

    Overview

The xpath family of UDFs are wrappers around the Java XPath library javax.xml.xpath provided by the JDK. The library is based on the XPath 1.0 specification. Please refer to http://java.sun.com/javase/6/docs/api/javax/xml/xpath/package-summary.html for detailed information on the Java XPath library.

All functions follow the form xpath_*(xml_string, xpath_expression_string). The XPath expression string is compiled and cached. If the expression in the next input row matches the previous one, the compiled form is reused; otherwise it is recompiled. So the XML string is parsed for every input row, but in the vast majority of use cases the XPath expression is compiled once and reused.
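The compile-and-reuse behavior described above can be sketched directly against javax.xml.xpath. This is a minimal illustration, not Hive's actual code: the class CachedXPathEvaluator, its fields, and the compileCount counter are hypothetical names introduced here only to make the reuse visible.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: the class and member names are illustrative, not Hive's.
class CachedXPathEvaluator {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private String lastPath;           // expression string from the previous call
    private XPathExpression compiled;  // compiled form of lastPath
    int compileCount = 0;              // kept only to make the reuse observable

    String evalString(String xml, String path) {
        try {
            // Recompile only when the expression differs from the previous one.
            if (!path.equals(lastPath)) {
                compiled = xpath.compile(path);
                lastPath = path;
                compileCount++;
            }
            // The XML document is parsed again on every call.
            InputSource source = new InputSource(new StringReader(xml));
            return (String) compiled.evaluate(source, XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling evalString twice with the same expression but different XML compiles the expression once; changing the expression triggers one more compilation.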

Backward axes are supported. For example:

> select xpath ('<a><b id="1"><c/></b><b id="2"><c/></b></a>','/descendant::c/ancestor::b/@id') from t1 limit 1 ;
["1","2"]

Each function returns a specific Hive type given the XPath expression:

  • xpath returns a Hive array of strings.
  • xpath_string returns a string.
  • xpath_boolean returns a boolean.
  • xpath_short returns a short integer.
  • xpath_int returns an integer.
  • xpath_long returns a long integer.
  • xpath_float returns a floating point number.
  • xpath_double and xpath_number return a double-precision floating point number (xpath_number is an alias for xpath_double).

The UDFs are schema agnostic: no XML validation is performed. However, malformed XML (e.g., <a><b>1</b></aa>) results in a runtime exception.
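The difference between unvalidated and malformed XML can be seen in the underlying Java library. The sketch below is an illustration only: MalformedXmlSketch and failsToParse are hypothetical names, and how Hive wraps the resulting error into its runtime exception is not shown here.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Hive.
class MalformedXmlSketch {
    // Returns true when evaluation fails because the XML cannot be parsed.
    static boolean failsToParse(String xml) {
        try {
            XPathFactory.newInstance().newXPath().evaluate(
                "a", new InputSource(new StringReader(xml)), XPathConstants.STRING);
            return false;
        } catch (XPathExpressionException e) {
            // The parser's error arrives wrapped in XPathExpressionException.
            return true;
        }
    }
}
```

Well-formed input with arbitrary element names is accepted because no schema is consulted; only the mismatched closing tag causes a failure.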

Following are specifics on each xpath UDF variant.

xpath

The xpath() function always returns a Hive array of strings. If the expression results in a non-text value (e.g., another XML node), the function returns an empty array. There are two primary uses for this function: to get a list of node text values or to get a list of attribute values.

Examples:

Expression that selects element nodes rather than text (empty result):

> select xpath('<a><b>b1</b><b>b2</b></a>','a/*') from src limit 1 ;
[]

Get a list of node text values:

> select xpath('<a><b>b1</b><b>b2</b></a>','a/*/text()') from src limit 1 ;
["b1","b2"]

Get a list of values for attribute 'id':

> select xpath('<a><b id="foo">b1</b><b id="bar">b2</b></a>','//@id') from src limit 1 ;
["foo","bar"]

Get a list of node texts for nodes where the 'class' attribute equals 'bb':

> SELECT xpath ('<a><b class="bb">b1</b><b>b2</b><b>b3</b><c class="bb">c1</c><c>c2</c></a>', 'a/*[@class="bb"]/text()') FROM src LIMIT 1 ;
["b1","c1"]
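The results above, including the empty array for element nodes, are consistent with evaluating the expression as a node set and keeping each node's value when it has one. The following is a sketch under that assumption, not Hive's implementation; XPathListSketch and evalList are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Hive.
class XPathListSketch {
    static List<String> evalList(String xml, String path) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                path, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // Text and attribute nodes carry a value; element nodes return null.
                String value = nodes.item(i).getNodeValue();
                if (value != null) {
                    values.add(value);
                }
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }
}
```

In the DOM API, getNodeValue() is null for element nodes, which is why 'a/*' yields nothing while 'a/*/text()' and '//@id' yield strings.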

xpath_string

The xpath_string() function returns the text of the first matching node.

Get the text for node 'a/b':

> SELECT xpath_string ('<a><b>bb</b><c>cc</c></a>', 'a/b') FROM src LIMIT 1 ;
bb

Get the text for node 'a'. Because 'a' has child nodes with text, the result is the concatenation of the text from the children:

> SELECT xpath_string ('<a><b>bb</b><c>cc</c></a>', 'a') FROM src LIMIT 1 ;
bbcc

Non-matching expression returns an empty string:

> SELECT xpath_string ('<a><b>bb</b><c>cc</c></a>', 'a/d') FROM src LIMIT 1 ;

Get the text of the first node that matches '//b':

> SELECT xpath_string ('<a><b>b1</b><b>b2</b></a>', '//b') FROM src LIMIT 1 ;
b1

Get the text of the second matching node:

> SELECT xpath_string ('<a><b>b1</b><b>b2</b></a>', 'a/b[2]') FROM src LIMIT 1 ;
b2

Get the text of the first node that has an attribute 'id' with value 'b_2':

> SELECT xpath_string ('<a><b>b1</b><b id="b_2">b2</b></a>', 'a/b[@id="b_2"]') FROM src LIMIT 1 ;
b2
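The xpath_string examples match what javax.xml.xpath produces when asked for a string result: the string value of the first matching node, or an empty string when nothing matches. A minimal sketch, with XPathStringSketch and evalString as hypothetical names:

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Hive.
class XPathStringSketch {
    static String evalString(String xml, String path) {
        try {
            // STRING converts a node set to the string value of its first node.
            return (String) XPathFactory.newInstance().newXPath().evaluate(
                path, new InputSource(new StringReader(xml)), XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The string value of an element is the concatenation of all descendant text, which explains the 'bbcc' result for 'a'.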

xpath_boolean

Returns true if the XPath expression evaluates to true, or if a matching node is found.

Match found:

> SELECT xpath_boolean ('<a><b>b</b></a>', 'a/b') FROM src LIMIT 1 ;
true

No match found:

> SELECT xpath_boolean ('<a><b>b</b></a>', 'a/c') FROM src LIMIT 1 ;
false

Comparison evaluates to true:

> SELECT xpath_boolean ('<a><b>b</b></a>', 'a/b = "b"') FROM src LIMIT 1 ;
true

Comparison evaluates to false:

> SELECT xpath_boolean ('<a><b>10</b></a>', 'a/b < 10') FROM src LIMIT 1 ;
false
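Both kinds of result above, node existence and comparison, follow from XPath 1.0 boolean conversion: a non-empty node set is true, and a comparison yields its own truth value. A sketch with javax.xml.xpath, where XPathBooleanSketch and evalBoolean are hypothetical names:

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Hive.
class XPathBooleanSketch {
    static boolean evalBoolean(String xml, String path) {
        try {
            // BOOLEAN applies the XPath boolean() conversion to the result.
            return (Boolean) XPathFactory.newInstance().newXPath().evaluate(
                path, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }
}
```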

xpath_short, xpath_int, xpath_long

These functions return an integer numeric value, or zero if no match is found or if a match is found but the value is non-numeric.
Mathematical operations are supported. If the value overflows the return type, the maximum value for the type is returned.

No match:

> SELECT xpath_int ('<a>b</a>', 'a = 10') FROM src LIMIT 1 ;
0

Non-numeric match:

> SELECT xpath_int ('<a>this is not a number</a>', 'a') FROM src LIMIT 1 ;
0
> SELECT xpath_int ('<a>this 2 is not a number</a>', 'a') FROM src LIMIT 1 ;
0

Adding values:

> SELECT xpath_int ('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>', 'sum(a/*)') FROM src LIMIT 1 ;
15
> SELECT xpath_int ('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>', 'sum(a/b)') FROM src LIMIT 1 ;
7
> SELECT xpath_int ('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>', 'sum(a/b[@class="odd"])') FROM src LIMIT 1 ;
5

Overflow:

> SELECT xpath_int ('<a><b>2000000000</b><c>40000000000</c></a>', 'a/b * a/c') FROM src LIMIT 1 ;
2147483647
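One plausible reading of the zero and overflow results for xpath_int is Java's narrowing conversion from double to int, under which NaN becomes 0 and out-of-range values saturate at Integer.MAX_VALUE. The sketch below assumes, without confirmation from this page, that the value is first evaluated as an XPath number and then narrowed; XPathIntSketch and its methods are hypothetical names, and only the int case is covered.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Hive.
class XPathIntSketch {
    static double evalNumber(String xml, String path) {
        try {
            return (Double) XPathFactory.newInstance().newXPath().evaluate(
                path, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }

    static int evalInt(String xml, String path) {
        // Assumption: narrow the double result with a plain Java cast.
        // (int) NaN is 0; values above Integer.MAX_VALUE become Integer.MAX_VALUE.
        return (int) evalNumber(xml, path);
    }
}
```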

xpath_float, xpath_double, xpath_number

Similar to xpath_short, xpath_int and xpath_long but with floating point semantics. Non-matches result in zero. However, non-numeric matches result in NaN. Note that xpath_number() is an alias for xpath_double().

No match:

> SELECT xpath_double ('<a>b</a>', 'a = 10') FROM src LIMIT 1 ;
0.0

Non-numeric match:

> SELECT xpath_double ('<a>this is not a number</a>', 'a') FROM src LIMIT 1 ;
NaN

A very large number:

> SELECT xpath_double ('<a><b>2000000000</b><c>40000000000</c></a>', 'a/b * a/c') FROM src LIMIT 1 ;
8.0E19
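The three results above agree with what javax.xml.xpath returns for a numeric result type: a false comparison converts to 0.0, non-numeric text converts to NaN, and large products stay within double range. A minimal sketch, with XPathDoubleSketch and evalDouble as hypothetical names:

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Hive.
class XPathDoubleSketch {
    static double evalDouble(String xml, String path) {
        try {
            // NUMBER applies the XPath number() conversion to the result.
            return (Double) XPathFactory.newInstance().newXPath().evaluate(
                path, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }
}
```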
