[Hive - LanguageManual] Transform [not understood]
Transform/Map-Reduce Syntax
Users can also plug their own custom mappers and reducers into the data stream by using features natively supported in the Hive 2.0 language. For example, to run a custom mapper script (map_script) and a custom reducer script (reduce_script), the user can issue the following command, which uses the TRANSFORM clause to embed the mapper and reducer scripts.
By default, columns are converted to STRING and delimited by TAB before being fed to the user script; similarly, all NULL values are converted to the literal string \N in order to differentiate NULL values from empty strings. The standard output of the user script is treated as TAB-separated STRING columns, any cell containing only \N is re-interpreted as a NULL, and the resulting STRING column is then cast to the data type specified in the table declaration in the usual way. User scripts can write debug information to standard error, which is shown on the task detail page in Hadoop. These defaults can be overridden with ROW FORMAT ...
On Windows, use "cmd /c your_script" instead of just "your_script".
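The default wire format described above can be sketched as a small Python script. This is only an illustration under stated assumptions: the function names (decode_row, encode_row, transform) and the upper-casing logic are hypothetical and do not come from Hive; only the TAB delimiter, the \N marker, and the use of standard error for debug output are taken from the text.

```python
import sys

HIVE_NULL = r"\N"  # literal two-character marker Hive uses for NULL by default


def decode_row(line):
    """Split one TAB-delimited input line into fields, mapping \\N to None."""
    fields = line.rstrip("\n").split("\t")
    return [None if f == HIVE_NULL else f for f in fields]


def encode_row(values):
    """Join values into one TAB-delimited output line, mapping None to \\N."""
    return "\t".join(HIVE_NULL if v is None else str(v) for v in values)


def transform(lines, out=sys.stdout, err=sys.stderr):
    """Hypothetical mapper: upper-case the first column, pass the rest through."""
    for line in lines:
        row = decode_row(line)
        if row[0] is None:
            # Debug output goes to standard error, not into the result rows.
            print("NULL in first column, row passed through unchanged", file=err)
        else:
            row[0] = row[0].upper()
        print(encode_row(row), file=out)
```

A real script would call transform(sys.stdin) at the bottom of the file; it is left out here so the functions can be tried on an in-memory list of lines.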
Warning
It is your responsibility to sanitize any STRING columns prior to transformation. If your STRING column contains tabs, an identity transformer will not give you back what you started with! To help with this, see REGEXP_REPLACE and replace the tabs with some other character on their way into the TRANSFORM() call.
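The round-trip problem can be reproduced outside Hive. The sketch below only mimics the default TAB-delimited serialization in plain Python; serialize, deserialize, and sanitize are hypothetical helper names, and sanitize stands in for what a REGEXP_REPLACE call would do on the Hive side before the TRANSFORM() call.

```python
def serialize(row):
    """Mimic the default input format: fields joined by TAB."""
    return "\t".join(row)


def deserialize(line):
    """Mimic how the script output is read back: split on every TAB."""
    return line.split("\t")


def sanitize(value, replacement=" "):
    """Hypothetical helper: replace embedded tabs before serialization."""
    return value.replace("\t", replacement)


row = ["id-1", "free\ttext"]  # second column contains an embedded tab

# Without sanitizing, the two input columns come back as three fields.
round_trip = deserialize(serialize(row))

# With sanitizing, the column count survives the round trip.
clean_trip = deserialize(serialize([sanitize(v) for v in row]))
```

Here round_trip has three elements although row has two, while clean_trip keeps two columns at the cost of altering the text.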
Warning
Formally, MAP ... and REDUCE ... are syntactic transformations of SELECT TRANSFORM ( ... ). In other words, they serve as comments or notes to the reader of the query. BEWARE: Use of these keywords may be dangerous as (e.g.) typing "REDUCE" does not force a reduce phase to occur and typing "MAP" does not force a new map phase!
Please also see Sort By / Cluster By / Distribute By and Larry Ogrodnek's blog post.
clusterBy: CLUSTER BY colName (',' colName)*
distributeBy: DISTRIBUTE BY colName (',' colName)*
sortBy: SORT BY colName (ASC | DESC)? (',' colName (ASC | DESC)?)*

rowFormat
  : ROW FORMAT
    (DELIMITED [FIELDS TERMINATED BY char]
               [COLLECTION ITEMS TERMINATED BY char]
               [MAP KEYS TERMINATED BY char]
               [ESCAPED BY char]
               [LINES SEPARATED BY char]
     |
     SERDE serde_name [WITH SERDEPROPERTIES
                            property_name=property_value,
                            property_name=property_value, ...])

outRowFormat : rowFormat
inRowFormat : rowFormat
outRecordReader : RECORDREADER className

query:
  FROM (
    FROM src
    MAP expression (',' expression)*
    (inRowFormat)?
    USING 'my_map_script'
    ( AS colName (',' colName)* )?
    (outRowFormat)? (outRecordReader)?
    ( clusterBy? | distributeBy? sortBy? ) src_alias
  )
  REDUCE expression (',' expression)*
    (inRowFormat)?
    USING 'my_reduce_script'
    ( AS colName (',' colName)* )?
    (outRowFormat)? (outRecordReader)?

  FROM (
    FROM src
    SELECT TRANSFORM '(' expression (',' expression)* ')'
    (inRowFormat)?
    USING 'my_map_script'
    ( AS colName (',' colName)* )?
    (outRowFormat)? (outRecordReader)?
    ( clusterBy? | distributeBy? sortBy? ) src_alias
  )
  SELECT TRANSFORM '(' expression (',' expression)* ')'
    (inRowFormat)?
    USING 'my_reduce_script'
    ( AS colName (',' colName)* )?
    (outRowFormat)? (outRecordReader)?
SQL Standard Based Authorization Disallows TRANSFORM
The TRANSFORM clause is disallowed when SQL standard based authorization is configured in Hive 0.13.0 and later releases (HIVE-6415).
TRANSFORM Examples
Example #1:
FROM (
  FROM pv_users
  MAP pv_users.userid, pv_users.date
  USING 'map_script'
  AS dt, uid
  CLUSTER BY dt) map_output
INSERT OVERWRITE TABLE pv_users_reduced
  REDUCE map_output.dt, map_output.uid
  USING 'reduce_script'
  AS date, count;

FROM (
  FROM pv_users
  SELECT TRANSFORM(pv_users.userid, pv_users.date)
  USING 'map_script'
  AS dt, uid
  CLUSTER BY dt) map_output
INSERT OVERWRITE TABLE pv_users_reduced
  SELECT TRANSFORM(map_output.dt, map_output.uid)
  USING 'reduce_script'
  AS date, count;
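The source does not show what reduce_script contains. As a hedged sketch, one plausible reducer counts the rows for each dt value. It assumes every input line holds exactly two TAB-separated columns (dt, uid) and that CLUSTER BY dt has already made rows with the same dt arrive together; the function names parse and reduce_counts are hypothetical.

```python
from itertools import groupby


def parse(lines):
    """Yield (dt, uid) pairs from TAB-delimited lines (two columns assumed)."""
    for line in lines:
        dt, uid = line.rstrip("\n").split("\t", 1)
        yield dt, uid


def reduce_counts(lines):
    """Yield 'dt<TAB>count' lines, assuming input is already grouped by dt."""
    for dt, group in groupby(parse(lines), key=lambda pair: pair[0]):
        # groupby only merges adjacent rows, which is why the grouping
        # guarantee from CLUSTER BY matters here.
        yield f"{dt}\t{sum(1 for _ in group)}"
```

A real script would print each line produced by reduce_counts(sys.stdin) to standard output, where the query reads it back as the date and count columns.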
Example #2:
FROM (
  FROM src
  SELECT TRANSFORM(src.key, src.value) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe'
  USING '/bin/cat'
  AS (tkey, tvalue) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe'
  RECORDREADER 'org.apache.hadoop.hive.ql.exec.TypedBytesRecordReader'
) tmap
INSERT OVERWRITE TABLE dest1 SELECT tkey, tvalue
Schema-less Map-reduce Scripts
If there is no AS clause after USING my_script, Hive assumes that the output of the script contains two parts: key, which is everything before the first tab, and value, which is everything after the first tab. Note that this differs from specifying AS key, value: in that case, if the line contains multiple tabs, value holds only the portion between the first tab and the second tab.
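The difference between the two cases can be illustrated with ordinary string splitting. This is a sketch of the behaviour as described in the paragraph above, not Hive's actual implementation; both function names are hypothetical, and each line is assumed to contain at least one tab.

```python
def split_schemaless(line):
    """No AS clause: key is the text before the first TAB, value is the rest."""
    key, _, value = line.rstrip("\n").partition("\t")
    return key, value


def split_as_key_value(line):
    """AS key, value: only the first two TAB-separated fields are kept."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[1]


line = "k\tv1\tv2\n"
schemaless = split_schemaless(line)    # value keeps the embedded tab
explicit = split_as_key_value(line)    # value stops at the second tab
```

For this input, schemaless is ("k", "v1\tv2") while explicit is ("k", "v1").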
Note that we can directly do CLUSTER BY key without specifying the output schema of the scripts.
FROM (
  FROM pv_users
  MAP pv_users.userid, pv_users.date
  USING 'map_script'
  CLUSTER BY key) map_output
INSERT OVERWRITE TABLE pv_users_reduced
  REDUCE map_output.key, map_output.value
  USING 'reduce_script'
  AS date, count;
Typing the output of TRANSFORM
The output fields from a script are typed as strings by default; for example, in:
  SELECT TRANSFORM(stuff)
  USING 'script'
  AS thing1, thing2
They can be cast immediately with the syntax:
  SELECT TRANSFORM(stuff)
  USING 'script'
  AS (thing1 INT, thing2 INT)