[Hive - LanguageManual] Transform [Not Yet Understood]
Transform/Map-Reduce Syntax
Users can also plug their own custom mappers and reducers into the data stream by using features natively supported in the Hive 2.0 language. For example, to run a custom mapper script (map_script) and a custom reducer script (reduce_script), the user can issue a command that uses the TRANSFORM clause to embed the mapper and reducer scripts.
By default, columns are converted to STRING and delimited by TAB before being fed to the user script; similarly, all NULL values are converted to the literal string \N to differentiate NULL values from empty strings. The standard output of the user script is treated as TAB-separated STRING columns, any cell containing only \N is re-interpreted as a NULL, and the resulting STRING column is then cast to the data type specified in the table declaration in the usual way. User scripts can write debug information to standard error, which is shown on the task detail page in Hadoop. These defaults can be overridden with ROW FORMAT ....
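The default encoding described above can be sketched from the script's side. This is a minimal illustration in Python, not code from the Hive manual: the function names (decode_row, encode_row, transform_stream) and the upper-casing step are hypothetical, chosen only to show TAB splitting, the \N convention for NULL, and debug output on standard error.

```python
import io
import sys

NULL_TOKEN = r"\N"  # default text encoding of NULL, as described above


def decode_row(line):
    """Split one TAB-delimited input line into fields, mapping \\N to None."""
    fields = line.rstrip("\n").split("\t")
    return [None if f == NULL_TOKEN else f for f in fields]


def encode_row(values):
    """Join values into one TAB-delimited output line, mapping None to \\N."""
    return "\t".join(NULL_TOKEN if v is None else str(v) for v in values) + "\n"


def transform_stream(src, dst, log=sys.stderr):
    """Copy rows from src to dst, upper-casing every non-NULL field."""
    for line in src:
        row = decode_row(line)
        log.write("saw %d fields\n" % len(row))  # debug info goes to stderr
        dst.write(encode_row([None if v is None else v.upper() for v in row]))


# Demo on an in-memory row: a string, a NULL, and an empty string.
out = io.StringIO()
transform_stream(io.StringIO("alice\t\\N\t\n"), out, log=io.StringIO())
print(repr(out.getvalue()))
```

Saved as a script for a USING clause, such a file would end with a call like transform_stream(sys.stdin, sys.stdout). Note that the empty string and NULL stay distinct on the way out, which is the reason for the \N convention.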
On Windows, use "cmd /c your_script" instead of just "your_script".
Warning
It is your responsibility to sanitize any STRING columns prior to transformation. If your STRING column contains tabs, an identity transformer will not give you back what you started with! To help with this, see REGEXP_REPLACE and replace the tabs with some other character on their way into the TRANSFORM() call.
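The round-trip problem can be reproduced outside Hive. The sketch below is an illustration only: serialize, identity_transform, and sanitize are hypothetical helpers that mimic the default TAB-delimited format, and sanitize plays the role that a REGEXP_REPLACE call on the column would play inside the query.

```python
def serialize(columns):
    """Mimic the default input format: column values joined by TAB."""
    return "\t".join(columns)


def identity_transform(line):
    """What an identity script hands back, split on TAB the way Hive would."""
    return line.split("\t")


def sanitize(value, replacement=" "):
    """Replace embedded tabs before the value reaches the script."""
    return value.replace("\t", replacement)


row = ["id42", "free\ttext"]  # second column contains a tab

# Unsanitized: two columns go in, three come out.
print(identity_transform(serialize(row)))

# Sanitized: the column count survives the round trip.
print(identity_transform(serialize([sanitize(c) for c in row])))
```

The replacement character is a free choice; a space is used here only as an example.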
Warning
Formally, MAP ... and REDUCE ... are syntactic transformations of SELECT TRANSFORM ( ... ). In other words, they serve as comments or notes to the reader of the query. BEWARE: Use of these keywords may be dangerous as (e.g.) typing "REDUCE" does not force a reduce phase to occur and typing "MAP" does not force a new map phase!
Please also see Sort By / Cluster By / Distribute By and Larry Ogrodnek's blog post.
clusterBy: CLUSTER BY colName (',' colName)*
distributeBy: DISTRIBUTE BY colName (',' colName)*
sortBy: SORT BY colName (ASC | DESC)? (',' colName (ASC | DESC)?)*

rowFormat
  : ROW FORMAT
    (DELIMITED [FIELDS TERMINATED BY char]
               [COLLECTION ITEMS TERMINATED BY char]
               [MAP KEYS TERMINATED BY char]
               [ESCAPED BY char]
               [LINES SEPARATED BY char]
     |
     SERDE serde_name [WITH SERDEPROPERTIES
                            property_name=property_value,
                            property_name=property_value, ...])

outRowFormat : rowFormat
inRowFormat : rowFormat
outRecordReader : RECORDREADER className

query:
  FROM (
    FROM src
    MAP expression (',' expression)*
    (inRowFormat)?
    USING 'my_map_script'
    ( AS colName (',' colName)* )?
    (outRowFormat)? (outRecordReader)?
    ( clusterBy? | distributeBy? sortBy? ) src_alias
  )
  REDUCE expression (',' expression)*
    (inRowFormat)?
    USING 'my_reduce_script'
    ( AS colName (',' colName)* )?
    (outRowFormat)? (outRecordReader)?

  FROM (
    FROM src
    SELECT TRANSFORM '(' expression (',' expression)* ')'
    (inRowFormat)?
    USING 'my_map_script'
    ( AS colName (',' colName)* )?
    (outRowFormat)? (outRecordReader)?
    ( clusterBy? | distributeBy? sortBy? ) src_alias
  )
  SELECT TRANSFORM '(' expression (',' expression)* ')'
    (inRowFormat)?
    USING 'my_reduce_script'
    ( AS colName (',' colName)* )?
    (outRowFormat)? (outRecordReader)?
SQL Standard Based Authorization Disallows TRANSFORM
The TRANSFORM clause is disallowed when SQL standard based authorization is configured in Hive 0.13.0 and later releases (HIVE-6415).
TRANSFORM Examples
Example #1:
FROM (
  FROM pv_users
  MAP pv_users.userid, pv_users.date
  USING 'map_script'
  AS dt, uid
  CLUSTER BY dt) map_output
INSERT OVERWRITE TABLE pv_users_reduced
  REDUCE map_output.dt, map_output.uid
  USING 'reduce_script'
  AS date, count;

FROM (
  FROM pv_users
  SELECT TRANSFORM(pv_users.userid, pv_users.date)
  USING 'map_script'
  AS dt, uid
  CLUSTER BY dt) map_output
INSERT OVERWRITE TABLE pv_users_reduced
  SELECT TRANSFORM(map_output.dt, map_output.uid)
  USING 'reduce_script'
  AS date, count;
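The manual does not show what reduce_script contains. As a hedged illustration, a reducer that emits one (date, count) row per dt value could look like the Python sketch below. It assumes the input rows are (dt, uid) pairs arriving with equal dt values adjacent, which is what CLUSTER BY dt is there to arrange; count_per_key and reduce_stream are hypothetical names.

```python
import io
from itertools import groupby


def count_per_key(lines):
    """Yield (dt, count) for each run of consecutive lines sharing a first field."""
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for dt, group in groupby(rows, key=lambda r: r[0]):
        yield dt, sum(1 for _ in group)


def reduce_stream(src, dst):
    """Write one TAB-separated (dt, count) line per group of input rows."""
    for dt, n in count_per_key(src):
        dst.write("%s\t%d\n" % (dt, n))


# Demo on in-memory input that is already grouped by dt.
demo = io.StringIO("2008-01-01\tu1\n2008-01-01\tu2\n2008-01-02\tu1\n")
out = io.StringIO()
reduce_stream(demo, out)
print(out.getvalue(), end="")
```

Because groupby only merges consecutive rows, the script relies on the clustering done by the query; on unclustered input it would emit several partial counts for the same date. A real script would call reduce_stream(sys.stdin, sys.stdout).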
Example #2:
FROM (
  FROM src
  SELECT TRANSFORM(src.key, src.value) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe'
  USING '/bin/cat'
  AS (tkey, tvalue) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe'
  RECORDREADER 'org.apache.hadoop.hive.ql.exec.TypedBytesRecordReader'
) tmap
INSERT OVERWRITE TABLE dest1 SELECT tkey, tvalue
Schema-less Map-reduce Scripts
If there is no AS clause after USING my_script, Hive assumes that the output of the script contains two parts: key, which is everything before the first tab, and value, which is everything after the first tab. Note that this is different from specifying AS key, value: in that case, if the output contains multiple tabs, value will only contain the portion between the first tab and the second tab.
Note that we can directly do CLUSTER BY key without specifying the output schema of the scripts.
FROM (
  FROM pv_users
  MAP pv_users.userid, pv_users.date
  USING 'map_script'
  CLUSTER BY key) map_output
INSERT OVERWRITE TABLE pv_users_reduced
  REDUCE map_output.key, map_output.value
  USING 'reduce_script'
  AS date, count;
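The difference between omitting the AS clause and writing AS key, value can be mimicked on a single output line. The two Python helpers below are hypothetical and only model the splitting rule stated in this section; they are not Hive code.

```python
def schemaless_split(line):
    """No AS clause: key is the text before the first TAB, value is all the rest.

    For a line with no TAB this sketch returns an empty value; what Hive
    does in that case is not stated in the text.
    """
    key, _, value = line.partition("\t")
    return key, value


def as_key_value_split(line):
    """AS key, value: only the first two TAB-separated fields are kept."""
    fields = line.split("\t")
    second = fields[1] if len(fields) > 1 else None
    return fields[0], second


line = "2008-01-01\tu1\tu2"        # script output containing two tabs
print(schemaless_split(line))      # value keeps the embedded tab
print(as_key_value_split(line))    # value stops at the second tab
```

With this input the schema-less form yields the value "u1\tu2", while the AS key, value form yields only "u1".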
Typing the output of TRANSFORM
The output fields from a script are typed as strings by default; for example, in:
SELECT TRANSFORM(stuff)
USING 'script'
AS thing1, thing2
They can be cast immediately with the syntax:
SELECT TRANSFORM(stuff)
USING 'script'
AS (thing1 INT, thing2 INT)