UDF, UDAF, and UDTF in Hive

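1. Basic operations in Hive:

DDL, DML

2. Functions in Hive

User-Defined Functions: UDF (user-defined function, UDF for short)
UDF: one in, one out, e.g. upper, lower, substring (one record comes in, one record goes out)
UDAF: Aggregation (user-defined aggregate function), many in, one out, e.g. count, max, min, sum ...
UDTF: Table-Generation, one in, many out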
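
The three shapes above can be modelled in a few lines of plain Python. This is only a sketch of the row-count behaviour, not Hive's actual UDF API; the function names udf_upper, udaf_count and udtf_explode are made up for illustration.

```python
from typing import Iterable, Iterator, List


def udf_upper(value: str) -> str:
    """UDF shape: one row in, one row out."""
    return value.upper()


def udaf_count(values: Iterable[str]) -> int:
    """UDAF shape: many rows in, one row out."""
    return sum(1 for _ in values)


def udtf_explode(values: List[str]) -> Iterator[str]:
    """UDTF shape: one row (holding an array) in, many rows out."""
    for value in values:
        yield value


rows = ["hello", "world"]
upper_rows = [udf_upper(r) for r in rows]                      # still 2 rows
total = udaf_count(rows)                                       # collapsed to 1 value
exploded = list(udtf_explode(["hello", "world", "welcome"]))   # 1 row became 3
print(upper_rows, total, exploded)
```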

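3. Examples

show functions lists the functions supported by the system

Function examples: split(), explode()

Exercise: use Hive to count how often each word occurs

explode turns an array into multiple rows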

[hadoop@hadoop000 data]$ vi hive-wc.txt

hello,world,welcome

hello,welcome

hive> create table hive_wc(sentence string);

OK

Time taken: 1.083 seconds

hive> load data local inpath '/home/hadoop/data/hive-wc.txt' into table hive_wc;

Loading data to table default.hive_wc

Table default.hive_wc stats: [numFiles=, totalSize=]

OK

Time taken: 1.539 seconds

hive> select * from hive_wc;

OK

hello,world,welcome

hello,welcome

Time taken: 0.536 seconds, Fetched:  row(s)

hive> select split(sentence,",") from hive_wc;

OK

["hello","world","welcome"]

["hello","welcome"]

[""]

Time taken: 0.161 seconds, Fetched:  row(s)

Applying explode() to the arrays returned by split() yields one row per element:

"hello"

"world"

"welcome"

"hello"

"welcome"

Complete the word count with a single SQL statement:

hive> select word, count(1) as c

    > from (select explode(split(sentence,",")) as word from hive_wc) t

    > group by word ;

Query ID = hadoop_20180613094545_920c2e72--47eb-9a9c-5e5a30ebb1ae

Total jobs =

Launching Job  out of

Number of reduce tasks not specified. Estimated from input data size:

In order to change the average load for a reducer (in bytes):

  set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

  set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

  set mapreduce.job.reduces=<number>

Starting Job = job_1528851144815_0001, Tracking URL = http://hadoop000:8088/proxy/application_1528851144815_0001/

Kill Command = /home/hadoop/app/hadoop-2.6.-cdh5.7.0/bin/hadoop job  -kill job_1528851144815_0001

Hadoop job information for Stage-: number of mappers: ; number of reducers:

-- ::, Stage- map = %,  reduce = %

-- ::, Stage- map = %,  reduce = %, Cumulative CPU 2.42 sec

-- ::, Stage- map = %,  reduce = %, Cumulative CPU 4.31 sec

MapReduce Total cumulative CPU time:  seconds  msec

Ended Job = job_1528851144815_0001

MapReduce Jobs Launched:

Stage-Stage-: Map:   Reduce:    Cumulative CPU: 4.31 sec   HDFS Read:  HDFS Write:  SUCCESS

Total MapReduce CPU Time Spent:  seconds  msec

OK

hello

welcome

world

Time taken: 26.859 seconds, Fetched:  row(s)
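
The query above combines three steps: split each sentence on commas, explode the resulting array into one row per word, then group by word and count. A minimal Python sketch of the same pipeline follows, assuming the two non-empty lines of hive-wc.txt shown earlier as input; the helper name explode_words is hypothetical.

```python
from collections import Counter

# The two non-empty lines of hive-wc.txt from the example above.
sentences = ["hello,world,welcome", "hello,welcome"]


def explode_words(rows):
    """Mimic explode(split(sentence, ',')): one output row per word."""
    for sentence in rows:
        for word in sentence.split(","):
            yield word


# group by word + count(1)
word_counts = Counter(explode_words(sentences))

for word in sorted(word_counts):
    print(word, word_counts[word])
```

The inner generator plays the role of the subquery t, and Counter plays the role of the group by with count.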

4. JSON data

File used: rating.json

Create a table rating_json, load the data, and view the first ten rows:

hive> create table rating_json(json string);

OK

hive> load data local inpath '/home/hadoop/data/rating.json' into table rating_json;

Loading data to table default.rating_json

Table default.rating_json stats: [numFiles=, totalSize=]

OK

hive> select * from rating_json limit 10;

OK

{"movie":"","rate":"","time":"","userid":""}

{"movie":"","rate":"","time":"","userid":""}

{"movie":"","rate":"","time":"","userid":""}

{"movie":"","rate":"","time":"","userid":""}

{"movie":"","rate":"","time":"","userid":""}

{"movie":"","rate":"","time":"","userid":""}

{"movie":"","rate":"","time":"","userid":""}

{"movie":"","rate":"","time":"","userid":""}

{"movie":"","rate":"","time":"","userid":""}

{"movie":"","rate":"","time":"","userid":""}

Time taken: 0.195 seconds, Fetched:  row(s)

Process the JSON data. json_tuple is a UDTF that was introduced in Hive 0.7:

hive> select

    > json_tuple(json,"movie","rate","time","userid") as (movie,rate,time,userid)

    > from rating_json limit ;

OK

Time taken: 0.189 seconds, Fetched:  row(s)
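
To illustrate what json_tuple does with each row, here is a rough Python model: it parses the JSON string once and returns the requested keys as a tuple of columns. This is a sketch, not Hive's implementation, and the sample record uses invented values because the real values from rating.json are not shown above.

```python
import json


def json_tuple(json_text, *keys):
    """Parse one JSON string and return the requested top-level keys as a tuple.

    Keys that are absent come back as None, standing in for SQL NULL.
    """
    record = json.loads(json_text)
    return tuple(record.get(key) for key in keys)


# Hypothetical record: the field names match rating.json, the values are made up.
row = '{"movie":"m1","rate":"5","time":"1000","userid":"u1"}'

movie, rate, time, userid = json_tuple(row, "movie", "rate", "time", "userid")
print(movie, rate, time, userid)
```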

5. Top N within each group (window functions):

[hadoop@hadoop000 data]$ more hive_row_number.txt

,,ruoze,M

,,jepson,M

,,wangwu,F

,,zhaoliu,F

,,tianqi,M

,,wangba,F

[hadoop@hadoop000 data]$

hive> create table hive_rownumber(id int,age int, name string, sex string)

    > row format delimited fields terminated by ',';

OK

Time taken: 0.451 seconds

hive> load data local inpath '/home/hadoop/data/hive_row_number.txt' into table hive_rownumber;

Loading data to table hive3.hive_rownumber

Table hive3.hive_rownumber stats: [numFiles=, totalSize=]

OK

Time taken: 1.381 seconds

hive> select * from hive_rownumber ;

OK

             ruoze   M

             jepson  M

             wangwu  F

             zhaoliu F

             tianqi  M

             wangba  F

Time taken: 0.455 seconds, Fetched:  row(s)

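Requirement: for each sex, return the two rows with the greatest age (a top-N query):

Analysis: order by sorts globally, so it cannot sort within each group. Sorting inside a group requires a window function (also called an analytic function).

select id,age,name,sex

from

(select id,age,name,sex,

row_number() over(partition by sex order by age desc) as rank

from hive_rownumber) t

where rank<=2;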

hive> select id,age,name,sex

    > from

    > (select id,age,name,sex,

    > row_number() over(partition by sex order by age desc) as rank

    > from hive_rownumber) t

    > where rank<=2;

Query ID = hadoop_20180614202525_9829dc42-3c37--8b12-89c416589ebc

Total jobs =

Launching Job  out of

Number of reduce tasks not specified. Estimated from input data size:

In order to change the average load for a reducer (in bytes):

  set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

  set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

  set mapreduce.job.reduces=<number>

Starting Job = job_1528975858636_0001, Tracking URL = http://hadoop000:/proxy/application_1528975858636_0001/

Kill Command = /home/hadoop/app/hadoop-2.6.-cdh5.7.0/bin/hadoop job  -kill job_1528975858636_0001

Hadoop job information for Stage-: number of mappers: ; number of reducers:

-- ::, Stage- map = %,  reduce = %

-- ::, Stage- map = %,  reduce = %, Cumulative CPU 1.48 sec

-- ::, Stage- map = %,  reduce = %, Cumulative CPU 3.86 sec

MapReduce Total cumulative CPU time:  seconds  msec

Ended Job = job_1528975858636_0001

MapReduce Jobs Launched:

Stage-Stage-: Map:   Reduce:    Cumulative CPU: 3.86 sec   HDFS Read:  HDFS Write:  SUCCESS

Total MapReduce CPU Time Spent:  seconds  msec

OK

             wangba  F

             wangwu  F

             tianqi  M

             jepson  M

Time taken: 29.262 seconds, Fetched:  row(s)
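
The logic of row_number() over(partition by sex order by age desc) can be sketched in Python as: split the rows by sex, sort each partition by age in descending order, number the rows from 1, and keep those numbered at most 2. The ids and ages below are invented for illustration (the real values are not shown in the listing above); only the names and sexes come from the example file, and the function name top_n_per_sex is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# (id, age, name, sex); ids and ages are made-up sample values.
people = [
    (1, 18, "ruoze", "M"),
    (2, 19, "jepson", "M"),
    (3, 22, "wangwu", "F"),
    (4, 16, "zhaoliu", "F"),
    (5, 30, "tianqi", "M"),
    (6, 26, "wangba", "F"),
]


def top_n_per_sex(rows, n):
    """Keep the n oldest rows of each sex, like row_number() ... <= n."""
    result = []
    by_sex = sorted(rows, key=itemgetter(3))                      # partition by sex
    for _sex, group in groupby(by_sex, key=itemgetter(3)):
        ranked = sorted(group, key=itemgetter(1), reverse=True)   # order by age desc
        for row_number, row in enumerate(ranked, start=1):
            if row_number <= n:                                   # where rank <= n
                result.append(row)
    return result


top_two = top_n_per_sex(people, 2)
for row in top_two:
    print(row)
```

A global sort alone would interleave the two sexes, which is why the numbering has to restart inside each partition.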
