hive--UDF、UDAF
package com.example.hive.udf; import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text; public final class Lower extends UDF {
public Text evaluate(final Text s) {
if (s == null) { return null; }
return new Text(s.toString().toLowerCase());
}
}
add jar my_jar.jar;
create temporary function my_lower as 'com.example.hive.udf.Lower';
主要描述了实现一个udf的过程,首先自然是实现一个UDF函数,然后编译为jar并加入到hive的classpath中,最后创建一个临时变量名字让hive中调用。
package org.apache.hadoop.hive.contrib.udaf.example; import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator; /**
* This is a simple UDAF that calculates average.
*
* It should be very easy to follow and can be used as an example for writing
* new UDAFs.
*
* Note that Hive internally uses a different mechanism (called GenericUDAF) to
* implement built-in aggregation functions, which are harder to program but
* more efficient.
*
*/
public final class UDAFExampleAvg extends UDAF { /**
* The internal state of an aggregation for average.
*
* Note that this is only needed if the internal state cannot be represented
* by a primitive.
*
* The internal state can also contains fields with types like
* ArrayList<String> and HashMap<String,Double> if needed.
*/
public static class UDAFAvgState {
private long mCount;
private double mSum;
} /**
* The actual class for doing the aggregation. Hive will automatically look
* for all internal classes of the UDAF that implements UDAFEvaluator.
*/
public static class UDAFExampleAvgEvaluator implements UDAFEvaluator { UDAFAvgState state; public UDAFExampleAvgEvaluator() {
super();
state = new UDAFAvgState();
init();
} /**
* Reset the state of the aggregation.
*/
public void init() {
state.mSum = 0;
state.mCount = 0;
} /**
* Iterate through one row of original data.
*
* The number and type of arguments need to the same as we call this UDAF
* from Hive command line.
*
* This function should always return true.
*/
public boolean iterate(Double o) {
if (o != null) {
state.mSum += o;
state.mCount++;
}
return true;
} /**
* Terminate a partial aggregation and return the state. If the state is a
* primitive, just return primitive Java classes like Integer or String.
*/
public UDAFAvgState terminatePartial() {
// This is SQL standard - average of zero items should be null.
return state.mCount == 0 ? null : state;
} /**
* Merge with a partial aggregation.
*
* This function should always have a single argument which has the same
* type as the return value of terminatePartial().
*/
public boolean merge(UDAFAvgState o) {
if (o != null) {
state.mSum += o.mSum;
state.mCount += o.mCount;
}
return true;
} /**
* Terminates the aggregation and return the final result.
*/
public Double terminate() {
// This is SQL standard - average of zero items should be null.
return state.mCount == 0 ? null : Double.valueOf(state.mSum
/ state.mCount);
}
} private UDAFExampleAvg() {
// prevent instantiation
} }
关于UDAF开发注意点:
1.需要import org.apache.hadoop.hive.ql.exec.UDAF以及org.apache.hadoop.hive.ql.exec.UDAFEvaluator,这两个包都是必须的
2.函数类需要继承UDAF类,内部类Evaluator实现UDAFEvaluator接口
3.Evaluator需要实现 init、iterate、terminatePartial、merge、terminate这几个函数
1)init函数类似于构造函数,用于UDAF的初始化
2)iterate接收传入的参数,并进行内部的轮转。其返回类型为boolean
3)terminatePartial无参数,其为iterate函数轮转结束后,返回乱转数据,iterate和terminatePartial类似于hadoop的Combiner
4)merge接收terminatePartial的返回结果,进行数据merge操作,其返回类型为boolean
5)terminate返回最终的聚集函数结果
hive--UDF、UDAF的更多相关文章
- Hive 10、Hive的UDF、UDAF、UDTF
Hive自定义函数包括三种UDF.UDAF.UDTF UDF(User-Defined-Function) 一进一出 UDAF(User- Defined Aggregation Funcation) ...
- hive中UDF、UDAF和UDTF使用
Hive进行UDF开发十分简单,此处所说UDF为Temporary的function,所以需要hive版本在0.4.0以上才可以. 一.背景:Hive是基于Hadoop中的MapReduce,提供HQ ...
- 【转】hive中UDF、UDAF和UDTF使用
原博文出自于: http://blog.csdn.net/liuj2511981/article/details/8523084 感谢! Hive进行UDF开发十分简单,此处所说UDF为Tempora ...
- HIVE函数的UDF、UDAF、UDTF
一.词义解析 UDF(User-Defined-Function) 一进一出 UDAF(User- Defined Aggregation Funcation) 多进一出 (聚合函数,MR) UDTF ...
- 【Spark-SQL学习之三】 UDF、UDAF、开窗函数
环境 虚拟机:VMware 10 Linux版本:CentOS-6.5-x86_64 客户端:Xshell4 FTP:Xftp4 jdk1.8 scala-2.10.4(依赖jdk1.8) spark ...
- UDF、UDAF、UDTF函数编写
一.UDF函数编写 1.步骤 1.继承UDF类 2.重写evalute方法 .继承GenericUDF .实现initialize.evaluate.getDisplayString方法 2.案例 实 ...
- Kafka:ZK+Kafka+Spark Streaming集群环境搭建(十五)Spark编写UDF、UDAF、Agg函数
Spark Sql提供了丰富的内置函数让开发者来使用,但实际开发业务场景可能很复杂,内置函数不能够满足业务需求,因此spark sql提供了可扩展的内置函数. UDF:是普通函数,输入一个或多个参数, ...
- Hive 编程之DDL、DML、UDF、Select总结
Hive的基本理论与安装可参看作者上一篇博文<Apache Hive 基本理论与安装指南>. 一.Hive命令行 所有的hive命令都可以通过hive命令行去执行,hive命令行中仍有许多 ...
- 在hive中UDF和UDAF使用说明
Hive进行UDF开发十分简单,此处所说UDF为Temporary的function,所以需要hive版本在0.4.0以上才可以. 一.背景:Hive是基于Hadoop中的MapReduce,提供HQ ...
- [转]HIVE UDF/UDAF/UDTF的Map Reduce代码框架模板
FROM : http://hugh-wangp.iteye.com/blog/1472371 自己写代码时候的利用到的模板 UDF步骤: 1.必须继承org.apache.hadoop.hive ...
随机推荐
- 浅谈iOS中的视图优化
引言: 让我们来思考几个问题,你开发过的产品,它还有可以优化的地方吗?能增加它的帧率吗?能减少多余的CPU计算吗?是不是存在多余的GPU渲染?业务这点工作量对于越来越强大的设备面前显得微不足道,但作为 ...
- Java基础知识强化之IO流笔记64:合并流SequenceInputStream
1. SequenceInputStream合并流的概述: SequenceInputStream类可以将多个输入流串联在一起,合并为一个输入流,因此,该流也被称为合并流. 2. Sequence ...
- 关于Git的merge和rebase命令解析
git rebase是对提交执行变基的操作.即可以实现将指定范围的提交"嫁接"到另外一个提交智商. 其常用的命令格式有: 用法1:git rebase --onto <new ...
- wireshark的ubuntu更新ppa源
默认的ppa源安装的是1.8.3的,这个源直接更新到1.11.0 $ sudo add-apt-repository ppa:dreibh/ppa $ sudo apt-get update $ su ...
- linux版本qq的安装
下载http://download.csdn.net/detail/gg296231363/3728117原谅我吧,1分而已,可以自己google,到处有. tar xzvf linuxqq_v1.0 ...
- Leetcode 102. Binary Tree Level Order Traversal(二叉树的层序遍历)
Given a binary tree, return the level order traversal of its nodes' values. (ie, from left to right, ...
- Windows之vmware安装破解版错误汇总
A.错误: units specified don't exist, SHSUCDX can't install A.解决: 虚拟机配置->CD/DVD->IDE(0,0) B:错误: n ...
- .Net 两个对像之间的映射
using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.T ...
- MySQL 테이블 타입(Heap, MyIsam, InnoDB...) 변경하기
alter table 을 이용해서 기존의 생성된 테이블의 타입(Heap, MyIsam, InnoDB...)을 변경하는 명령어 입니다. 잠시 까먹은 분은 계실지 몰라도 원래 모르는 ...
- scala学习笔记:理解lazy值
scala> var counter = 0 counter: Int = 0 scala> def foo = {counter += 1; counter} foo: Int scal ...