Hive: UDF and UDAF

1. UDF

package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class Lower extends UDF {
  public Text evaluate(final Text s) {
    if (s == null) { return null; }
    return new Text(s.toString().toLowerCase());
  }
}

add jar my_jar.jar;

create temporary function my_lower as 'com.example.hive.udf.Lower';

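The snippets above cover the whole process of implementing a UDF: first write the UDF class, then compile it into a jar and add that jar to Hive's classpath, and finally create a temporary function name so that the class can be called from Hive.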
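To try the logic of evaluate locally before packaging a jar, a dependency-free stand-in can help. The sketch below is only an illustration: the class name LowerSketch is hypothetical, and String is used in place of Hadoop's Text so that it compiles without Hive or Hadoop on the classpath. It mirrors the contract of Lower.evaluate, where one value goes in per row, one value comes out, and null passes through unchanged.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the Lower UDF; String replaces Hadoop's Text. */
final class LowerSketch {

    /** Same contract as Lower.evaluate: one value in, one value out, null stays null. */
    static String evaluate(String s) {
        if (s == null) {
            return null;
        }
        return s.toLowerCase();
    }

    public static void main(String[] args) {
        // Each element stands for the column value of one row.
        List<String> rows = Arrays.asList("Hive", "UDF", null);
        for (String row : rows) {
            System.out.println(evaluate(row));
        }
    }
}
```

The real UDF still has to accept and return Text, as in the Lower class above; only the null check and the lowercase conversion carry over from this sketch.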

2. UDAF

package org.apache.hadoop.hive.contrib.udaf.example;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

/**
 * This is a simple UDAF that calculates average.
 *
 * It should be very easy to follow and can be used as an example for writing
 * new UDAFs.
 *
 * Note that Hive internally uses a different mechanism (called GenericUDAF) to
 * implement built-in aggregation functions, which are harder to program but
 * more efficient.
 */
public final class UDAFExampleAvg extends UDAF {

  /**
   * The internal state of an aggregation for average.
   *
   * Note that this is only needed if the internal state cannot be represented
   * by a primitive.
   *
   * The internal state can also contain fields with types like
   * ArrayList<String> and HashMap<String,Double> if needed.
   */
  public static class UDAFAvgState {
    private long mCount;
    private double mSum;
  }

  /**
   * The actual class for doing the aggregation. Hive will automatically look
   * for all internal classes of the UDAF that implement UDAFEvaluator.
   */
  public static class UDAFExampleAvgEvaluator implements UDAFEvaluator {

    UDAFAvgState state;

    public UDAFExampleAvgEvaluator() {
      super();
      state = new UDAFAvgState();
      init();
    }

    /**
     * Reset the state of the aggregation.
     */
    public void init() {
      state.mSum = 0;
      state.mCount = 0;
    }

    /**
     * Iterate through one row of original data.
     *
     * The number and type of arguments need to be the same as when we call
     * this UDAF from the Hive command line.
     *
     * This function should always return true.
     */
    public boolean iterate(Double o) {
      if (o != null) {
        state.mSum += o;
        state.mCount++;
      }
      return true;
    }

    /**
     * Terminate a partial aggregation and return the state. If the state is a
     * primitive, just return primitive Java classes like Integer or String.
     */
    public UDAFAvgState terminatePartial() {
      // This is SQL standard - average of zero items should be null.
      return state.mCount == 0 ? null : state;
    }

    /**
     * Merge with a partial aggregation.
     *
     * This function should always have a single argument which has the same
     * type as the return value of terminatePartial().
     */
    public boolean merge(UDAFAvgState o) {
      if (o != null) {
        state.mSum += o.mSum;
        state.mCount += o.mCount;
      }
      return true;
    }

    /**
     * Terminate the aggregation and return the final result.
     */
    public Double terminate() {
      // This is SQL standard - average of zero items should be null.
      return state.mCount == 0 ? null : Double.valueOf(state.mSum
          / state.mCount);
    }
  }

  private UDAFExampleAvg() {
    // prevent instantiation
  }
}

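Points to note when developing a UDAF:

1. You need to import org.apache.hadoop.hive.ql.exec.UDAF and org.apache.hadoop.hive.ql.exec.UDAFEvaluator; both classes are required.

2. The function class must extend the UDAF class, and its inner Evaluator class must implement the UDAFEvaluator interface.

3. The Evaluator must implement the init, iterate, terminatePartial, merge, and terminate methods:

1) init works like a constructor and initializes (resets) the state of the UDAF.

2) iterate receives the arguments passed in and updates the internal state, once per input row. Its return type is boolean.

3) terminatePartial takes no arguments. Once iterate has finished processing its rows, it returns the partial aggregation state. Together, iterate and terminatePartial play a role similar to Hadoop's Combiner.

4) merge receives the value returned by terminatePartial and merges it into the current state. Its return type is boolean.

5) terminate returns the final result of the aggregate function.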
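The order in which these five methods cooperate can be sketched without any Hive classes. The example below is a simplified model, not Hive's actual execution path: it assumes each chunk of rows gets its own evaluator that produces a partial state, and that a final evaluator merges those partial states. The names AvgLifecycleSketch, State, Evaluator, partialFor, and averageOf are hypothetical; only the method bodies mirror UDAFExampleAvgEvaluator above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, dependency-free model of the UDAF evaluator lifecycle. */
final class AvgLifecycleSketch {

    /** Partial aggregation state, mirroring UDAFAvgState. */
    static final class State {
        long count;
        double sum;
    }

    /** Mirrors the five methods of UDAFExampleAvgEvaluator. */
    static final class Evaluator {
        private final State state = new State();

        Evaluator() {
            init();
        }

        void init() {
            state.sum = 0;
            state.count = 0;
        }

        boolean iterate(Double value) {
            if (value != null) {
                state.sum += value;
                state.count++;
            }
            return true;
        }

        State terminatePartial() {
            return state.count == 0 ? null : state;
        }

        boolean merge(State other) {
            if (other != null) {
                state.sum += other.sum;
                state.count += other.count;
            }
            return true;
        }

        Double terminate() {
            return state.count == 0 ? null : Double.valueOf(state.sum / state.count);
        }
    }

    /** Runs iterate over one chunk of rows and returns its partial state. */
    static State partialFor(List<Double> rows) {
        Evaluator evaluator = new Evaluator();
        for (Double row : rows) {
            evaluator.iterate(row);
        }
        return evaluator.terminatePartial();
    }

    /** Merges the partial state of every chunk, then produces the final average. */
    static Double averageOf(List<List<Double>> chunks) {
        Evaluator finalEvaluator = new Evaluator();
        for (List<Double> chunk : chunks) {
            finalEvaluator.merge(partialFor(chunk));
        }
        return finalEvaluator.terminate();
    }

    public static void main(String[] args) {
        // Two chunks of rows; the null row is skipped by iterate.
        List<List<Double>> chunks = Arrays.asList(
            Arrays.asList(1.0, 2.0, null),
            Arrays.asList(3.0, 6.0));
        System.out.println(averageOf(chunks)); // prints 3.0
    }
}
```

The partial state carries both the sum and the count rather than a per-chunk average, because averaging the per-chunk averages would give the wrong result whenever the chunks differ in size.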
