DeveloperGuide Hive UDF

WeChat: 大数据从业者 2024-10-12 08:10:06 Original article

Creating Custom UDFs

First, you need to create a new class that extends UDF, with one or more methods named evaluate.

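package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class Lower extends UDF {
  // Hive calls evaluate once per row; a NULL column value arrives as null.
  public Text evaluate(final Text s) {
    if (s == null) { return null; }
    // Convert to a Java String, lower-case it, and wrap the result in a Text.
    return new Text(s.toString().toLowerCase());
  }
}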

(Note that Hive already has a built-in function for this; it is used here only as an easy example.)
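Because a UDF class may declare several methods named evaluate, Hive has to pick one of them by reflection, based on the argument types of the call. The following is a minimal plain-Java sketch of that idea, not Hive's actual resolver: it uses String instead of Text so that it runs without the Hive jars, and the names EvaluateDispatchDemo, LowerSketch, and the two-argument overload are hypothetical, introduced only for illustration.

```java
import java.lang.reflect.Method;
import java.util.Locale;

// Hypothetical demo: choose an evaluate(...) overload by reflection.
// This is a simplified illustration, not Hive's real method resolver.
final class EvaluateDispatchDemo {

    // Invokes the first public evaluate(...) whose parameters fit the arguments.
    static Object invoke(Object udf, Object... args) {
        for (Method m : udf.getClass().getMethods()) {
            if (!m.getName().equals("evaluate")) {
                continue;
            }
            Class<?>[] params = m.getParameterTypes();
            if (params.length != args.length) {
                continue;
            }
            boolean fits = true;
            for (int i = 0; i < params.length; i++) {
                // A null argument fits any reference-typed parameter.
                if (args[i] != null && !params[i].isInstance(args[i])) {
                    fits = false;
                    break;
                }
            }
            if (fits) {
                try {
                    return m.invoke(udf, args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        throw new IllegalArgumentException("no evaluate(...) overload fits the arguments");
    }

    public static void main(String[] argv) {
        LowerSketch udf = new LowerSketch();
        System.out.println(invoke(udf, "CMO"));               // prints "cmo"
        System.out.println(invoke(udf, "Vice President", 4)); // prints "vice"
        System.out.println(invoke(udf, (Object) null));       // prints "null"
    }
}

// Stand-in for a UDF class: same shape as Lower, but with String instead of Text.
final class LowerSketch {

    // Same contract as Lower.evaluate: null in, null out.
    public String evaluate(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }

    // Hypothetical second overload: lower-case, then truncate to maxLength characters.
    public String evaluate(String s, Integer maxLength) {
        String lowered = evaluate(s);
        if (lowered == null || maxLength == null || lowered.length() <= maxLength) {
            return lowered;
        }
        return lowered.substring(0, Math.max(0, maxLength));
    }
}
```

The sketch picks the first overload that fits, which is enough here because the two overloads differ in arity. A real UDF would take and return Hadoop writable types such as Text, as in the Lower class above.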

After compiling your code into a jar, you need to add the jar to the Hive classpath. See the section below on deploying jars.

Once Hive is started up with your jars in the classpath, the final step is to register your function as described in Create Function:

create temporary function my_lower as 'com.example.hive.udf.Lower';

Now you can start using it:

hive> select my_lower(title), sum(freq) from titles group by my_lower(title);
...
Ended Job = job_200906231019_0006
OK
cmo 13.0
vp 7.0

For a more involved example, see this page.

As of Hive 0.13, you can register your function as a permanent UDF either in the current database or in a specified database, as described in Permanent Functions. For example:

create function my_db.my_lower as 'com.example.hive.udf.Lower';

Deploying Jars for User Defined Functions and User Defined SerDes

In order to start using your UDF, you first need to add the code to the classpath:

hive> add jar my_jar.jar;
Added my_jar.jar to class path

By default, Hive looks for the jar in the current directory. You can also specify a full path:

hive> add jar /tmp/my_jar.jar;
Added /tmp/my_jar.jar to class path

Your jar will then be on the classpath for all jobs initiated from that session. To see which jars have been added to the classpath you can use:

hive> list jars;
my_jar.jar

See Hive CLI for full syntax and more examples.

As of Hive 0.13, a UDF can also declare the jars it requires in the CREATE FUNCTION statement:

CREATE FUNCTION myfunc AS 'myclass' USING JAR 'hdfs:///path/to/jar';

This will add the jar to the classpath as if ADD JAR had been called on that jar.
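To make the two registration forms easier to compare, here is a small sketch that assembles the statements as strings. It is only an illustration under stated assumptions: the class CreateFunctionSketch and its methods are hypothetical helpers, not part of Hive; no quoting or escaping of the inputs is attempted; and the comma-separated form for several jars follows the Hive DDL grammar as I understand it, so check it against your Hive version.

```java
// Hypothetical helper that builds the registration statements shown in this guide.
// It only concatenates strings; it does not talk to Hive and does not escape input.
final class CreateFunctionSketch {

    // Session-scoped function; the jar must already be on the classpath (ADD JAR).
    static String temporary(String name, String className) {
        return "CREATE TEMPORARY FUNCTION " + name + " AS '" + className + "'";
    }

    // Permanent function (Hive 0.13+), optionally naming the jars it requires.
    static String permanent(String qualifiedName, String className, String... jarUris) {
        StringBuilder sql = new StringBuilder("CREATE FUNCTION ")
                .append(qualifiedName)
                .append(" AS '").append(className).append("'");
        for (int i = 0; i < jarUris.length; i++) {
            // The first resource is introduced by USING, later ones by a comma.
            sql.append(i == 0 ? " USING JAR '" : ", JAR '")
               .append(jarUris[i])
               .append("'");
        }
        return sql.toString();
    }

    public static void main(String[] argv) {
        System.out.println(temporary("my_lower", "com.example.hive.udf.Lower"));
        System.out.println(permanent("my_db.my_lower", "com.example.hive.udf.Lower",
                "hdfs:///path/to/jar"));
    }
}
```

With the USING JAR form, the jar location travels with the function definition, so a session that calls the function does not need its own ADD JAR.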
