get top k elements of the same key in hive
key points:
1. group by key and sort by using distribute by and sort by.
2. get top k elements by a UDF (user defined function) RANK
---------Here is the source code.--------------
package com.example.hive.udf;
import org.apache.hadoop.hive.ql.exec.UDF;
public final class Rank extends UDF{
private int counter;
private String last_key;
public int evaluate(final String key){
if ( !key.equalsIgnoreCase(this.last_key) ) {
this.counter = 0;
this.last_key = key;
}
return this.counter++;
}
}
The details are as the following.
---original data, table region(region_nbr, region_id)---
100 10
200 12
300 33
100 4
100 8
200 20
300 31
300 3
400 4
200 2
-----what I need is as below-----
100 10
100 8
200 20
200 12
300 33
300 31
400 4
---
1. step1. compile java with a shell, compile_udf.sh.
#!/bin/bash
if [ $# != 1 ]; then
echo "Usage: $0 <java file>"
exit 1
fi
CNAME=${1%.java}
JARNAME=$CNAME.jar
JARDIR=/tmp/hive_jars/$CNAME
HIVE_HOME2="/usr/local/hive-0.9.0"
CLASSPATH=$(ls $HIVE_HOME2/lib/hive-serde-*.jar):$(ls $HIVE_HOME2/lib/hive-exec-*.jar):$(ls /home/oicq/hadoop/hadoop-1.0.2/hadoop-core-*.jar)
function tell {
echo
echo "$1 successfully compiled. In Hive run:"
#echo "$> add jar $JARNAME;"
#echo "$> create temporary function $CNAME as 'com.example.hive.udf.$CNAME';"
echo
}
mkdir -p $JARDIR
javac -classpath $CLASSPATH -d $JARDIR/ $1 && jar -cf $JARNAME -C $JARDIR/ . && tell $1
step 2. run hive
hive -e "add jar /data/ginobili/UDF/Rank.jar; create temporary function Rank as 'com.example.hive.udf.Rank'; select region_nbr, region_id from ( select region_nbr, region_id, Rank(region_nbr) as rank from (select * from test_gino.region distribute by region_nbr sort by region_nbr, region_id desc)a )b where rank < 2"

REFERENCE
http://stackoverflow.com/questions/9390698/hive-getting-top-n-records-in-group-by-query
get top k elements of the same key in hive的更多相关文章
- Leetcode 347. Top K Frequent Elements
Given a non-empty array of integers, return the k most frequent elements. For example,Given [1,1,1,2 ...
- 347. Top K Frequent Elements
Given a non-empty array of integers, return the k most frequent elements. For example,Given [1,1,1,2 ...
- [Swift]LeetCode347. 前K个高频元素 | Top K Frequent Elements
Given a non-empty array of integers, return the k most frequent elements. Example 1: Input: nums = [ ...
- C#版(打败99.28%的提交) - Leetcode 347. Top K Frequent Elements - 题解
版权声明: 本文为博主Bravo Yeung(知乎UserName同名)的原创文章,欲转载请先私信获博主允许,转载时请附上网址 http://blog.csdn.net/lzuacm. C#版 - L ...
- 347. Top K Frequent Elements (sort map)
Given a non-empty array of integers, return the k most frequent elements. Example 1: Input: nums = [ ...
- [LeetCode] Top K Frequent Elements 前K个高频元素
Given a non-empty array of integers, return the k most frequent elements. For example,Given [1,1,1,2 ...
- [leetcode]347. Top K Frequent Elements K个最常见元素
Given a non-empty array of integers, return the k most frequent elements. Example 1: Input: nums = [ ...
- Top K Frequent Elements 前K个高频元素
Top K Frequent Elements 347. Top K Frequent Elements [LeetCode] Top K Frequent Elements 前K个高频元素
- [LeetCode] 347. Top K Frequent Elements 前K个高频元素
Given a non-empty array of integers, return the k most frequent elements. Example 1: Input: nums = [ ...
随机推荐
- Android 添加子视图(addView和setView)
我们在添加视图文件的时候有两种方式,一种是通过在xml文件定义layout,另一种方式是在java代码中动态生成布局文件. 在xml中定义的layout要想转化为view,需要使用到LayoutInf ...
- USBSpirit(USB精灵)更新到1.2.300.105
USBSpirit(USB精灵)是CopyU!的内核引擎,CopyU!的主要功能均由该引擎提供,此次更新主要内容如下:(版本号:1.2.300.105) 1.[修复]:修复了几处引擎的资源泄露问题,提 ...
- Delphi 在使用exports中的方法 带参数的用法
最近项目中,需要在一个bpl中调用另一个bpl中的单元的方法, 方法如下: 在被调用的单元中定义: procedure Inner_Ex(VoucherType: TVoucherType); exp ...
- google官方提供的下拉刷新控件SwipeRefreshLayout
摘自:http://www.stormzhang.com/android/2014/03/29/android-swiperefreshlayout/ SwipeRefreshLayout Swipe ...
- 你应该了解的 7个Linux ls 命令技巧
在前面我们系列报道的两篇文章中,我们已经涵盖了关于‘ls’命令的绝大多数内容.本文时‘ls命令’系列的最后一部分.如果你还没有读过该系列的其它两篇文章,你可以访问下面的链接. 15 个‘ls’命令的面 ...
- (转)最近研究xcodebuild批量打包的一些心得
以前的时候只知道做安卓开发的兄弟挺辛苦的,不但开发的时候要适配一堆的机型,好不容易开发完了还要打一堆不同的包给不同的市场.没想到现在这些市场都开辟iOS市场,于是需要打一堆的包给不同的市场,面对暂时给 ...
- Demon_Tank (坦克移动发射子弹)
using UnityEngine; using System.Collections; public class Tank : MonoBehaviour { //子弹预设体 public Game ...
- linux串口驱动分析
linux串口驱动分析 硬件资源及描写叙述 s3c2440A 通用异步接收器和发送器(UART)提供了三个独立的异步串行 I/O(SIO)port,每一个port都能够在中断模式或 DMA 模式下操作 ...
- Ignatius and the Princess III --undo
Ignatius and the Princess III Time Limit : 2000/1000ms (Java/Other) Memory Limit : 65536/32768K (J ...
- C#Transfrom
代码如下: private void btnConvertType_Click(object sender, EventArgs e) { if (rdo_btn_ConvertObject.Chec ...