Key points:

1. Group rows by key and sort them within each group, using DISTRIBUTE BY and SORT BY.

2. Take the top k elements of each group with a UDF (user-defined function), Rank.
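
To illustrate key point 1, here is a plain-Java sketch (no Hive or Hadoop dependency) of what DISTRIBUTE BY and SORT BY guarantee together. Each key is hash-partitioned to exactly one of several reducers, and rows are sorted only within each reducer. The class name DistributeSortDemo and the reducer counts are hypothetical. The partition formula follows Hadoop's default HashPartitioner; Hive's own hashing of the key differs in detail, but the property being shown is the same.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch of DISTRIBUTE BY key + SORT BY key over several
// hypothetical reducers (no Hive or Hadoop involved).
public class DistributeSortDemo {

    // Returns one list per reducer, each holding its keys in sorted order.
    public static List<List<String>> distributeAndSort(List<String> keys, int numReducers) {
        List<List<String>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new ArrayList<>());
        }
        for (String key : keys) {
            // DISTRIBUTE BY key: same formula as Hadoop's default HashPartitioner
            int p = (key.hashCode() & Integer.MAX_VALUE) % numReducers;
            reducers.get(p).add(key);
        }
        for (List<String> rows : reducers) {
            Collections.sort(rows); // SORT BY key: orders rows only inside one reducer
        }
        return reducers;
    }

    // True if no key is split across reducers and, inside each reducer,
    // equal keys form one contiguous run: the precondition Rank relies on.
    public static boolean groupedPerReducer(List<List<String>> reducers) {
        Set<String> seen = new HashSet<>();
        for (List<String> rows : reducers) {
            String prev = null;
            for (String key : rows) {
                if (!key.equals(prev)) {   // start of a new run of equal keys
                    if (!seen.add(key)) {
                        return false;      // this key already had a run elsewhere
                    }
                    prev = key;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("100", "200", "300", "100", "100",
                "200", "300", "300", "400", "200");
        List<List<String>> reducers = distributeAndSort(keys, 2);
        for (int i = 0; i < reducers.size(); i++) {
            System.out.println("reducer " + i + ": " + reducers.get(i));
        }
        System.out.println("grouped: " + groupedPerReducer(reducers)); // grouped: true
    }
}
```

Without DISTRIBUTE BY, rows for one key could be spread over several reducers, and each reducer's Rank counter would start again from 0 for the same key.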

---------Here is the source code.--------------

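package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

// Returns 0, 1, 2, ... for consecutive rows that share the same key and
// restarts at 0 whenever the key changes. It only works if all rows with the
// same key reach one task as a contiguous run (DISTRIBUTE BY + SORT BY).
// Marked non-deterministic and stateful because the result depends on
// previously seen rows, which discourages Hive from treating it as a pure function.
@UDFType(deterministic = false, stateful = true)
public final class Rank extends UDF {
    private int counter;
    private String lastKey;

    public int evaluate(final String key) {
        // Null-safe, case-sensitive comparison: SORT BY groups keys case-sensitively,
        // so equalsIgnoreCase could merge two different keys into one run.
        boolean sameKey = (key == null) ? lastKey == null : key.equals(lastKey);
        if (!sameKey) {
            counter = 0;
            lastKey = key;
        }
        return counter++;
    }
}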
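
Rank only compares each key with the previous one, so its output depends entirely on row order. The sketch below copies the counter logic into a standalone class (RankCounterDemo is a hypothetical name, not part of Hive) to show what happens on grouped versus ungrouped input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java copy of the Rank counter logic (no Hive dependency), used only to
// show why the input must be grouped by key before Rank is applied.
public class RankCounterDemo {
    private int counter;
    private String lastKey;

    // Same contract as Rank.evaluate: 0, 1, 2, ... within a run of equal keys.
    public int evaluate(String key) {
        boolean sameKey = (key == null) ? lastKey == null : key.equals(lastKey);
        if (!sameKey) {
            counter = 0;
            lastKey = key;
        }
        return counter++;
    }

    // Feeds the keys, in order, through one fresh counter and collects the ranks.
    public static List<Integer> ranks(List<String> keys) {
        RankCounterDemo rank = new RankCounterDemo();
        List<Integer> out = new ArrayList<>();
        for (String k : keys) {
            out.add(rank.evaluate(k));
        }
        return out;
    }

    public static void main(String[] args) {
        // Grouped input: ranks restart at every key change -> [0, 1, 2, 0, 1]
        System.out.println(ranks(Arrays.asList("100", "100", "100", "200", "200")));
        // Ungrouped input: every row gets rank 0 -> [0, 0, 0, 0, 0],
        // so "rank < 2" would keep all three rows with key 100.
        System.out.println(ranks(Arrays.asList("100", "200", "100", "200", "100")));
    }
}
```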

The details are as follows.

---Original data: table region(region_nbr, region_id)---

100 10
200 12
300 33
100 4
100 8
200 20
300 31
300 3
400 4
200 2

-----Desired output: the two largest region_id values for each region_nbr-----

100 10
100 8
200 20
200 12
300 33
300 31
400 4
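
As a sanity check, the whole query can be simulated in plain Java on the sample rows: sort by region_nbr ascending and region_id descending, number rows from 0 within each region_nbr the way Rank does, and keep rows with rank < k. This is a single-process sketch that assumes all rows fit in one "reducer", so DISTRIBUTE BY is trivially satisfied; TopKPerKey and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Single-process simulation of:
//   sort by region_nbr, region_id desc  ->  Rank(region_nbr)  ->  where rank < k
// Each row is int[]{region_nbr, region_id}.
public class TopKPerKey {

    // The sample rows of table region.
    public static List<int[]> sampleRows() {
        return Arrays.asList(
            new int[]{100, 10}, new int[]{200, 12}, new int[]{300, 33},
            new int[]{100, 4},  new int[]{100, 8},  new int[]{200, 20},
            new int[]{300, 31}, new int[]{300, 3},  new int[]{400, 4},
            new int[]{200, 2});
    }

    public static List<int[]> topK(List<int[]> rows, int k) {
        List<int[]> sorted = new ArrayList<>(rows);
        // SORT BY region_nbr ASC, region_id DESC
        sorted.sort((a, b) -> a[0] != b[0] ? Integer.compare(a[0], b[0])
                                           : Integer.compare(b[1], a[1]));
        List<int[]> result = new ArrayList<>();
        boolean first = true;
        int lastKey = 0;
        int counter = 0;
        for (int[] r : sorted) {
            if (first || r[0] != lastKey) { // key changed: restart the counter
                counter = 0;
                lastKey = r[0];
                first = false;
            }
            int rank = counter++;           // what Rank(region_nbr) would return
            if (rank < k) {                 // where rank < k
                result.add(r);
            }
        }
        return result;
    }

    // Formats rows as "region_nbr region_id", like the listings above.
    public static List<String> format(List<int[]> rows) {
        List<String> out = new ArrayList<>();
        for (int[] r : rows) {
            out.add(r[0] + " " + r[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // With k = 2 this prints the seven rows of the desired output.
        for (String line : format(topK(sampleRows(), 2))) {
            System.out.println(line);
        }
    }
}
```

Changing k gives the top k per key; region_nbr 400 has only one row, so it contributes a single row for any k >= 1.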

---

Step 1. Compile the Java source into a jar with a shell script, compile_udf.sh.

#!/bin/bash
# Usage: ./compile_udf.sh Rank.java   (writes Rank.jar to the current directory)

if [ $# -ne 1 ]; then
  echo "Usage: $0 <java file>"
  exit 1
fi

CNAME=${1%.java}
JARNAME=$CNAME.jar
JARDIR=/tmp/hive_jars/$CNAME
HIVE_HOME2="/usr/local/hive-0.9.0"
# Hive and Hadoop jars needed to compile a UDF (paths are specific to the author's machine)
CLASSPATH=$(ls $HIVE_HOME2/lib/hive-serde-*.jar):$(ls $HIVE_HOME2/lib/hive-exec-*.jar):$(ls /home/oicq/hadoop/hadoop-1.0.2/hadoop-core-*.jar)

# Print the Hive commands needed to register the compiled UDF
function tell {
  echo
  echo "$1 successfully compiled. In Hive run:"
  echo "hive> add jar $JARNAME;"
  echo "hive> create temporary function $CNAME as 'com.example.hive.udf.$CNAME';"
  echo
}

mkdir -p "$JARDIR"
javac -classpath "$CLASSPATH" -d "$JARDIR/" "$1" && jar -cf "$JARNAME" -C "$JARDIR/" . && tell "$1"

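Step 2. Run the query in Hive.

hive -e "
add jar /data/ginobili/UDF/Rank.jar;
create temporary function Rank as 'com.example.hive.udf.Rank';
select region_nbr, region_id
from (
  select region_nbr, region_id, Rank(region_nbr) as rank
  from (
    select * from test_gino.region
    distribute by region_nbr
    sort by region_nbr, region_id desc
  ) a
) b
where rank < 2"

DISTRIBUTE BY sends all rows with the same region_nbr to the same reducer, and SORT BY orders each reducer's rows by region_nbr and then by region_id descending, so Rank sees each key as one contiguous run. Rank counts from 0, so "where rank < 2" keeps the top 2 rows per region_nbr; use "rank < k" for the top k.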

REFERENCES

http://stackoverflow.com/questions/9390698/hive-getting-top-n-records-in-group-by-query

http://findingscience.com/hadoop/hive/2011/01/07/compiling-user-defined-functions-for-hive-on-hadoop.html

http://stackoverflow.com/questions/11405446/find-top-10-latest-record-for-each-buyer-id-for-yesterdays-date
