key points:

1. group by key and sort by using distribute by and sort by.

2. get top k elements by a UDF (user defined function) RANK

---------Here is the source code.--------------

package com.example.hive.udf;
import org.apache.hadoop.hive.ql.exec.UDF;

public final class Rank extends UDF{
private int counter;
private String last_key;
public int evaluate(final String key){
if ( !key.equalsIgnoreCase(this.last_key) ) {
this.counter = 0;
this.last_key = key;
}
return this.counter++;
}
}

The details are as the following.

---original data, table region(region_nbr, region_id)---

100 10
200 12
300 33
100 4
100 8
200 20
300 31
300 3
400 4
200 2

-----what I need is as below-----

100 10
100 8
200 20
200 12
300 33
300 31
400 4

---

1. step1. compile java with a shell, compile_udf.sh.

#!/bin/bash

if [ $# != 1 ]; then
echo "Usage: $0 <java file>"
exit 1
fi

CNAME=${1%.java}
JARNAME=$CNAME.jar
JARDIR=/tmp/hive_jars/$CNAME
HIVE_HOME2="/usr/local/hive-0.9.0"
CLASSPATH=$(ls $HIVE_HOME2/lib/hive-serde-*.jar):$(ls $HIVE_HOME2/lib/hive-exec-*.jar):$(ls /home/oicq/hadoop/hadoop-1.0.2/hadoop-core-*.jar)

function tell {
echo
echo "$1 successfully compiled. In Hive run:"
#echo "$> add jar $JARNAME;"
#echo "$> create temporary function $CNAME as 'com.example.hive.udf.$CNAME';"
echo
}

mkdir -p $JARDIR
javac -classpath $CLASSPATH -d $JARDIR/ $1 && jar -cf $JARNAME -C $JARDIR/ . && tell $1

step 2. run hive

hive -e "add jar /data/ginobili/UDF/Rank.jar; create temporary function Rank as 'com.example.hive.udf.Rank'; select region_nbr, region_id from ( select region_nbr, region_id, Rank(region_nbr) as rank from (select * from test_gino.region distribute by region_nbr sort by region_nbr, region_id desc)a )b where rank < 2"

REFERENCE

http://stackoverflow.com/questions/9390698/hive-getting-top-n-records-in-group-by-query

http://findingscience.com/hadoop/hive/2011/01/07/compiling-user-defined-functions-for-hive-on-hadoop.html

http://stackoverflow.com/questions/11405446/find-top-10-latest-record-for-each-buyer-id-for-yesterdays-date

get top k elements of the same key in hive的更多相关文章

  1. Leetcode 347. Top K Frequent Elements

    Given a non-empty array of integers, return the k most frequent elements. For example,Given [1,1,1,2 ...

  2. 347. Top K Frequent Elements

    Given a non-empty array of integers, return the k most frequent elements. For example,Given [1,1,1,2 ...

  3. [Swift]LeetCode347. 前K个高频元素 | Top K Frequent Elements

    Given a non-empty array of integers, return the k most frequent elements. Example 1: Input: nums = [ ...

  4. C#版(打败99.28%的提交) - Leetcode 347. Top K Frequent Elements - 题解

    版权声明: 本文为博主Bravo Yeung(知乎UserName同名)的原创文章,欲转载请先私信获博主允许,转载时请附上网址 http://blog.csdn.net/lzuacm. C#版 - L ...

  5. 347. Top K Frequent Elements (sort map)

    Given a non-empty array of integers, return the k most frequent elements. Example 1: Input: nums = [ ...

  6. [LeetCode] Top K Frequent Elements 前K个高频元素

    Given a non-empty array of integers, return the k most frequent elements. For example,Given [1,1,1,2 ...

  7. [leetcode]347. Top K Frequent Elements K个最常见元素

    Given a non-empty array of integers, return the k most frequent elements. Example 1: Input: nums = [ ...

  8. Top K Frequent Elements 前K个高频元素

    Top K Frequent Elements 347. Top K Frequent Elements [LeetCode] Top K Frequent Elements 前K个高频元素

  9. [LeetCode] 347. Top K Frequent Elements 前K个高频元素

    Given a non-empty array of integers, return the k most frequent elements. Example 1: Input: nums = [ ...

随机推荐

  1. oracle11g rac asm 实例内存修改

    ASM实例内存修改 memory_max_target(它为静态参数,修改完成后需要重启实例) memory_target(它为动态参数,不需要重启实例) SQL> select name,is ...

  2. PERL DBI 自动重连问题

    [root@wx03 mojo]# cat relink.pl use Mojolicious::Lite; use JSON qw/encode_json decode_json/; use Enc ...

  3. Linux企业级项目实践之网络爬虫(29)——遵守robots.txt

    Robots协议(也称为爬虫协议.机器人协议等)的全称是"网络爬虫排除标准"(Robots Exclusion Protocol),网站通过Robots协议告诉搜索引擎哪些页面可以 ...

  4. uva 10859 - Placing Lampposts dp

    题意: 有n个节点,m条边,无向无环图,求最少点覆盖,并且在同样点数下保证被覆盖两次的变最多 分析: 1.统一化目标,本题需要优化目标有两个,一个最小灯数a,一个最大双覆盖边数b,一大一小,应该归一成 ...

  5. java中的“包”与C#中的“命名空间

    原文地址:http://www.cnblogs.com/lidabo/archive/2012/12/15/2819865.html Package vs. Namespace 我们知道,重用性(re ...

  6. 关于springMVC框架访问web-inf下的jsp文件

    问题:springMVC框架访问web-inf下的jsp文件,具体如下: 使用springMVC,一般都会使用springMVC的视图解析器,大概会这样配置 <property name=&qu ...

  7. MySQL事务处理2

    MySQL5.X 都已经发布好久了,但是还有很多人认为MySQL是不支持事务处理的,这不得不怪他们是孤陋寡闻的,其实,只要你的MySQL版本支持BDB或 InnoDB表类型,那么你的MySQL就具有事 ...

  8. 解决Xcode8 输出一对字符串问题

    在Product->Scheme->Edit Scheme->Run->Environment Variables下添加键:OS_ACTIVITY_MODE, 值:Disabl ...

  9. textarea固定大小,不可拖动

    写前端,经常很多小东西容易忽略忘记,今天写页面碰到设定一个输入框大小,死活记不起怎么固定,故找了一下度娘,其实添加一个css属性就好了: resize: none; 随笔记一下!

  10. centos7上使用yum安装mysql

    centos yum是没有mysql的,集成的是新的Mariadb,怎么用yum的方式在centos7上安装mysql呢? 1. 下载mysql的repo源 wget http://repo.mysq ...