hive:数据库“行专列”操作---使用collect_set/collect_list/collect_all & row_number()over(partition by 分组字段 [order by 排序字段])
方案一:请参考《数据库“行专列”操作---使用row_number()over(partition by 分组字段 [order by 排序字段])》,该方案是sqlserver,oracle,mysql,hive均适用的。
在hive中的方案分为以下两种方案:
创建测试表,并插入测试数据:
--hive 测试 行转列 collect_set collect_list
create table tommyduan_test(
gridid string,
height int,
cell string,
mrcount int,
weakmrcount int
); insert into tommyduan_test values('g1',1,'cell1',12,3);
insert into tommyduan_test values('g1',1,'cell2',22,3);
insert into tommyduan_test values('g1',1,'cell3',23,3);
insert into tommyduan_test values('g1',1,'cell4',1,3);
insert into tommyduan_test values('g1',1,'cell5',3,3);
insert into tommyduan_test values('g1',1,'cell6',4,3);
insert into tommyduan_test values('g1',1,'cell19',21,3); insert into tommyduan_test values('g2',1,'cell4',1,3);
insert into tommyduan_test values('g2',1,'cell5',3,3);
insert into tommyduan_test values('g2',1,'cell6',4,3);
insert into tommyduan_test values('g2',1,'cell19',21,3);
方案二:使用collect_set方案
注意:collect_set是一个set集合,不允许重复的记录插入
select gridid,height,collect_list(cell) cellArray,collect_list(mrcount) mrcountArray,collect_list(weakmrcount) weakmrcountArray
from (
select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn
from tommyduan_test
group by gridid,height,cell,mrcount,weakmrcount
) t10
where rn<4
group by gridid,height;
+---------+---------+-----------------------------+---------------+-------------------+--+
| gridid | height | cellarray | mrcountarray | weakmrcountarray |
+---------+---------+-----------------------------+---------------+-------------------+--+
| g1 | 1 | ["cell3","cell2","cell19"] | [23,22,21] | [3,3,3] |
| g2 | 1 | ["cell19","cell6","cell5"] | [21,4,3] | [3,3,3] |
+---------+---------+-----------------------------+---------------+-------------------+--+ select gridid,height,
(case when size(cellArray)>0 then cellArray[] else '-9999' end) as cell1,
(case when size(cellArray)>0 then mrcountArray[] else '-9999' end) as cell1_mrcount,
(case when size(cellArray)>0 then weakmrcountArray[] else '-9999' end) as cell1_weakmrcount,
(case when size(cellArray)>1 then cellArray[] else '-9999' end) as cell2,
(case when size(cellArray)>1 then mrcountArray[] else '-9999' end) as cell2_mrcount,
(case when size(cellArray)>1 then weakmrcountArray[] else '-9999' end) as cell2_weakmrcount,
(case when size(cellArray)>2 then cellArray[] else '-9999' end) as cell3,
(case when size(cellArray)>2 then mrcountArray[] else '-9999' end) as cell3_mrcount,
(case when size(cellArray)>2 then weakmrcountArray[] else '-9999' end) as cell3_weakmrcount
from
(
select gridid,height,collect_list(cell) cellArray,collect_list(mrcount) mrcountArray,collect_list(weakmrcount) weakmrcountArray
from (
select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn
from tommyduan_test
group by gridid,height,cell,mrcount,weakmrcount
) t10
where rn<4
group by gridid,height
) t12;
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
| gridid | height | cell1 | cell1_mrcount | cell1_weakmrcount | cell2 | cell2_mrcount | cell2_weakmrcount | cell3 | cell3_mrcount | cell3_weakmrcount |
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
| g1 | 1 | cell3 | 23 | 3 | cell2 | 22 | 3 | cell19 | 21 | 3 |
| g2 | 1 | cell19 | 21 | 3 | cell6 | 4 | 3 | cell5 | 3 | 3 |
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
方案三:使用collect_list/collect_all方案
注意:collect_set是一个set集合,不允许重复的记录插入
select gridid,height,collect_set(cell),collect_set(mrcount),collect_set(weakmrcount)
from (select * from tommyduan_test order by gridid,height,mrcount desc) t10
group by gridid,height;
+---------+---------+-------------------------------------------------------------+----------------------+------+--+
| gridid | height | _c2 | _c3 | _c4 |
+---------+---------+-------------------------------------------------------------+----------------------+------+--+
| g1 | 1 | ["cell3","cell2","cell19","cell1","cell6","cell5","cell4"] | [23,22,21,12,4,3,1] | [] |
| g2 | 1 | ["cell19","cell6","cell5","cell4"] | [21,4,3,1] | [] |
+---------+---------+-------------------------------------------------------------+----------------------+------+--+ select gridid,height,collect_set(cell) cellArray,collect_set(mrcount) mrcountArray,collect_set(weakmrcount) weakmrcountArray
from (
select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn
from tommyduan_test
group by gridid,height,cell,mrcount,weakmrcount
) t10
where rn<4
group by gridid,height;
+---------+---------+-----------------------------+---------------+-------------------+--+
| gridid | height | cellarray | mrcountarray | weakmrcountarray |
+---------+---------+-----------------------------+---------------+-------------------+--+
| g1 | 1 | ["cell3","cell2","cell19"] | [23,22,21] | [] |
| g2 | 1 | ["cell19","cell6","cell5"] | [21,4,3] | [] |
+---------+---------+-----------------------------+---------------+-------------------+--+ select gridid,height,collect_set(concat_ws(',',cell,cast(mrcount as string), cast(weakmrcount as string))) as cellArray
from (
select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn
from tommyduan_test
group by gridid,height,cell,mrcount,weakmrcount
) t10
where rn<4
group by gridid,height
+---------+---------+--------------------------------------------+--+
| gridid | height | cellarray |
+---------+---------+--------------------------------------------+--+
| g1 | 1 | ["cell3,23,3","cell2,22,3","cell19,21,3"] |
| g2 | 1 | ["cell19,21,3","cell6,4,3","cell5,3,3"] |
+---------+---------+--------------------------------------------+--+ select gridid,height,
(case when size(cellArray)>0 then split(cellArray[],'_')[] else '-9999' end) as cell1,
(case when size(cellArray)>0 then split(cellArray[],'_')[] else '-9999' end) as cell1_mrcount,
(case when size(cellArray)>0 then split(cellArray[],'_')[] else '-9999' end) as cell1_weakmrcount,
(case when size(cellArray)>1 then split(cellArray[],'_')[] else '-9999' end) as cell2,
(case when size(cellArray)>1 then split(cellArray[],'_')[] else '-9999' end) as cell2_mrcount,
(case when size(cellArray)>1 then split(cellArray[],'_')[] else '-9999' end) as cell2_weakmrcount,
(case when size(cellArray)>2 then split(cellArray[],'_')[] else '-9999' end) as cell3,
(case when size(cellArray)>2 then split(cellArray[],'_')[] else '-9999' end) as cell3_mrcount,
(case when size(cellArray)>2 then split(cellArray[],'_')[] else '-9999' end) as cell3_weakmrcount
from
(
select gridid,height,collect_set(concat_ws('_',cell,cast(mrcount as string), cast(weakmrcount as string))) as cellArray
from (
select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn
from tommyduan_test
group by gridid,height,cell,mrcount,weakmrcount
) t10
where rn<4
group by gridid,height
) t12;
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
| gridid | height | cell1 | cell1_mrcount | cell1_weakmrcount | cell2 | cell2_mrcount | cell2_weakmrcount | cell3 | cell3_mrcount | cell3_weakmrcount |
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
| g1 | 1 | cell3 | 23 | 3 | cell2 | 22 | 3 | cell19 | 21 | 3 |
| g2 | 1 | cell19 | 21 | 3 | cell6 | 4 | 3 | cell5 | 3 | 3 |
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
hive:数据库“行专列”操作---使用collect_set/collect_list/collect_all & row_number()over(partition by 分组字段 [order by 排序字段])的更多相关文章
- 数据库“行专列”操作---使用row_number()over(partition by 分组字段 [order by 排序字段])
测试样例: create table test(rsrp string,rsrq string,tkey string,distan string); '); '); '); '); select * ...
- dos命令行连接操作ORACLE数据库
C:\Adminstrator> sqlplus "/as sysdba" 查看是否连接到数据库 SQL> select status from v$instance; ...
- hive函数应用之操作json
1.创建表 createtable.sql中存放的创建表语句如下 create external table adt.jsontest ( appKey string comment "AP ...
- Python(数据库之表操作)
一.修改表 1. 修改表名 ALTER TABLE 表名 RENAME 新表名; #mysql中库名.表名对大小写不敏感 2. 增加字段 ALTER TABLE 表名ADD 字段名 数据类型 [完整性 ...
- SQL Server数据库--》top关键字,order by排序,distinct去除重复记录,sql聚合函数,模糊查询,通配符,空值处理。。。。
top关键字:写在select后面 字段的前面 比如你要显示查询的前5条记录,如下所示: select top 5 * from Student 一般情况下,top是和order by连用的 orde ...
- hive 分组排序函数 row_number() over(partition by " " order by " "desc
语法:row_number() over (partition by 字段a order by 计算项b desc ) rank --这里rank是别名 partition by:类似hive的建表, ...
- Hive数据库操作
Hive数据结构 除了基本数据类型(与java类似),hive支持三种集合类型 Hive集合类型数据 array.map.structs hive (default)> create table ...
- 大数据开发实战:离线大数据处理的主要技术--Hive,概念,SQL,Hive数据库
1.Hive出现背景 Hive是Facebook开发并贡献给Hadoop开源社区的.它是建立在Hadoop体系架构上的一层SQL抽象,使得数据相关人员使用他们最为熟悉的SQL语言就可以进行海量数据的处 ...
- HIVE的sql语句操作
Hive 是基于Hadoop 构建的一套数据仓库分析系统,它提供了丰富的SQL查询方式来分析存储在hadoop 分布式文件系统中的数据,可以将结构 化的数据文件映射为一张数据库表,并提供完整的SQL查 ...
随机推荐
- 【Java一看就懂】浅克隆和深克隆
一.何为克隆 在Java的体系中,数据类型分为基本数据类型和引用数据类型. 基本数据类型包括byte,short,int,long,float,double,boolean,char 8种,其克隆可通 ...
- 初探云服务器ECS(Linux系统)
PS: 购买的阿里云服务器(ECS,Linux系统),使用的弹性公网IP(EIP). 一.使用Xshell链接ECS 1.将公网IP填入主机即可 2.用户名一般为root,密码是自己设置的,填入即可. ...
- 有关Redis的Add和Set方法的比较
测试发现,如果key已经存在,则调用Redis.Add(key, value)则不能添加或修改此key的内容value: 这样的话,我们在添加一个key和value的时候,不得不判断一次Contain ...
- 笔记:Maven 反应堆
在一个多模块的Maven项目中,反应堆(Reactor)是指所有模块组成的一个构建结构,对于单个模块的项目,反应堆就是该模块本身,但对于多模块项目来说,反应堆就包含了各模块之间继承与依赖的关系,从而能 ...
- html-简单的简历表制作
代码如下: <!DOCTYOE html> <html> <head> <meta charset='UTF-8'/> <title>课后作 ...
- .net core2.0下Ioc容器Autofac使用
.net core发布有一段时间了,最近两个月开始使用.net core2.0开发项目,大大小小遇到了一些问题.准备写个系列介绍一下是如何解决这些问题以及对应技术.先从IOC容器Autofac开始该系 ...
- Oracle查询优化改写--------------------报表和数据仓库运算
一.行转列 二.列传行 '
- django中使用Model的update_or_create函数时报错
官方使用示例: obj, created = Person.objects.update_or_create( first_name='John', last_name='Lennon', defau ...
- JAVA连接SAP
1.首先需要在SAP事务码SE37中新建一个可以被远程调用的RFC 事务码:SE37 新建一个函数组:输入事务码SE37回车后,来到函数构建器屏幕,到上面一排菜单栏:转到 -> 函数组 -> ...
- hibernate框架学习错误集锦-org.springframework.dao.InvalidDataAccessApiUsageException: Write operations are not allowed in read-only mode (FlushMode.MANUAL)
最近学习ssh框架,总是出现这问题,后查证是没有开启事务. 如果采用注解方式,直接在业务层加@Transactional 并引入import org.springframework.transacti ...