hive:数据库“行专列”操作---使用collect_set/collect_list/collect_all & row_number()over(partition by 分组字段 [order by 排序字段])
方案一:请参考《数据库“行专列”操作---使用row_number()over(partition by 分组字段 [order by 排序字段])》,该方案是sqlserver,oracle,mysql,hive均适用的。
在hive中的方案分为以下两种方案:
创建测试表,并插入测试数据:
--hive 测试 行转列 collect_set collect_list
create table tommyduan_test(
gridid string,
height int,
cell string,
mrcount int,
weakmrcount int
); insert into tommyduan_test values('g1',1,'cell1',12,3);
insert into tommyduan_test values('g1',1,'cell2',22,3);
insert into tommyduan_test values('g1',1,'cell3',23,3);
insert into tommyduan_test values('g1',1,'cell4',1,3);
insert into tommyduan_test values('g1',1,'cell5',3,3);
insert into tommyduan_test values('g1',1,'cell6',4,3);
insert into tommyduan_test values('g1',1,'cell19',21,3); insert into tommyduan_test values('g2',1,'cell4',1,3);
insert into tommyduan_test values('g2',1,'cell5',3,3);
insert into tommyduan_test values('g2',1,'cell6',4,3);
insert into tommyduan_test values('g2',1,'cell19',21,3);
方案二:使用collect_set方案
注意:collect_set是一个set集合,不允许重复的记录插入
select gridid,height,collect_list(cell) cellArray,collect_list(mrcount) mrcountArray,collect_list(weakmrcount) weakmrcountArray
from (
select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn
from tommyduan_test
group by gridid,height,cell,mrcount,weakmrcount
) t10
where rn<4
group by gridid,height;
+---------+---------+-----------------------------+---------------+-------------------+--+
| gridid | height | cellarray | mrcountarray | weakmrcountarray |
+---------+---------+-----------------------------+---------------+-------------------+--+
| g1 | 1 | ["cell3","cell2","cell19"] | [23,22,21] | [3,3,3] |
| g2 | 1 | ["cell19","cell6","cell5"] | [21,4,3] | [3,3,3] |
+---------+---------+-----------------------------+---------------+-------------------+--+ select gridid,height,
(case when size(cellArray)>0 then cellArray[] else '-9999' end) as cell1,
(case when size(cellArray)>0 then mrcountArray[] else '-9999' end) as cell1_mrcount,
(case when size(cellArray)>0 then weakmrcountArray[] else '-9999' end) as cell1_weakmrcount,
(case when size(cellArray)>1 then cellArray[] else '-9999' end) as cell2,
(case when size(cellArray)>1 then mrcountArray[] else '-9999' end) as cell2_mrcount,
(case when size(cellArray)>1 then weakmrcountArray[] else '-9999' end) as cell2_weakmrcount,
(case when size(cellArray)>2 then cellArray[] else '-9999' end) as cell3,
(case when size(cellArray)>2 then mrcountArray[] else '-9999' end) as cell3_mrcount,
(case when size(cellArray)>2 then weakmrcountArray[] else '-9999' end) as cell3_weakmrcount
from
(
select gridid,height,collect_list(cell) cellArray,collect_list(mrcount) mrcountArray,collect_list(weakmrcount) weakmrcountArray
from (
select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn
from tommyduan_test
group by gridid,height,cell,mrcount,weakmrcount
) t10
where rn<4
group by gridid,height
) t12;
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
| gridid | height | cell1 | cell1_mrcount | cell1_weakmrcount | cell2 | cell2_mrcount | cell2_weakmrcount | cell3 | cell3_mrcount | cell3_weakmrcount |
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
| g1 | 1 | cell3 | 23 | 3 | cell2 | 22 | 3 | cell19 | 21 | 3 |
| g2 | 1 | cell19 | 21 | 3 | cell6 | 4 | 3 | cell5 | 3 | 3 |
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
方案三:使用collect_list/collect_all方案
注意:collect_set是一个set集合,不允许重复的记录插入
select gridid,height,collect_set(cell),collect_set(mrcount),collect_set(weakmrcount)
from (select * from tommyduan_test order by gridid,height,mrcount desc) t10
group by gridid,height;
+---------+---------+-------------------------------------------------------------+----------------------+------+--+
| gridid | height | _c2 | _c3 | _c4 |
+---------+---------+-------------------------------------------------------------+----------------------+------+--+
| g1 | 1 | ["cell3","cell2","cell19","cell1","cell6","cell5","cell4"] | [23,22,21,12,4,3,1] | [] |
| g2 | 1 | ["cell19","cell6","cell5","cell4"] | [21,4,3,1] | [] |
+---------+---------+-------------------------------------------------------------+----------------------+------+--+ select gridid,height,collect_set(cell) cellArray,collect_set(mrcount) mrcountArray,collect_set(weakmrcount) weakmrcountArray
from (
select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn
from tommyduan_test
group by gridid,height,cell,mrcount,weakmrcount
) t10
where rn<4
group by gridid,height;
+---------+---------+-----------------------------+---------------+-------------------+--+
| gridid | height | cellarray | mrcountarray | weakmrcountarray |
+---------+---------+-----------------------------+---------------+-------------------+--+
| g1 | 1 | ["cell3","cell2","cell19"] | [23,22,21] | [] |
| g2 | 1 | ["cell19","cell6","cell5"] | [21,4,3] | [] |
+---------+---------+-----------------------------+---------------+-------------------+--+ select gridid,height,collect_set(concat_ws(',',cell,cast(mrcount as string), cast(weakmrcount as string))) as cellArray
from (
select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn
from tommyduan_test
group by gridid,height,cell,mrcount,weakmrcount
) t10
where rn<4
group by gridid,height
+---------+---------+--------------------------------------------+--+
| gridid | height | cellarray |
+---------+---------+--------------------------------------------+--+
| g1 | 1 | ["cell3,23,3","cell2,22,3","cell19,21,3"] |
| g2 | 1 | ["cell19,21,3","cell6,4,3","cell5,3,3"] |
+---------+---------+--------------------------------------------+--+ select gridid,height,
(case when size(cellArray)>0 then split(cellArray[],'_')[] else '-9999' end) as cell1,
(case when size(cellArray)>0 then split(cellArray[],'_')[] else '-9999' end) as cell1_mrcount,
(case when size(cellArray)>0 then split(cellArray[],'_')[] else '-9999' end) as cell1_weakmrcount,
(case when size(cellArray)>1 then split(cellArray[],'_')[] else '-9999' end) as cell2,
(case when size(cellArray)>1 then split(cellArray[],'_')[] else '-9999' end) as cell2_mrcount,
(case when size(cellArray)>1 then split(cellArray[],'_')[] else '-9999' end) as cell2_weakmrcount,
(case when size(cellArray)>2 then split(cellArray[],'_')[] else '-9999' end) as cell3,
(case when size(cellArray)>2 then split(cellArray[],'_')[] else '-9999' end) as cell3_mrcount,
(case when size(cellArray)>2 then split(cellArray[],'_')[] else '-9999' end) as cell3_weakmrcount
from
(
select gridid,height,collect_set(concat_ws('_',cell,cast(mrcount as string), cast(weakmrcount as string))) as cellArray
from (
select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn
from tommyduan_test
group by gridid,height,cell,mrcount,weakmrcount
) t10
where rn<4
group by gridid,height
) t12;
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
| gridid | height | cell1 | cell1_mrcount | cell1_weakmrcount | cell2 | cell2_mrcount | cell2_weakmrcount | cell3 | cell3_mrcount | cell3_weakmrcount |
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
| g1 | 1 | cell3 | 23 | 3 | cell2 | 22 | 3 | cell19 | 21 | 3 |
| g2 | 1 | cell19 | 21 | 3 | cell6 | 4 | 3 | cell5 | 3 | 3 |
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
hive:数据库“行专列”操作---使用collect_set/collect_list/collect_all & row_number()over(partition by 分组字段 [order by 排序字段])的更多相关文章
- 数据库“行专列”操作---使用row_number()over(partition by 分组字段 [order by 排序字段])
测试样例: create table test(rsrp string,rsrq string,tkey string,distan string); '); '); '); '); select * ...
- dos命令行连接操作ORACLE数据库
C:\Adminstrator> sqlplus "/as sysdba" 查看是否连接到数据库 SQL> select status from v$instance; ...
- hive函数应用之操作json
1.创建表 createtable.sql中存放的创建表语句如下 create external table adt.jsontest ( appKey string comment "AP ...
- Python(数据库之表操作)
一.修改表 1. 修改表名 ALTER TABLE 表名 RENAME 新表名; #mysql中库名.表名对大小写不敏感 2. 增加字段 ALTER TABLE 表名ADD 字段名 数据类型 [完整性 ...
- SQL Server数据库--》top关键字,order by排序,distinct去除重复记录,sql聚合函数,模糊查询,通配符,空值处理。。。。
top关键字:写在select后面 字段的前面 比如你要显示查询的前5条记录,如下所示: select top 5 * from Student 一般情况下,top是和order by连用的 orde ...
- hive 分组排序函数 row_number() over(partition by " " order by " "desc
语法:row_number() over (partition by 字段a order by 计算项b desc ) rank --这里rank是别名 partition by:类似hive的建表, ...
- Hive数据库操作
Hive数据结构 除了基本数据类型(与java类似),hive支持三种集合类型 Hive集合类型数据 array.map.structs hive (default)> create table ...
- 大数据开发实战:离线大数据处理的主要技术--Hive,概念,SQL,Hive数据库
1.Hive出现背景 Hive是Facebook开发并贡献给Hadoop开源社区的.它是建立在Hadoop体系架构上的一层SQL抽象,使得数据相关人员使用他们最为熟悉的SQL语言就可以进行海量数据的处 ...
- HIVE的sql语句操作
Hive 是基于Hadoop 构建的一套数据仓库分析系统,它提供了丰富的SQL查询方式来分析存储在hadoop 分布式文件系统中的数据,可以将结构 化的数据文件映射为一张数据库表,并提供完整的SQL查 ...
随机推荐
- 智能合约语言 Solidity 教程系列7 - 以太单位及时间单位
这是Solidity教程系列文章第7篇介绍以太单位及时间单位,系列带你全面深入理解Solidity语言. 写在前面 Solidity 是以太坊智能合约编程语言,阅读本文前,你应该对以太坊.智能合约有所 ...
- PHP MVC框架核心类
PHP MVC框架核心类 现在我们举几个核心框架的例子演示:在framework/core下建立一个Framework.class.php的文件.写入以下代码: // framework/core/F ...
- 给我一台全新的服务器,使用nginx+gunicorn+supervisor部署django
0.准备工作 在一台全新的服务器中新建用户以及用户的工作目录,之后的操作都以这个用户的身份进行,而不是直接用root. 举个栗子: 在服务器下新建用户rinka并赋予sudo权限 1) root登陆, ...
- SpringBoot快速开发REST服务最佳实践
一.为什么选择SpringBoot Spring Boot是由Pivotal团队提供的全新框架,被很多业内资深人士认为是可能改变游戏规则的新项目.早期我们搭建一个SSH或者Spring Web应用,需 ...
- JVM学习六:JVM之类加载器之双亲委派机制
前面我们知道类加载有系统自带的3种加载器,也有自定义的加载器,那么这些加载器之间的关系是什么,已经在加载类的时候,谁去加载呢?这节,我们将进行讲解. 一.双亲委派机制 JVM的ClassLoader采 ...
- C++迭代器的使用和操作总结
迭代器是一种检查容器内元素并遍历元素的数据类型.C++更趋向于使用迭代器而不是下标操作,因为标准库为每一种标准容器(如vector)定义了一种迭代器类型,而只用少数容器(如vector)支持下标操作访 ...
- (转)关于 awk 的 pattern(模式)
本文转自chinaunix http://bbs.chinaunix.net/thread-4246512-1-1.html 作者reyleon 我们知道, awk程序由一系列 pattern 以 ...
- SQLite学习手册(数据表和视图)
如何列出SQLite数据库中的所有表 SQLite数据库中的信息存在于一个内置表sqlite_master中,在查询器中可以用 select * from sqlite_master 来查看,如果只要 ...
- 微信公众平台开发,图文回复、access_token生成调用、以及微信SDK的实现(2)
上一节课,我给大家分享了微信API接入以及事件推送的回复,这是微信开发的第二节课,重点给说一说单图文回复,多图文回复,access_token,微信SDK. 公众号消息回复很多种形式,常见的形式有,文 ...
- ibatis.net 入门demo 实现基本增删改查
1.项目架构体系 DAO(数据访问层) Domain(实体层) Text(表示层) 2.比较重要的是需要添加两个dll的引用,以及两个配置文件和一个XML文件 两个 IbatisNet.Com ...