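Hive: row-to-column pivoting with collect_set/collect_list/collect_all and row_number() over (partition by grouping columns [order by sort columns])
Option 1: see the article "Row-to-column pivoting in a database: using row_number() over (partition by grouping columns [order by sort columns])". That approach works in SQL Server, Oracle, MySQL (8.0 or later, which added window functions), and Hive.
Hive itself offers two further options, described below.
Create a test table and insert the test data:
-- Hive test: rows to columns with collect_set / collect_list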
create table tommyduan_test(
gridid string,
height int,
cell string,
mrcount int,
weakmrcount int
);
insert into tommyduan_test values('g1',1,'cell1',12,3);
insert into tommyduan_test values('g1',1,'cell2',22,3);
insert into tommyduan_test values('g1',1,'cell3',23,3);
insert into tommyduan_test values('g1',1,'cell4',1,3);
insert into tommyduan_test values('g1',1,'cell5',3,3);
insert into tommyduan_test values('g1',1,'cell6',4,3);
insert into tommyduan_test values('g1',1,'cell19',21,3);
insert into tommyduan_test values('g2',1,'cell4',1,3);
insert into tommyduan_test values('g2',1,'cell5',3,3);
insert into tommyduan_test values('g2',1,'cell6',4,3);
insert into tommyduan_test values('g2',1,'cell19',21,3);
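Option 1 is only referenced above, not shown. As a rough idea of what a pivot based on row_number() alone can look like on this test table, here is a minimal sketch using conditional aggregation. It is an assumption about the general technique, not a copy of the query in the referenced article; the subquery alias t is made up for illustration.

```sql
-- Sketch only: pivot the top three cells per (gridid, height) by mrcount,
-- using row_number() plus max(case when ...) instead of arrays.
-- Assumption: this is one common form of the technique, not necessarily
-- the exact query from the referenced article.
select gridid, height,
       max(case when rn = 1 then cell end)        as cell1,
       max(case when rn = 1 then mrcount end)     as cell1_mrcount,
       max(case when rn = 1 then weakmrcount end) as cell1_weakmrcount,
       max(case when rn = 2 then cell end)        as cell2,
       max(case when rn = 2 then mrcount end)     as cell2_mrcount,
       max(case when rn = 2 then weakmrcount end) as cell2_weakmrcount,
       max(case when rn = 3 then cell end)        as cell3,
       max(case when rn = 3 then mrcount end)     as cell3_mrcount,
       max(case when rn = 3 then weakmrcount end) as cell3_weakmrcount
from (
  select gridid, height, cell, mrcount, weakmrcount,
         row_number() over (partition by gridid, height order by mrcount desc) as rn
  from tommyduan_test
) t
where rn < 4
group by gridid, height;
```

Each case expression matches at most one row per group, so max() simply picks that row's value. A group with fewer than three cells gets NULL in the missing columns rather than a placeholder.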
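Option 2: using collect_list
Note: collect_list keeps duplicate values, so the three arrays get one entry per row and line up by position. (collect_set, used in Option 3, is a set and discards duplicates.) Hive does not document an ordering guarantee for collect_list, so treat the element order in the results below as observed behaviour.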
select gridid,height,collect_list(cell) cellArray,collect_list(mrcount) mrcountArray,collect_list(weakmrcount) weakmrcountArray
from (
select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn
from tommyduan_test
group by gridid,height,cell,mrcount,weakmrcount
) t10
where rn<4
group by gridid,height;
+---------+---------+-----------------------------+---------------+-------------------+--+
| gridid  | height  | cellarray                   | mrcountarray  | weakmrcountarray  |
+---------+---------+-----------------------------+---------------+-------------------+--+
| g1      | 1       | ["cell3","cell2","cell19"]  | [23,22,21]    | [3,3,3]           |
| g2      | 1       | ["cell19","cell6","cell5"]  | [21,4,3]      | [3,3,3]           |
+---------+---------+-----------------------------+---------------+-------------------+--+
select gridid,height,
(case when size(cellArray)>0 then cellArray[0] else '-9999' end) as cell1,
(case when size(cellArray)>0 then mrcountArray[0] else '-9999' end) as cell1_mrcount,
(case when size(cellArray)>0 then weakmrcountArray[0] else '-9999' end) as cell1_weakmrcount,
(case when size(cellArray)>1 then cellArray[1] else '-9999' end) as cell2,
(case when size(cellArray)>1 then mrcountArray[1] else '-9999' end) as cell2_mrcount,
(case when size(cellArray)>1 then weakmrcountArray[1] else '-9999' end) as cell2_weakmrcount,
(case when size(cellArray)>2 then cellArray[2] else '-9999' end) as cell3,
(case when size(cellArray)>2 then mrcountArray[2] else '-9999' end) as cell3_mrcount,
(case when size(cellArray)>2 then weakmrcountArray[2] else '-9999' end) as cell3_weakmrcount
from
(
select gridid,height,collect_list(cell) cellArray,collect_list(mrcount) mrcountArray,collect_list(weakmrcount) weakmrcountArray
from (
select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn
from tommyduan_test
group by gridid,height,cell,mrcount,weakmrcount
) t10
where rn<4
group by gridid,height
) t12;
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
| gridid  | height  | cell1   | cell1_mrcount  | cell1_weakmrcount  | cell2  | cell2_mrcount  | cell2_weakmrcount  | cell3   | cell3_mrcount  | cell3_weakmrcount  |
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
| g1      | 1       | cell3   | 23             | 3                  | cell2  | 22             | 3                  | cell19  | 21             | 3                  |
| g2      | 1       | cell19  | 21             | 3                  | cell6  | 4              | 3                  | cell5   | 3              | 3                  |
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
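The size() guards in the query above can probably be shortened. The sketch below assumes that Hive returns NULL when an array is indexed past its end, which is the behaviour I would expect from Hive itself but which other engines (for example Spark SQL in ANSI mode) do not share. It only unpacks the cell column, and it drops the inner group by, which is safe here only because the test data contains no fully duplicated rows.

```sql
-- Sketch only: relies on Hive returning NULL for an out-of-range array index,
-- so coalesce() can supply the '-9999' placeholder without a size() check.
select gridid, height,
       coalesce(cellArray[0], '-9999') as cell1,
       coalesce(cellArray[1], '-9999') as cell2,
       coalesce(cellArray[2], '-9999') as cell3
from (
  select gridid, height, collect_list(cell) as cellArray
  from (
    select gridid, height, cell,
           row_number() over (partition by gridid, height order by mrcount desc) as rn
    from tommyduan_test
  ) t10
  where rn < 4
  group by gridid, height
) t12;
```

If the assumption does not hold on your engine, keep the explicit size() checks from the original query.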
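Option 3: using collect_set
Note: collect_set is a set and discards duplicate values. Every weakmrcount in the test data is 3, so collect_set(weakmrcount) cannot return one entry per row, and arrays collected column by column no longer line up by position. The later queries in this section work around that by concatenating the fields of each row into one string before collecting.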
select gridid,height,collect_set(cell),collect_set(mrcount),collect_set(weakmrcount)
from (select * from tommyduan_test order by gridid,height,mrcount desc) t10
group by gridid,height;
+---------+---------+-------------------------------------------------------------+----------------------+------+--+
| gridid  | height  | _c2                                                         | _c3                  | _c4  |
+---------+---------+-------------------------------------------------------------+----------------------+------+--+
| g1      | 1       | ["cell3","cell2","cell19","cell1","cell6","cell5","cell4"]  | [23,22,21,12,4,3,1]  | [3]  |
| g2      | 1       | ["cell19","cell6","cell5","cell4"]                          | [21,4,3,1]           | [3]  |
+---------+---------+-------------------------------------------------------------+----------------------+------+--+
select gridid,height,collect_set(cell) cellArray,collect_set(mrcount) mrcountArray,collect_set(weakmrcount) weakmrcountArray
from (
select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn
from tommyduan_test
group by gridid,height,cell,mrcount,weakmrcount
) t10
where rn<4
group by gridid,height;
+---------+---------+-----------------------------+---------------+-------------------+--+
| gridid  | height  | cellarray                   | mrcountarray  | weakmrcountarray  |
+---------+---------+-----------------------------+---------------+-------------------+--+
| g1      | 1       | ["cell3","cell2","cell19"]  | [23,22,21]    | [3]               |
| g2      | 1       | ["cell19","cell6","cell5"]  | [21,4,3]      | [3]               |
+---------+---------+-----------------------------+---------------+-------------------+--+
select gridid,height,collect_set(concat_ws(',',cell,cast(mrcount as string), cast(weakmrcount as string))) as cellArray
from (
select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn
from tommyduan_test
group by gridid,height,cell,mrcount,weakmrcount
) t10
where rn<4
group by gridid,height;
+---------+---------+--------------------------------------------+--+
| gridid  | height  | cellarray                                  |
+---------+---------+--------------------------------------------+--+
| g1      | 1       | ["cell3,23,3","cell2,22,3","cell19,21,3"]  |
| g2      | 1       | ["cell19,21,3","cell6,4,3","cell5,3,3"]    |
+---------+---------+--------------------------------------------+--+
select gridid,height,
(case when size(cellArray)>0 then split(cellArray[0],'_')[0] else '-9999' end) as cell1,
(case when size(cellArray)>0 then split(cellArray[0],'_')[1] else '-9999' end) as cell1_mrcount,
(case when size(cellArray)>0 then split(cellArray[0],'_')[2] else '-9999' end) as cell1_weakmrcount,
(case when size(cellArray)>1 then split(cellArray[1],'_')[0] else '-9999' end) as cell2,
(case when size(cellArray)>1 then split(cellArray[1],'_')[1] else '-9999' end) as cell2_mrcount,
(case when size(cellArray)>1 then split(cellArray[1],'_')[2] else '-9999' end) as cell2_weakmrcount,
(case when size(cellArray)>2 then split(cellArray[2],'_')[0] else '-9999' end) as cell3,
(case when size(cellArray)>2 then split(cellArray[2],'_')[1] else '-9999' end) as cell3_mrcount,
(case when size(cellArray)>2 then split(cellArray[2],'_')[2] else '-9999' end) as cell3_weakmrcount
from
(
select gridid,height,collect_set(concat_ws('_',cell,cast(mrcount as string), cast(weakmrcount as string))) as cellArray
from (
select gridid,height,cell,mrcount,weakmrcount,row_number()over(partition by gridid,height order by mrcount desc) rn
from tommyduan_test
group by gridid,height,cell,mrcount,weakmrcount
) t10
where rn<4
group by gridid,height
) t12;
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
| gridid  | height  | cell1   | cell1_mrcount  | cell1_weakmrcount  | cell2  | cell2_mrcount  | cell2_weakmrcount  | cell3   | cell3_mrcount  | cell3_weakmrcount  |
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
| g1      | 1       | cell3   | 23             | 3                  | cell2  | 22             | 3                  | cell19  | 21             | 3                  |
| g2      | 1       | cell19  | 21             | 3                  | cell6  | 4              | 3                  | cell5   | 3              | 3                  |
+---------+---------+---------+----------------+--------------------+--------+----------------+--------------------+---------+----------------+--------------------+--+
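The difference between the two aggregates that separates Option 2 from Option 3 can be checked directly on the test table. The sketch below is illustrative only; the column aliases all_values and distinct_values are made up for this example.

```sql
-- Illustrative check against the article's test table.
select gridid,
       collect_list(weakmrcount) as all_values,      -- one entry per row, duplicates kept
       collect_set(weakmrcount)  as distinct_values  -- duplicates removed
from tommyduan_test
group by gridid;
```

Every weakmrcount in the test data is 3, so for g1 all_values should hold seven 3s while distinct_values holds a single 3. That loss of one-entry-per-row alignment is why the collect_set queries above pack cell, mrcount and weakmrcount into one string per row before collecting.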