hive之案例分析(grouping sets,lateral view explode, concat_ws)
有这样一组搜索结果数据:
租户,平台, 登录用户, 搜索关键词, 搜索的商品结果List
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"手机","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00002","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"外国手机","goodsList":[]}
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111112", "keywords":"手机壳","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00003","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
现在需要统计每个商品被哪些关键词搜索到,最终结果如下:

这里最关键的是sku对应到命中的关键词:
操作步骤1:
将给出的数据goodslist一列转为多行结构如下,重点用到了lateral view explode来解析。
select tenantcode,
nvl(platform,0) as platform,
keywords,
'day' as dim_code,
'' as dim_value,
gl['skucode'] as skucode,
gl['skuname'] as skuname,
gl['spucode'] as spucode,
gl['spuname'] as spuname
from dw_mdl.m_search_result2
lateral view explode(goodsList) gl as gl
where dt = '';
显示如下:

操作步骤2:
根据商品,汇总关键词列,这里考虑到平台,时间维度等。
grouping sets 分组汇总数据
collect_set 多行合并并且去重
collect_list 多行合并不去重
with tmp_a as (
select tenantcode,
nvl(platform,0) as platform,
keywords,
'day' as dim_code,
'' as dim_value,
gl['skucode'] as skucode,
gl['skuname'] as skuname,
gl['spucode'] as spucode,
gl['spuname'] as spuname
from dw_mdl.m_search_result2
lateral view explode(goodsList) gl as gl
where dt = ''
) select tenantcode,
nvl(platform,'all') as platform,
skucode,
dim_code,
dim_value,
count(skuname) as search_times,
collect_set(keywords) as keywords
from tmp_a
group by tenantcode,platform,skucode,dim_code,dim_value
grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))

操作步骤3:
数组转字符串: concat_ws('分隔符',数组)
with tmp_a as (
select tenantcode,
nvl(platform,0) as platform,
keywords,
'day' as dim_code,
'' as dim_value,
gl['skucode'] as skucode,
gl['skuname'] as skuname,
gl['spucode'] as spucode,
gl['spuname'] as spuname
from dw_mdl.m_search_result2
lateral view explode(goodsList) gl as gl
where dt = ''
),
tmp_b as (
select tenantcode,
nvl(platform,'all') as platform,
skucode,
dim_code,
dim_value,
count(skuname) as search_times,
concat_ws(',',collect_set(keywords)) as keywords
from tmp_a
group by tenantcode,platform,skucode,dim_code,dim_value
grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))
)
select * from tmp_b;

是不是太简单了。
hive之案例分析(grouping sets,lateral view explode, concat_ws)的更多相关文章
- Hive lateral view explode
select 'hello', x from dual lateral view explode(array(1,2,3,4,5)) vt as x 结果是: hello 1 hello 2 ...
- hive lateral view 与 explode详解
ref:https://blog.csdn.net/bitcarmanlee/article/details/51926530 1.explode hive wiki对于expolde的解释如下: e ...
- hive splict, explode, lateral view, concat_ws
hive> create table arrays (x array<string>) > row format delimited fields terminated by ...
- hive 使用笔记(table format;lateral view)
1. create table 创建一张目标表,指定分隔符和存储格式: create table tmp_2 (resource_id bigint ,v int) ROW FORMAT DELIMI ...
- hive 使用笔记(table format;lateral view横表转纵表)
1. create table 创建一张目标表,指定分隔符和存储格式: create table tmp_2 (resource_id bigint ,v int) ROW FORMAT DELIMI ...
- hive中的lateral view 与 explode函数的使用
hive中的lateral view 与 explode函数的使用 背景介绍: explode与lateral view在关系型数据库中本身是不该出现的. 因为他的出现本身就是在操作不满足第一范式的数 ...
- 【Hive学习之六】Hive Lateral View &视图&索引
环境 虚拟机:VMware 10 Linux版本:CentOS-6.5-x86_64 客户端:Xshell4 FTP:Xftp4 jdk8 hadoop-3.1.1 apache-hive-3.1.1 ...
- hive grouping sets 实现原理
先下结论: 看了hive 1.1.0 grouping sets 实现(从源码及执行计划都可以看出与kylin实现不一样),(前提是可累加,如sum函数)他并没有像kylin一样先按照group by ...
- 【hive】lateral view的使用
当使用UDTF函数的时候,hive只允许对拆分字段进行访问的 例如: select id,explode(arry1) from table; —错误 会报错FAILED: SemanticExcep ...
随机推荐
- 常用DC-DC;AC-DC电源芯片
求推荐几个常用的开关电源芯片http://bbs.21ic.com/icview-1245974-1-1.html(出处: 21ic电子技术论坛) 1.1 DC-DC电源转换器 1.低噪声电荷泵DC- ...
- 如何将shell的打印日志输入到日志文件
如果shell打印的日志很多,屏幕无法完全显示,需要查看shell执行的情况,这是就需要输入到日值了: 如:echo "2012-6-14" | tee -a my.log -a表 ...
- python opencv 按一定间隔截取视频帧
前言关于opencvOpenCV 是 Intel 开源计算机视觉库 (Computer Version) .它由一系列 C 函数和少量 C++ 类构成,实现了图像处理和计算机视觉方面的很多通用算法. ...
- mysql 大数据提取
今天要重五百多万的一个数据库表 提取 大约五十万条数据,刚开始的解决思路是: 先把数据查询出来,然后再导出来,然后再设计一个数据库表格,把这些数据导入,最后导出数据和导入数据花费了很多时间,最后向同事 ...
- 中控考勤机SDK使用中员工姓名的处理( c# )
公司使用的考勤机是中控的指纹考勤机,但是中控的型号乱七八糟,通过程序读出来的型号和实际标的型号不一致. 另外,提供的开发包的C#版本的Demo中调用 axCZKEM1.ReadAllUserID(iM ...
- 微信小程序JS导出和导入
1. 导出 1.1 方法和变量导出(写在被导出方法和变量的js文件) module.exports = { variable: value, method : methodName } 1.2 cla ...
- kafka 监控(eagle)
topic:创建时topic名称 partition:分区编号 offset:表示该parition已经消费了多少条message logSize:表示该partition已经写了多少条message ...
- php分享十五:php的数据库操作
一:术语解释: What is an Extension? API和扩展不能理解为一个东西,因为扩展不一定暴露一个api给用户 The PDO MySQL driver extension, for ...
- ARC指南 strong和weak指针
一.简介 ARC是自iOS 5之后增加的新特性,完全消除了手动管理内存的烦琐,编译器会自动在适当的地方插入适当的retain.release.autorelease语句.你不再需要担心内存管理,因为编 ...
- javascript基础拾遗(三)
1.map数组映射操作 function add(x) { return x+1 } var nums = [1,3,5,7,9] result = nums.map(add) console.log ...