hive之案例分析(grouping sets,lateral view explode, concat_ws)
有这样一组搜索结果数据:
租户,平台, 登录用户, 搜索关键词, 搜索的商品结果List
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"手机","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00002","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"外国手机","goodsList":[]}
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111112", "keywords":"手机壳","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00003","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
现在需要统计每个商品被哪些关键词搜索到,最终结果如下:
这里最关键的是sku对应到命中的关键词:
操作步骤1:
将给出的数据goodslist一列转为多行结构如下,重点用到了lateral view explode来解析。
select tenantcode,
nvl(platform,0) as platform,
keywords,
'day' as dim_code,
'' as dim_value,
gl['skucode'] as skucode,
gl['skuname'] as skuname,
gl['spucode'] as spucode,
gl['spuname'] as spuname
from dw_mdl.m_search_result2
lateral view explode(goodsList) gl as gl
where dt = '';
显示如下:
操作步骤2:
根据商品,汇总关键词列,这里考虑到平台,时间维度等。
grouping sets 分组汇总数据
collect_set 多行合并并且去重
collect_list 多行合并不去重
with tmp_a as (
select tenantcode,
nvl(platform,0) as platform,
keywords,
'day' as dim_code,
'' as dim_value,
gl['skucode'] as skucode,
gl['skuname'] as skuname,
gl['spucode'] as spucode,
gl['spuname'] as spuname
from dw_mdl.m_search_result2
lateral view explode(goodsList) gl as gl
where dt = ''
) select tenantcode,
nvl(platform,'all') as platform,
skucode,
dim_code,
dim_value,
count(skuname) as search_times,
collect_set(keywords) as keywords
from tmp_a
group by tenantcode,platform,skucode,dim_code,dim_value
grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))
操作步骤3:
数组转字符串: concat_ws('分隔符',数组)
with tmp_a as (
select tenantcode,
nvl(platform,0) as platform,
keywords,
'day' as dim_code,
'' as dim_value,
gl['skucode'] as skucode,
gl['skuname'] as skuname,
gl['spucode'] as spucode,
gl['spuname'] as spuname
from dw_mdl.m_search_result2
lateral view explode(goodsList) gl as gl
where dt = ''
),
tmp_b as (
select tenantcode,
nvl(platform,'all') as platform,
skucode,
dim_code,
dim_value,
count(skuname) as search_times,
concat_ws(',',collect_set(keywords)) as keywords
from tmp_a
group by tenantcode,platform,skucode,dim_code,dim_value
grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))
)
select * from tmp_b;
是不是太简单了。
hive之案例分析(grouping sets,lateral view explode, concat_ws)的更多相关文章
- Hive lateral view explode
select 'hello', x from dual lateral view explode(array(1,2,3,4,5)) vt as x 结果是: hello 1 hello 2 ...
- hive lateral view 与 explode详解
ref:https://blog.csdn.net/bitcarmanlee/article/details/51926530 1.explode hive wiki对于expolde的解释如下: e ...
- hive splict, explode, lateral view, concat_ws
hive> create table arrays (x array<string>) > row format delimited fields terminated by ...
- hive 使用笔记(table format;lateral view)
1. create table 创建一张目标表,指定分隔符和存储格式: create table tmp_2 (resource_id bigint ,v int) ROW FORMAT DELIMI ...
- hive 使用笔记(table format;lateral view横表转纵表)
1. create table 创建一张目标表,指定分隔符和存储格式: create table tmp_2 (resource_id bigint ,v int) ROW FORMAT DELIMI ...
- hive中的lateral view 与 explode函数的使用
hive中的lateral view 与 explode函数的使用 背景介绍: explode与lateral view在关系型数据库中本身是不该出现的. 因为他的出现本身就是在操作不满足第一范式的数 ...
- 【Hive学习之六】Hive Lateral View &视图&索引
环境 虚拟机:VMware 10 Linux版本:CentOS-6.5-x86_64 客户端:Xshell4 FTP:Xftp4 jdk8 hadoop-3.1.1 apache-hive-3.1.1 ...
- hive grouping sets 实现原理
先下结论: 看了hive 1.1.0 grouping sets 实现(从源码及执行计划都可以看出与kylin实现不一样),(前提是可累加,如sum函数)他并没有像kylin一样先按照group by ...
- 【hive】lateral view的使用
当使用UDTF函数的时候,hive只允许对拆分字段进行访问的 例如: select id,explode(arry1) from table; —错误 会报错FAILED: SemanticExcep ...
随机推荐
- Oracle的执行计划(来自百度文库)
如何开启oracle执行计划 http://wenku.baidu.com/view/7d1ff6bc960590c69ec37636.html怎样看懂Oracle的执行计划 http://wenku ...
- openvpn 的安装和使用
这里我参考的文章有 OpenVpn https://my.oschina.net/mn1127/blog/855842http://linuxchina.blog.51cto.com/938835/1 ...
- rocketMq排坑:如何设置rocketMq broker的ip地址
在工作中遇到了一个这个问题,就是我们rocketmq是部署在云主机上的 但是我们的开发同事在自己的电脑连接rocketmq链接不上 报错显示Caused by: org.apache.rocketmq ...
- jQuery学习笔记(jquery.form插件)
官网: http://malsup.com/jquery/form/ jQuery Form插件是一个优秀的Ajax表单插件,可以非常容易地.无侵入地升级HTML表单以支持Ajax.jQuery Fo ...
- jQuery学习笔记(事件)
1. 加载DOM jQuery用$(document).ready()方法来代替传统JavaScrpt的window.onload方法.但它们执行时机有所不同,window.onload在网页所有元素 ...
- JDK1.5新特性,基础类库篇,XML增强
Document Object Model (DOM) Level 3 文件对象模型(Document Object Model,简称DOM),是W3C组织推荐的处理可扩展置标语言的标准编程接口.DO ...
- c++中浮点数精度设置
1.包含头文件<iomanip>,附注manip是manipulator,操控的简写. 2.第一种写法: cout<<setiosflags(ios::); 第二种写法: co ...
- Atitit jdbc 处理返回多个结果集
Atitit jdbc 处理返回多个结果集 Statement接口提供了三种执行SQL语句的方法: executeQuery.executeUpdate和execute.使用哪一个方法由SQL语句所 ...
- [svc]linux正则及grep常用手法
正则测试 可以用sublime等工具快速的检测正则是否合适 china : 匹配此行中任意位置有china字符的行 ^china : 匹配此以china开关的行 china$ : 匹配以china结尾 ...
- hdu 1875 畅通工程再续(prim方法求得最小生成树)
题目:http://acm.hdu.edu.cn/showproblem.php?pid=1875 /************************************************* ...