有这样一组搜索结果数据:

租户,平台, 登录用户, 搜索关键词, 搜索的商品结果List

{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"手机","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00002","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"外国手机","goodsList":[]}
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111112", "keywords":"手机壳","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00003","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}

现在需要统计每个商品被哪些关键词搜索到,最终结果如下:

这里最关键的是sku对应到命中的关键词:

操作步骤1: 

将给出的数据goodslist一列转为多行结构如下,重点用到了lateral view explode来解析。

    select tenantcode,
nvl(platform,0) as platform,
keywords,
'day' as dim_code,
'' as dim_value,
gl['skucode'] as skucode,
gl['skuname'] as skuname,
gl['spucode'] as spucode,
gl['spuname'] as spuname
from dw_mdl.m_search_result2
lateral view explode(goodsList) gl as gl
where dt = '';

显示如下:

操作步骤2:

根据商品,汇总关键词列,这里考虑到平台,时间维度等。

grouping sets 分组汇总数据

collect_set 多行合并并且去重

collect_list 多行合并不去重

with tmp_a as (
select tenantcode,
nvl(platform,0) as platform,
keywords,
'day' as dim_code,
'' as dim_value,
gl['skucode'] as skucode,
gl['skuname'] as skuname,
gl['spucode'] as spucode,
gl['spuname'] as spuname
from dw_mdl.m_search_result2
lateral view explode(goodsList) gl as gl
where dt = ''
) select tenantcode,
nvl(platform,'all') as platform,
skucode,
dim_code,
dim_value,
count(skuname) as search_times,
collect_set(keywords) as keywords
from tmp_a
group by tenantcode,platform,skucode,dim_code,dim_value
grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))

操作步骤3:

数组转字符串: concat_ws('分隔符',数组)

with tmp_a as (
select tenantcode,
nvl(platform,0) as platform,
keywords,
'day' as dim_code,
'' as dim_value,
gl['skucode'] as skucode,
gl['skuname'] as skuname,
gl['spucode'] as spucode,
gl['spuname'] as spuname
from dw_mdl.m_search_result2
lateral view explode(goodsList) gl as gl
where dt = ''
),
tmp_b as (
select tenantcode,
nvl(platform,'all') as platform,
skucode,
dim_code,
dim_value,
count(skuname) as search_times,
concat_ws(',',collect_set(keywords)) as keywords
from tmp_a
group by tenantcode,platform,skucode,dim_code,dim_value
grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))
)
select * from tmp_b;

是不是太简单了。

hive之案例分析(grouping sets,lateral view explode, concat_ws)的更多相关文章

  1. Hive lateral view explode

    select 'hello', x from dual lateral view explode(array(1,2,3,4,5)) vt as x 结果是: hello   1 hello   2 ...

  2. hive lateral view 与 explode详解

    ref:https://blog.csdn.net/bitcarmanlee/article/details/51926530 1.explode hive wiki对于expolde的解释如下: e ...

  3. hive splict, explode, lateral view, concat_ws

    hive> create table arrays (x array<string>) > row format delimited fields terminated by ...

  4. hive 使用笔记(table format;lateral view)

    1. create table 创建一张目标表,指定分隔符和存储格式: create table tmp_2 (resource_id bigint ,v int) ROW FORMAT DELIMI ...

  5. hive 使用笔记(table format;lateral view横表转纵表)

    1. create table 创建一张目标表,指定分隔符和存储格式: create table tmp_2 (resource_id bigint ,v int) ROW FORMAT DELIMI ...

  6. hive中的lateral view 与 explode函数的使用

    hive中的lateral view 与 explode函数的使用 背景介绍: explode与lateral view在关系型数据库中本身是不该出现的. 因为他的出现本身就是在操作不满足第一范式的数 ...

  7. 【Hive学习之六】Hive Lateral View &视图&索引

    环境 虚拟机:VMware 10 Linux版本:CentOS-6.5-x86_64 客户端:Xshell4 FTP:Xftp4 jdk8 hadoop-3.1.1 apache-hive-3.1.1 ...

  8. hive grouping sets 实现原理

    先下结论: 看了hive 1.1.0 grouping sets 实现(从源码及执行计划都可以看出与kylin实现不一样),(前提是可累加,如sum函数)他并没有像kylin一样先按照group by ...

  9. 【hive】lateral view的使用

    当使用UDTF函数的时候,hive只允许对拆分字段进行访问的 例如: select id,explode(arry1) from table; —错误 会报错FAILED: SemanticExcep ...

随机推荐

  1. struts系列:返回json格式的响应

    一.增加依赖库 // https://mvnrepository.com/artifact/org.apache.struts/struts2-json-plugin compile group: ' ...

  2. Android RoboGuice开源框架、Butter Knife开源框架浅析

    Google Guice on Android(RoboGuice) 今天介绍一下Google的这个开源框架RoboGuice, 它的作用跟之前讲过的Dagger框架差点儿是一样的,仅仅是Dagger ...

  3. 怎么在eclipse中查到这个类用的是哪个jar的类和Eclipse 编译错误 Access restriction:The type *** is not accessible due to restriction on... 解决方案

    找到了一个办法,你先按F3,然后点击Change Attached Source..按钮,在弹出的框里有个路径,我的路径是D:/SNFWorkSpace/JAR/~importu9.jar,然后你去引 ...

  4. Android 常用算法

    排序算法 简单排序算法 冒泡排序 两两比较相邻记录的关键字,如果反序则交换,直到没有反序的记录为止 直接插入排序 通过 n-i 次关键字间的比较,从 n-i+1 个记录中选出关键字最小的记录,并和第 ...

  5. 4-4-串的KMP匹配算法-串-第4章-《数据结构》课本源码-严蔚敏吴伟民版

    课本源码部分 第4章  串 - KMP匹配算法 ——<数据结构>-严蔚敏.吴伟民版        源码使用说明  链接☛☛☛ <数据结构-C语言版>(严蔚敏,吴伟民版)课本源码 ...

  6. nginx配置长连接

    http { keepalive_timeout 20; --长连接timeout keepalive_requests 8192; --每个连接最大请求数 } events { worker_con ...

  7. iOS开发如何学习前端(1)

    iOS开发如何学习前端(1) 我为何学前端?因为无聊. 概念 前端大概三大块. HTML CSS JavaScript 基本上每个概念在iOS中都有对应的.HTML请想象成只能拉Autolayout或 ...

  8. Implementation Notes: Runtime Environment Map Filtering for Image Based Lighting

    https://placeholderart.wordpress.com/2015/07/28/implementation-notes-runtime-environment-map-filteri ...

  9. Spark SQL编程指南(Python)【转】

    转自:http://www.cnblogs.com/yurunmiao/p/4685310.html 前言   Spark SQL允许我们在Spark环境中使用SQL或者Hive SQL执行关系型查询 ...

  10. LeetCode: Pascal's Triangle 解题报告

    Pascal's Triangle Given numRows, generate the first numRows of Pascal's triangle. For example, given ...