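Suppose we have the following set of search-result data.

Fields: tenant, platform, logged-in user, search keyword, and the list of products returned by the search (goodsList).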

{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"手机","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00002","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"外国手机","goodsList":[]}
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111112", "keywords":"手机壳","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00003","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}

We now need to work out, for each product, which keywords it was found by. The key point is mapping each SKU to the keywords that hit it.
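To make the target concrete before turning to Hive, here is a minimal Python sketch that builds the SKU-to-keywords mapping directly from the three sample records. The records are trimmed to the two fields that matter, and the names `raw_lines` and `sku_keywords` are made up for this illustration; it is only meant to show the expected shape of the result, not how the Hive job works.

```python
import json
from collections import defaultdict

# The three sample records above, trimmed to the fields that matter here.
raw_lines = [
    '{"keywords": "手机", "goodsList": [{"skuCode": "sku00001"}, {"skuCode": "sku00002"}]}',
    '{"keywords": "外国手机", "goodsList": []}',
    '{"keywords": "手机壳", "goodsList": [{"skuCode": "sku00001"}, {"skuCode": "sku00003"}]}',
]

# Map each SKU to the set of keywords whose result list contained it.
sku_keywords = defaultdict(set)
for line in raw_lines:
    record = json.loads(line)
    for goods in record["goodsList"]:
        sku_keywords[goods["skuCode"]].add(record["keywords"])

for sku in sorted(sku_keywords):
    print(sku, sorted(sku_keywords[sku]))
# sku00001 ['手机', '手机壳']
# sku00002 ['手机']
# sku00003 ['手机壳']
```

The search for "外国手机" returned no products, so it contributes nothing to any SKU.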

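Step 1:

Turn the goodsList column of the given data into multiple rows, one row per product. The key tool here is lateral view explode.

    -- goodsList is assumed to be array<map<string,string>> with lowercase keys.
    -- Map lookups are case-sensitive, so use gl['skuCode'] etc. if the keys
    -- are stored exactly as in the JSON above.
    select tenantcode,
           nvl(platform, '0') as platform,  -- placeholder for a missing platform
           keywords,
           'day' as dim_code,
           '' as dim_value,
           gl['skucode'] as skucode,
           gl['skuname'] as skuname,
           gl['spucode'] as spucode,
           gl['spuname'] as spuname
    from dw_mdl.m_search_result2
    lateral view explode(goodsList) goods as gl
    where dt = '';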

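In the result, each element of goodsList becomes its own row: a search record with two products yields two rows, and a record with an empty goodsList (such as the "外国手机" search) yields no rows at all, because lateral view without outer drops it.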
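The row expansion can be modelled in a few lines of Python. This is only a sketch of the semantics, not Hive code: the helper `lateral_view_explode` is a made-up name, and the records are the sample data trimmed to the fields used here.

```python
# The three sample records, trimmed to the fields this sketch needs.
records = [
    {"tenantcode": "0000001", "platform": "IOS", "keywords": "手机",
     "goodsList": [{"skuCode": "sku00001"}, {"skuCode": "sku00002"}]},
    {"tenantcode": "0000001", "platform": "IOS", "keywords": "外国手机",
     "goodsList": []},
    {"tenantcode": "0000001", "platform": "IOS", "keywords": "手机壳",
     "goodsList": [{"skuCode": "sku00001"}, {"skuCode": "sku00003"}]},
]

def lateral_view_explode(rows, array_col):
    """Yield one output row per array element; an empty array yields no rows."""
    for row in rows:
        for element in row[array_col]:
            yield {
                "tenantcode": row["tenantcode"],
                "platform": row["platform"],
                "keywords": row["keywords"],
                "skucode": element["skuCode"],
            }

exploded = list(lateral_view_explode(records, "goodsList"))
for row in exploded:
    print(row["keywords"], row["skucode"])
# 手机 sku00001
# 手机 sku00002
# 手机壳 sku00001
# 手机壳 sku00003
```

Three input records become four output rows, and the record with the empty list disappears.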

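Step 2:

Aggregate the keywords per product, taking dimensions such as platform and time into account.

grouping sets: aggregates the data over several groupings in a single query

collect_set: merges multiple rows into one array and removes duplicates

collect_list: merges multiple rows into one array and keeps duplicates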
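The difference between the two collect functions is easiest to see side by side. Below is a small Python model of their behaviour; the function names mirror the Hive ones but these are illustrative stand-ins. Two assumptions are baked in: NULL inputs are skipped, and the sketch keeps first-seen order, whereas Hive makes no promise about element order.

```python
def collect_list(values):
    """Keep every non-null value, duplicates included."""
    return [v for v in values if v is not None]

def collect_set(values):
    """Keep each distinct non-null value once (first-seen order in this sketch)."""
    return list(dict.fromkeys(v for v in values if v is not None))

# Keywords seen for one product across several searches.
keywords = ["手机", "手机壳", "手机", None]

print(collect_list(keywords))  # ['手机', '手机壳', '手机']
print(collect_set(keywords))   # ['手机', '手机壳']
```

For the "which keywords found this product" question, the deduplicated form is the one we want, which is why the query uses collect_set.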

    with tmp_a as (
        select tenantcode,
               nvl(platform, '0') as platform,  -- real NULLs get a placeholder first
               keywords,
               'day' as dim_code,
               '' as dim_value,
               gl['skucode'] as skucode,
               gl['skuname'] as skuname,
               gl['spucode'] as spucode,
               gl['spuname'] as spuname
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) goods as gl
        where dt = ''
    )
    select tenantcode,
           nvl(platform, 'all') as platform,  -- NULL here means the grouping set without platform
           skucode,
           dim_code,
           dim_value,
           count(skuname) as search_times,
           collect_set(keywords) as keywords
    from tmp_a
    group by tenantcode, platform, skucode, dim_code, dim_value
    grouping sets (
        (tenantcode, platform, skucode, dim_code, dim_value),
        (tenantcode, skucode, dim_code, dim_value)
    );
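To see what the two grouping sets produce, here is a rough Python model run on the four exploded rows from the sample data. It is a sketch under simplifying assumptions: dim_code and dim_value are left out because they are constants, skuname is assumed non-null so `count(skuname)` equals the row count, and the keyword arrays are sorted only to make the output deterministic. The names `aggregate`, `ALL_KEYS` and `GROUPING_SETS` are made up for the illustration.

```python
from collections import defaultdict

# The exploded rows that step 1 would produce for the sample data.
rows = [
    {"tenantcode": "0000001", "platform": "IOS", "skucode": "sku00001", "keywords": "手机"},
    {"tenantcode": "0000001", "platform": "IOS", "skucode": "sku00002", "keywords": "手机"},
    {"tenantcode": "0000001", "platform": "IOS", "skucode": "sku00001", "keywords": "手机壳"},
    {"tenantcode": "0000001", "platform": "IOS", "skucode": "sku00003", "keywords": "手机壳"},
]

ALL_KEYS = ("tenantcode", "platform", "skucode")
GROUPING_SETS = [("tenantcode", "platform", "skucode"), ("tenantcode", "skucode")]

def aggregate(rows, grouping_sets):
    """Run one group-by per grouping set and concatenate the results."""
    result = []
    for keys in grouping_sets:
        groups = defaultdict(list)
        for row in rows:
            # Columns outside the current grouping set become None (NULL in Hive).
            group_key = tuple(row[k] if k in keys else None for k in ALL_KEYS)
            groups[group_key].append(row["keywords"])
        for (tenant, platform, sku), kws in groups.items():
            result.append({
                "tenantcode": tenant,
                "platform": platform if platform is not None else "all",  # nvl(platform, 'all')
                "skucode": sku,
                "search_times": len(kws),      # count(skuname), assuming no NULL names
                "keywords": sorted(set(kws)),  # collect_set(keywords)
            })
    return result

summary = aggregate(rows, GROUPING_SETS)
for line in summary:
    print(line["platform"], line["skucode"], line["search_times"], line["keywords"])
```

Each SKU appears twice in `summary`: once under its platform (IOS) and once under the rolled-up label "all". For sku00001 both rows report 2 searches and the keywords 手机 and 手机壳.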

Step 3:

Convert the array to a string: concat_ws('separator', array)
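In Python terms this is a plain string join with the separator passed first. The function below is a stand-in written for this illustration, and it assumes every element is already a string, as the keywords are here.

```python
def concat_ws(separator, values):
    """Join the string elements of `values` with `separator`."""
    return separator.join(values)

keywords = ["手机", "手机壳"]  # e.g. the array produced by collect_set(keywords)
joined = concat_ws(",", keywords)
print(joined)  # 手机,手机壳
```

A single comma-separated string is often easier to export or display than an array column.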

    with tmp_a as (
        select tenantcode,
               nvl(platform, '0') as platform,  -- real NULLs get a placeholder first
               keywords,
               'day' as dim_code,
               '' as dim_value,
               gl['skucode'] as skucode,
               gl['skuname'] as skuname,
               gl['spucode'] as spucode,
               gl['spuname'] as spuname
        from dw_mdl.m_search_result2
        lateral view explode(goodsList) goods as gl
        where dt = ''
    ),
    tmp_b as (
        select tenantcode,
               nvl(platform, 'all') as platform,  -- NULL here means the grouping set without platform
               skucode,
               dim_code,
               dim_value,
               count(skuname) as search_times,
               concat_ws(',', collect_set(keywords)) as keywords  -- array -> comma-separated string
        from tmp_a
        group by tenantcode, platform, skucode, dim_code, dim_value
        grouping sets (
            (tenantcode, platform, skucode, dim_code, dim_value),
            (tenantcode, skucode, dim_code, dim_value)
        )
    )
    select * from tmp_b;

Pretty simple, isn't it?
