有这样一组搜索结果数据:

租户,平台, 登录用户, 搜索关键词, 搜索的商品结果List

{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"手机","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00002","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111111", "keywords":"外国手机","goodsList":[]}
{"tenantcode":"0000001", "platform":"IOS","loginName":"13111111112", "keywords":"手机壳","goodsList":[{"skuCode":"sku00001","skuName":"skuname1","spuCode":"spuCode1","spuName":"spuName1"},{"skuCode":"sku00003","skuName":"skuname2","spuCode":"spuCode2","spuName":"spuName2"}]}

现在需要统计每个商品被哪些关键词搜索到,最终结果如下:

这里最关键的是sku对应到命中的关键词:

操作步骤1: 

将给出的数据goodslist一列转为多行结构如下,重点用到了lateral view explode来解析。

    select tenantcode,
nvl(platform,0) as platform,
keywords,
'day' as dim_code,
'' as dim_value,
gl['skucode'] as skucode,
gl['skuname'] as skuname,
gl['spucode'] as spucode,
gl['spuname'] as spuname
from dw_mdl.m_search_result2
lateral view explode(goodsList) gl as gl
where dt = '';

显示如下:

操作步骤2:

根据商品,汇总关键词列,这里考虑到平台,时间维度等。

grouping sets 分组汇总数据

collect_set 多行合并并且去重

collect_list 多行合并不去重

with tmp_a as (
select tenantcode,
nvl(platform,0) as platform,
keywords,
'day' as dim_code,
'' as dim_value,
gl['skucode'] as skucode,
gl['skuname'] as skuname,
gl['spucode'] as spucode,
gl['spuname'] as spuname
from dw_mdl.m_search_result2
lateral view explode(goodsList) gl as gl
where dt = ''
) select tenantcode,
nvl(platform,'all') as platform,
skucode,
dim_code,
dim_value,
count(skuname) as search_times,
collect_set(keywords) as keywords
from tmp_a
group by tenantcode,platform,skucode,dim_code,dim_value
grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))

操作步骤3:

数组转字符串: concat_ws('分隔符',数组)

with tmp_a as (
select tenantcode,
nvl(platform,0) as platform,
keywords,
'day' as dim_code,
'' as dim_value,
gl['skucode'] as skucode,
gl['skuname'] as skuname,
gl['spucode'] as spucode,
gl['spuname'] as spuname
from dw_mdl.m_search_result2
lateral view explode(goodsList) gl as gl
where dt = ''
),
tmp_b as (
select tenantcode,
nvl(platform,'all') as platform,
skucode,
dim_code,
dim_value,
count(skuname) as search_times,
concat_ws(',',collect_set(keywords)) as keywords
from tmp_a
group by tenantcode,platform,skucode,dim_code,dim_value
grouping sets((tenantcode,platform,skucode,dim_code,dim_value),(tenantcode,skucode,dim_code,dim_value))
)
select * from tmp_b;

是不是太简单了。

hive之案例分析(grouping sets,lateral view explode, concat_ws)的更多相关文章

  1. Hive lateral view explode

    select 'hello', x from dual lateral view explode(array(1,2,3,4,5)) vt as x 结果是: hello   1 hello   2 ...

  2. hive lateral view 与 explode详解

    ref:https://blog.csdn.net/bitcarmanlee/article/details/51926530 1.explode hive wiki对于expolde的解释如下: e ...

  3. hive splict, explode, lateral view, concat_ws

    hive> create table arrays (x array<string>) > row format delimited fields terminated by ...

  4. hive 使用笔记(table format;lateral view)

    1. create table 创建一张目标表,指定分隔符和存储格式: create table tmp_2 (resource_id bigint ,v int) ROW FORMAT DELIMI ...

  5. hive 使用笔记(table format;lateral view横表转纵表)

    1. create table 创建一张目标表,指定分隔符和存储格式: create table tmp_2 (resource_id bigint ,v int) ROW FORMAT DELIMI ...

  6. hive中的lateral view 与 explode函数的使用

    hive中的lateral view 与 explode函数的使用 背景介绍: explode与lateral view在关系型数据库中本身是不该出现的. 因为他的出现本身就是在操作不满足第一范式的数 ...

  7. 【Hive学习之六】Hive Lateral View &视图&索引

    环境 虚拟机:VMware 10 Linux版本:CentOS-6.5-x86_64 客户端:Xshell4 FTP:Xftp4 jdk8 hadoop-3.1.1 apache-hive-3.1.1 ...

  8. hive grouping sets 实现原理

    先下结论: 看了hive 1.1.0 grouping sets 实现(从源码及执行计划都可以看出与kylin实现不一样),(前提是可累加,如sum函数)他并没有像kylin一样先按照group by ...

  9. 【hive】lateral view的使用

    当使用UDTF函数的时候,hive只允许对拆分字段进行访问的 例如: select id,explode(arry1) from table; —错误 会报错FAILED: SemanticExcep ...

随机推荐

  1. MySQL中 optimize table '表名'的作用

    语法: optimize table '表名' 一,原始数据 1,数据量 2,存放在硬盘中的表文件大小 3,查看一下索引信息 索引信息中的列的信息说明. Table :表的名称.Non_unique: ...

  2. FreeSWITCH呼叫参数之sip_cid_type

    这个参数定义了呼叫中主叫信息的头字段类型.支持两种类型: 1. rpidRemote-Party-ID头,这是默认的设置.{sip_cid_type=rpid}sofia/default/user@e ...

  3. python标准库介绍——33 thread 模块详解

    ?==thread 模块== (可选) ``thread`` 模块提为线程提供了一个低级 (low_level) 的接口, 如 [Example 3-6 #eg-3-6] 所示. 只有你在编译解释器时 ...

  4. SpringMVC 封装返回结果对象

    /*** *请求返回的最外层对象 **/ public class Result<T>{ /*错误码*/ private Integer code; /*提示信息*/ private St ...

  5. 修改 Semantic UI 中对 Google 字体的引用

    在第一次尝试 Semantic UI 后,发现其 css 中第一行,就引用了 fonts.googleapis.com 中的字体. 不知道为什么要这么做,也许在国外,google 的服务已经是一种互联 ...

  6. eclipse 创建Maven 架构的dynamic web project 问题解决汇总

    Eclipse创建Maven结构的web项目的时候选择Artifact Id为maven-artchetype-webapp,点击finish之后,一般会遇到如下问题 1. The superclas ...

  7. .NET CORE2.0发布后没有 VIEWS视图页面文件

    以前做的CORE1.0的项目,发布的时候有views文件夹的,升级VS后用CORE2.0做项目,发布后没有views文件夹了,全编译到一个类似于Niunan.ZYYCY.Web.Precompiled ...

  8. bit,Byte,Word,DWORD(DOUBLE WORD,DW)

    1个二进制位称为1个bit,8个二进制位称为1个Byte,也就是1个字节(8位),2个字节就是1个Word(1个字,16位),则DWORD(DOUBLE WORD)就是双字的意思,两个字(4个字节/3 ...

  9. [100]tar命令打包(排除目录或文件)

    在linux中可以用tar打包目录以方便传输or备份,我们先来看一个例子 Linux下tar命令exclude选项排除指定文件或目录 test 文件夹有如下文件 [root@lee ~]# ll te ...

  10. sql字段组合唯一

    create unique index [Itenmid_Uid] on Userchangeinfo(Itemid,Uid)