Hive functions
collect_set(x) — aggregation function (rows-to-column): collects the values of x across the rows of a group into an array, with duplicates removed.
collect_list(x) — aggregation function (rows-to-column): same as collect_set, but duplicates are kept.
concat_ws(sep, ...) — concatenation function: joins multiple column values into a single string field, with sep as the separator.
UDF (User-Defined Function): a user-defined (scalar) function that operates on a single row and produces a single value.
UDAF (User-Defined Aggregation Function): a user-defined aggregate function that operates on multiple rows, just like the standard SQL aggregates SUM() and AVG().
UDTF (User-Defined Table-Generating Function): covers the case where one input row produces multiple output rows (one-to-many mapping).
lateral view is used together with UDTFs such as split and explode. It expands one row into multiple rows, and the expanded rows can then be aggregated. lateral view first calls the UDTF for each row of the base table; the UDTF splits that row into one or more rows; lateral view then joins the results back, producing a virtual table that supports column aliases. In the example below, lateral view explode(subdinates) adTable as aa, the virtual table is adTable and its column alias is aa.
explode(ARRAY): generates one output row per element of the array.
explode(MAP): generates one output row per key-value pair in the map, with the key in one column and the value in another.
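The two explode variants can be sketched in plain Python (a minimal illustration of the semantics, not Hive itself):

```python
def explode_array(arr):
    # explode(ARRAY): one output row per element
    return [(elem,) for elem in arr]

def explode_map(m):
    # explode(MAP): one output row per key-value pair (key column, value column)
    return [(k, v) for k, v in m.items()]

# mirrors the subdinates / deducation columns used later in this post
print(explode_array(["wang", "ZHANG", "LIU"]))
print(explode_map({"aaa": 10.0, "bb": 5.0}))
```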
| CREATE TABLE `employees`( |
| `name` string, |
| `salary` float, |
| `subdinates` array<string>, |
| `deducation` map<string,float>, |
| `address` struct<street:string,city:string,state:string,zip:int>) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://localhost:9000/user/hive/warehouse/gamedw.db/employees' |
| TBLPROPERTIES ( |
| 'creator'='tianyongtao', |
| 'last_modified_by'='root', |
| 'last_modified_time'='1521447397', |
| 'numFiles'='0', |
| 'numRows'='0', |
| 'rawDataSize'='0', |
| 'totalSize'='0', |
| 'transient_lastDdlTime'='1521447397') |
+----------------------------------------------------------------------+--+
Working with ARRAY-typed columns
0: jdbc:hive2://192.168.53.122:10000/default> select name,subdinates from employees;
+---------------+-------------------------+--+
| name | subdinates |
+---------------+-------------------------+--+
| tianyongtao | ["wang","ZHANG","LIU"] |
| wangyangming | ["ma","zhong"] |
+---------------+-------------------------+--+
2 rows selected (0.301 seconds)
0: jdbc:hive2://192.168.53.122:10000/default> select name,aa from employees lateral view explode(subdinates) adTable as aa;
+---------------+--------+--+
| name | aa |
+---------------+--------+--+
| tianyongtao | wang |
| tianyongtao | ZHANG |
| tianyongtao | LIU |
| wangyangming | ma |
| wangyangming | zhong |
+---------------+--------+--+
5 rows selected (0.312 seconds)
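The lateral view result above is just a flatten: each (name, array) row becomes one row per array element, with the name repeated. The same expansion sketched in Python:

```python
employees = [
    ("tianyongtao", ["wang", "ZHANG", "LIU"]),
    ("wangyangming", ["ma", "zhong"]),
]

# lateral view explode(subdinates): repeat name once per array element
rows = [(name, sub) for name, subs in employees for sub in subs]
for r in rows:
    print(r)
```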
Working with MAP-typed columns
0: jdbc:hive2://192.168.53.122:10000/default> select deducation from employees;
+---------------------------------+--+
| deducation |
+---------------------------------+--+
| {"aaa":10.0,"bb":5.0,"CC":8.0} |
| {"aaa":6.0,"bb":12.0} |
+---------------------------------+--+
2 rows selected (0.315 seconds)
0: jdbc:hive2://192.168.53.122:10000/default> select explode(deducation) as (aa,bb) from employees;
+------+-------+--+
| aa | bb |
+------+-------+--+
| aaa | 10.0 |
| bb | 5.0 |
| CC | 8.0 |
| aaa | 6.0 |
| bb | 12.0 |
+------+-------+--+
5 rows selected (0.314 seconds)
0: jdbc:hive2://192.168.53.122:10000/default> select name,aa,bb from employees lateral view explode(deducation) mtable as aa,bb;
+---------------+------+-------+--+
| name | aa | bb |
+---------------+------+-------+--+
| tianyongtao | aaa | 10.0 |
| tianyongtao | bb | 5.0 |
| tianyongtao | CC | 8.0 |
| wangyangming | aaa | 6.0 |
| wangyangming | bb | 12.0 |
+---------------+------+-------+--+
5 rows selected (0.347 seconds)
0: jdbc:hive2://192.168.53.122:10000/default> select name,aa,bb,cc from employees lateral view explode(deducation) mtable as aa,bb lateral view explode(subdinates) adTable as cc;
+---------------+------+-------+--------+--+
| name | aa | bb | cc |
+---------------+------+-------+--------+--+
| tianyongtao | aaa | 10.0 | wang |
| tianyongtao | aaa | 10.0 | ZHANG |
| tianyongtao | aaa | 10.0 | LIU |
| tianyongtao | bb | 5.0 | wang |
| tianyongtao | bb | 5.0 | ZHANG |
| tianyongtao | bb | 5.0 | LIU |
| tianyongtao | CC | 8.0 | wang |
| tianyongtao | CC | 8.0 | ZHANG |
| tianyongtao | CC | 8.0 | LIU |
| wangyangming | aaa | 6.0 | ma |
| wangyangming | aaa | 6.0 | zhong |
| wangyangming | bb | 12.0 | ma |
| wangyangming | bb | 12.0 | zhong |
+---------------+------+-------+--------+--+
13 rows selected (0.305 seconds)
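Note that chaining two lateral views produces the cross product of the two expansions for each base row: 3 map entries × 3 array elements = 9 rows for tianyongtao, plus 2 × 2 = 4 for wangyangming, 13 in total. A Python sketch of that cross product:

```python
from itertools import product

employees = [
    ("tianyongtao", {"aaa": 10.0, "bb": 5.0, "CC": 8.0}, ["wang", "ZHANG", "LIU"]),
    ("wangyangming", {"aaa": 6.0, "bb": 12.0}, ["ma", "zhong"]),
]

# two lateral views = cross product of both expansions, per base row
rows = [
    (name, k, v, sub)
    for name, ded, subs in employees
    for (k, v), sub in product(ded.items(), subs)
]
print(len(rows))  # 3*3 + 2*2 = 13
```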
Working with STRUCT-typed columns:
0: jdbc:hive2://192.168.53.122:10000/default> select name,address.street,address.city,address.state from employees;
+---------------+---------+-----------+----------+--+
| name | street | city | state |
+---------------+---------+-----------+----------+--+
| tianyongtao | HENAN | LUOHE | LINYING |
| wangyangming | hunan | changsha | NULL |
+---------------+---------+-----------+----------+--+
2 rows selected (0.309 seconds)
collect_set(): deduplicates and aggregates a column's values across the rows of each group, producing an ARRAY-typed field.
0: jdbc:hive2://192.168.53.122:10000/default> select * from cust;
+------------------+-----------+----------------+--+
| cust.custname | cust.sex | cust.nianling |
+------------------+-----------+----------------+--+
| tianyt_touch100 | 1 | 50 |
| wangwu | 1 | 85 |
| zhangsan | 1 | 20 |
| liuqin | 0 | 56 |
| wangwu | 0 | 47 |
| liuyang | 1 | 32 |
| hello | 0 | 100 |
| mahuateng | 1 | 1001 |
| tianyt_touch100 | 1 | 50 |
| wangwu | 1 | 85 |
| zhangsan | 1 | 20 |
| liuqin | 0 | 56 |
| wangwu | 0 | 47 |
| nihao | 1 | 5 |
| liuyang | 1 | 32 |
| hello | 0 | 100 |
| mahuateng | 1 | 1001 |
| nihao | 1 | 5 |
+------------------+-----------+----------------+--+
scala> hcon.sql("select sex,collect_set(nianling) from gamedw.cust group by sex").show
+---+---------------------+
|sex|collect_set(nianling)|
+---+---------------------+
| 1| [85, 5, 20, 50, 3...|
| 0| [100, 56, 47]|
+---+---------------------+
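collect_set's grouping behavior can be imitated with an order-preserving dedup per group (a sketch only; Hive gives no ordering guarantee for the resulting array):

```python
from collections import defaultdict

cust = [
    ("tianyt_touch100", 1, 50), ("wangwu", 1, 85), ("zhangsan", 1, 20),
    ("liuqin", 0, 56), ("wangwu", 0, 47), ("liuyang", 1, 32),
    ("hello", 0, 100), ("mahuateng", 1, 1001), ("tianyt_touch100", 1, 50),
    ("wangwu", 1, 85), ("zhangsan", 1, 20), ("liuqin", 0, 56),
    ("wangwu", 0, 47), ("nihao", 1, 5), ("liuyang", 1, 32),
    ("hello", 0, 100), ("mahuateng", 1, 1001), ("nihao", 1, 5),
]

# group by sex, collect nianling with duplicates removed
groups = defaultdict(dict)  # inner dict doubles as an insertion-ordered set
for _, sex, nianling in cust:
    groups[sex][nianling] = None
collect_set = {sex: list(vals) for sex, vals in groups.items()}
print(collect_set[0])  # [56, 47, 100]
```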
0: jdbc:hive2://192.168.53.122:10000/default> select * from cityinfo;
+----------------+---------------------------------------------------------------+--+
| cityinfo.city | cityinfo.districts |
+----------------+---------------------------------------------------------------+--+
| shenzhen | longhua,futian,baoan,longgang,dapeng,guangming,nanshan,luohu |
| qingdao | shinan,lichang,jimo,jiaozhou,huangdao,laoshan |
+----------------+---------------------------------------------------------------+--+
0: jdbc:hive2://192.168.53.122:10000/default> select city,area from cityinfo lateral view explode(split(districts,",")) areatable as area;
+-----------+------------+--+
| city | area |
+-----------+------------+--+
| shenzhen | longhua |
| shenzhen | futian |
| shenzhen | baoan |
| shenzhen | longgang |
| shenzhen | dapeng |
| shenzhen | guangming |
| shenzhen | nanshan |
| shenzhen | luohu |
| qingdao | shinan |
| qingdao | lichang |
| qingdao | jimo |
| qingdao | jiaozhou |
| qingdao | huangdao |
| qingdao | laoshan |
+-----------+------------+--+
14 rows selected (0.479 seconds)
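explode(split(districts, ",")) is a flatten over a string split; the equivalent in Python:

```python
cityinfo = [
    ("shenzhen", "longhua,futian,baoan,longgang,dapeng,guangming,nanshan,luohu"),
    ("qingdao", "shinan,lichang,jimo,jiaozhou,huangdao,laoshan"),
]

# explode(split(districts, ",")): one (city, area) row per district
rows = [(city, area) for city, districts in cityinfo for area in districts.split(",")]
print(len(rows))  # 8 + 6 = 14
```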
Given the data below, for each customer compute the running maximum and the running sum of visits up to and including each month:
scala> hcon.sql("select * from gamedw.visists order by custid,monthid").show
+------+-------+-----+
|custid|monthid|times|
+------+-------+-----+
| 1| 201801| 25|
| 1| 201801| 10|
| 1| 201802| 35|
| 1| 201802| 7|
| 1| 201803| 52|
| 1| 201805| 6|
| 2| 201801| 32|
| 2| 201801| 1|
| 2| 201802| 10|
| 2| 201802| 18|
| 2| 201803| 91|
| 2| 201804| 6|
| 2| 201804| 4|
| 2| 201805| 31|
+------+-------+-----+
scala> hcon.sql("select custid,b.monthid,sum(times),max(times) from gamedw.visists a inner join (select distinct monthid from gamedw.visists) b on a.monthid<=b.monthid group by custid,b.monthid order by custid,b.monthid").show
+------+-------+----------+----------+
|custid|monthid|sum(times)|max(times)|
+------+-------+----------+----------+
| 1| 201801| 35| 25|
| 1| 201802| 77| 35|
| 1| 201803| 129| 52|
| 1| 201804| 129| 52|
| 1| 201805| 135| 52|
| 2| 201801| 33| 32|
| 2| 201802| 61| 32|
| 2| 201803| 152| 91|
| 2| 201804| 162| 91|
| 2| 201805| 193| 91|
+------+-------+----------+----------+
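The non-equi self-join above computes, for every (custid, monthid) pair from the distinct-month subquery, the aggregates over all of that customer's rows with a.monthid <= b.monthid. The same logic written out in Python (a check of the query's semantics, not an efficient implementation):

```python
visits = [
    (1, "201801", 25), (1, "201801", 10), (1, "201802", 35), (1, "201802", 7),
    (1, "201803", 52), (1, "201805", 6), (2, "201801", 32), (2, "201801", 1),
    (2, "201802", 10), (2, "201802", 18), (2, "201803", 91), (2, "201804", 6),
    (2, "201804", 4), (2, "201805", 31),
]

months = sorted({m for _, m, _ in visits})  # the distinct-monthid "b" side
result = {}
for cust in sorted({c for c, _, _ in visits}):
    for month in months:
        # all of this customer's rows up to and including `month`
        times = [t for c, m, t in visits if c == cust and m <= month]
        result[(cust, month)] = (sum(times), max(times))
print(result[(1, "201801")])  # (35, 25)
```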
When joining, put the small table on the left: classic Hive streams the rightmost table through the join and buffers the tables to its left in memory, so the smaller table should come first.