Hive Functions
collect_set(x): aggregate function that gathers the values of x from multiple rows into a single array, removing duplicates.
collect_list(x): same as collect_set, but duplicates are kept.
concat_ws(separator, ...): concatenation function; typically wrapped around collect_set/collect_list to join the values collected from several rows into one delimited field (a usage sketch follows these notes).
UDF (User-Defined Function): operates on a single row and returns a single value for that row.
UDAF (User-Defined Aggregation Function): operates on multiple rows, like the built-in aggregates SUM() and AVG().
UDTF (User-Defined Table-Generating Function): covers the one-to-many mapping case, turning one input row into multiple output rows.
lateral view is used together with UDTFs such as explode (often combined with split). It splits one row into several rows, and the result can then be aggregated. lateral view first calls the UDTF for each row of the base table; the UDTF turns that row into one or more rows, and lateral view joins the results back, producing a virtual table that can be given an alias. In the example below, lateral view explode(subdinates) adTable as aa aliases the virtual table as adTable and its generated column as aa.
explode(ARRAY): generates one row for each element of the array.
explode(MAP): generates one row for each key-value pair in the map, with the key in one column and the value in another.
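As a quick sketch of the collect_set/concat_ws combination mentioned above (table and column names here are illustrative only, not from the examples below), collapsing many rows per key into one comma-separated field:

-- hypothetical table t(city string, district string), one row per district
select city,
       concat_ws(',', collect_set(district)) as districts   -- dedupe, then join with ","
from t
group by city;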
The examples below use the table employees, defined as:

CREATE TABLE `employees`(
  `name` string,
  `salary` float,
  `subdinates` array<string>,
  `deducation` map<string,float>,
  `address` struct<street:string,city:string,state:string,zip:int>)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://localhost:9000/user/hive/warehouse/gamedw.db/employees'
TBLPROPERTIES (
  'creator'='tianyongtao',
  'last_modified_by'='root',
  'last_modified_time'='1521447397',
  'numFiles'='0',
  'numRows'='0',
  'rawDataSize'='0',
  'totalSize'='0',
  'transient_lastDdlTime'='1521447397')
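Hive's INSERT ... VALUES historically does not accept complex-type columns, so one way to add a test row is an INSERT ... SELECT with the complex-type constructors array(), map() and named_struct(). A sketch with illustrative values (older Hive versions may need a one-row dummy table instead of the FROM-less SELECT):

insert into table employees
select 'test_user',
       1000.0,
       array('a','b'),                                                    -- subdinates
       map('tax', 3.0),                                                   -- deducation
       named_struct('street','s1','city','c1','state','st1','zip',100);  -- address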
Handling array-type fields
0: jdbc:hive2://192.168.53.122:10000/default> select name,subdinates from employees;
+---------------+-------------------------+--+
| name | subdinates |
+---------------+-------------------------+--+
| tianyongtao | ["wang","ZHANG","LIU"] |
| wangyangming | ["ma","zhong"] |
+---------------+-------------------------+--+
2 rows selected (0.301 seconds)
0: jdbc:hive2://192.168.53.122:10000/default> select name,aa from employees lateral view explode(subdinates) adTable as aa;
+---------------+--------+--+
| name | aa |
+---------------+--------+--+
| tianyongtao | wang |
| tianyongtao | ZHANG |
| tianyongtao | LIU |
| wangyangming | ma |
| wangyangming | zhong |
+---------------+--------+--+
5 rows selected (0.312 seconds)
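If the position of each element is needed as well, posexplode (an assumption: the Hive version in use is 0.13 or later) adds an index column alongside the value; a sketch against the same table:

select name, pos, aa
from employees lateral view posexplode(subdinates) adTable as pos, aa;
-- pos is the 0-based index of each element within subdinates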
Handling map-type fields
0: jdbc:hive2://192.168.53.122:10000/default> select deducation from employees;
+---------------------------------+--+
| deducation |
+---------------------------------+--+
| {"aaa":10.0,"bb":5.0,"CC":8.0} |
| {"aaa":6.0,"bb":12.0} |
+---------------------------------+--+
2 rows selected (0.315 seconds)
0: jdbc:hive2://192.168.53.122:10000/default> select explode(deducation) as (aa,bb) from employees;
+------+-------+--+
| aa | bb |
+------+-------+--+
| aaa | 10.0 |
| bb | 5.0 |
| CC | 8.0 |
| aaa | 6.0 |
| bb | 12.0 |
+------+-------+--+
5 rows selected (0.314 seconds)
0: jdbc:hive2://192.168.53.122:10000/default> select name,aa,bb from employees lateral view explode(deducation) mtable as aa,bb;
+---------------+------+-------+--+
| name | aa | bb |
+---------------+------+-------+--+
| tianyongtao | aaa | 10.0 |
| tianyongtao | bb | 5.0 |
| tianyongtao | CC | 8.0 |
| wangyangming | aaa | 6.0 |
| wangyangming | bb | 12.0 |
+---------------+------+-------+--+
5 rows selected (0.347 seconds)
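One caveat: a plain lateral view drops rows whose map (or array) is empty or NULL. LATERAL VIEW OUTER keeps such rows, with NULL in the generated columns; a sketch (here the result would be identical, since both rows have a non-empty deducation map):

select name, aa, bb
from employees lateral view outer explode(deducation) mtable as aa, bb;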
0: jdbc:hive2://192.168.53.122:10000/default> select name,aa,bb,cc from employees lateral view explode(deducation) mtable as aa,bb lateral view explode(subdinates) adTable as cc;
+---------------+------+-------+--------+--+
| name | aa | bb | cc |
+---------------+------+-------+--------+--+
| tianyongtao | aaa | 10.0 | wang |
| tianyongtao | aaa | 10.0 | ZHANG |
| tianyongtao | aaa | 10.0 | LIU |
| tianyongtao | bb | 5.0 | wang |
| tianyongtao | bb | 5.0 | ZHANG |
| tianyongtao | bb | 5.0 | LIU |
| tianyongtao | CC | 8.0 | wang |
| tianyongtao | CC | 8.0 | ZHANG |
| tianyongtao | CC | 8.0 | LIU |
| wangyangming | aaa | 6.0 | ma |
| wangyangming | aaa | 6.0 | zhong |
| wangyangming | bb | 12.0 | ma |
| wangyangming | bb | 12.0 | zhong |
+---------------+------+-------+--------+--+
13 rows selected (0.305 seconds)
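Chaining two lateral views yields the cross product of the two expansions per base row: 3 deduction entries × 3 subordinates = 9 rows for tianyongtao, plus 2 × 2 = 4 for wangyangming, hence 13 rows. The generated columns can be filtered like ordinary columns; a sketch:

select name, aa, bb, cc
from employees
  lateral view explode(deducation) mtable as aa, bb
  lateral view explode(subdinates) adTable as cc
where cc = 'wang';   -- keep only the rows expanded from subordinate 'wang'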
Struct-type fields:
0: jdbc:hive2://192.168.53.122:10000/default> select name,address.street,address.city,address.state from employees;
+---------------+---------+-----------+----------+--+
| name | street | city | state |
+---------------+---------+-----------+----------+--+
| tianyongtao | HENAN | LUOHE | LINYING |
| wangyangming | hunan | changsha | NULL |
+---------------+---------+-----------+----------+--+
2 rows selected (0.309 seconds)
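Struct members can also be referenced in predicates with the same dot syntax; a sketch that keeps only rows whose state is not NULL:

select name, address.street, address.city
from employees
where address.state is not null;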
collect_set(): deduplicates and aggregates the values of a column across rows, producing an array-type result.
0: jdbc:hive2://192.168.53.122:10000/default> select * from cust;
+------------------+-----------+----------------+--+
| cust.custname | cust.sex | cust.nianling |
+------------------+-----------+----------------+--+
| tianyt_touch100 | 1 | 50 |
| wangwu | 1 | 85 |
| zhangsan | 1 | 20 |
| liuqin | 0 | 56 |
| wangwu | 0 | 47 |
| liuyang | 1 | 32 |
| hello | 0 | 100 |
| mahuateng | 1 | 1001 |
| tianyt_touch100 | 1 | 50 |
| wangwu | 1 | 85 |
| zhangsan | 1 | 20 |
| liuqin | 0 | 56 |
| wangwu | 0 | 47 |
| nihao | 1 | 5 |
| liuyang | 1 | 32 |
| hello | 0 | 100 |
| mahuateng | 1 | 1001 |
| nihao | 1 | 5 |
+------------------+-----------+----------------+--+
scala> hcon.sql("select sex,collect_set(nianling) from gamedw.cust group by sex").show
+---+---------------------+
|sex|collect_set(nianling)|
+---+---------------------+
| 1| [85, 5, 20, 50, 3...|
| 0| [100, 56, 47]|
+---+---------------------+
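For comparison, collect_list is the non-deduplicating counterpart, and size() counts the elements of the resulting array; a sketch against the same cust table (cust contains duplicate rows, so collect_list would keep the repeated ages):

select sex,
       collect_list(nianling)      as all_ages,       -- duplicates kept
       size(collect_set(nianling)) as distinct_ages   -- count of distinct ages
from gamedw.cust
group by sex;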
0: jdbc:hive2://192.168.53.122:10000/default> select * from cityinfo;
+----------------+---------------------------------------------------------------+--+
| cityinfo.city | cityinfo.districts |
+----------------+---------------------------------------------------------------+--+
| shenzhen | longhua,futian,baoan,longgang,dapeng,guangming,nanshan,luohu |
| qingdao | shinan,lichang,jimo,jiaozhou,huangdao,laoshan |
+----------------+---------------------------------------------------------------+--+
0: jdbc:hive2://192.168.53.122:10000/default> select city,area from cityinfo lateral view explode(split(districts,",")) areatable as area;
+-----------+------------+--+
| city | area |
+-----------+------------+--+
| shenzhen | longhua |
| shenzhen | futian |
| shenzhen | baoan |
| shenzhen | longgang |
| shenzhen | dapeng |
| shenzhen | guangming |
| shenzhen | nanshan |
| shenzhen | luohu |
| qingdao | shinan |
| qingdao | lichang |
| qingdao | jimo |
| qingdao | jiaozhou |
| qingdao | huangdao |
| qingdao | laoshan |
+-----------+------------+--+
14 rows selected (0.479 seconds)
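The reverse direction, folding the exploded rows back into one delimited field per city, is exactly what collect_set plus concat_ws (described at the top of this section) do; a sketch that would rebuild the districts column (element order in collect_set is not guaranteed):

select city, concat_ws(',', collect_set(area)) as districts
from (
  select city, area
  from cityinfo lateral view explode(split(districts, ',')) areatable as area
) t
group by city;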
Given the data below, compute for each customer the cumulative sum of times and the maximum of times up to and including each month:
scala> hcon.sql("select * from gamedw.visists order by custid,monthid").show
+------+-------+-----+
|custid|monthid|times|
+------+-------+-----+
| 1| 201801| 25|
| 1| 201801| 10|
| 1| 201802| 35|
| 1| 201802| 7|
| 1| 201803| 52|
| 1| 201805| 6|
| 2| 201801| 32|
| 2| 201801| 1|
| 2| 201802| 10|
| 2| 201802| 18|
| 2| 201803| 91|
| 2| 201804| 6|
| 2| 201804| 4|
| 2| 201805| 31|
+------+-------+-----+
scala> hcon.sql("select custid,b.monthid,sum(times),max(times) from gamedw.visists a inner join (select distinct monthid from gamedw.visists) b on a.monthid<=b.monthid group by custid,b.monthid order by custid,b.monthid").show
+------+-------+----------+----------+
|custid|monthid|sum(times)|max(times)|
+------+-------+----------+----------+
| 1| 201801| 35| 25|
| 1| 201802| 77| 35|
| 1| 201803| 129| 52|
| 1| 201804| 129| 52|
| 1| 201805| 135| 52|
| 2| 201801| 33| 32|
| 2| 201802| 61| 32|
| 2| 201803| 152| 91|
| 2| 201804| 162| 91|
| 2| 201805| 193| 91|
+------+-------+----------+----------+
Tip: when joining, put the smaller table on the left (Hive buffers the earlier tables of a join in memory and streams the last one).
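With window functions (Hive 0.11+ and Spark SQL), the same running totals can be computed without the non-equi self-join; a sketch (note it only emits months in which a customer actually has rows, so it would not produce the carried-forward row for custid 1 in 201804 that the self-join does):

-- pre-aggregate to one row per customer per month, then run cumulative windows
select custid,
       monthid,
       sum(month_sum) over (partition by custid order by monthid) as sum_times,  -- running sum
       max(month_max) over (partition by custid order by monthid) as max_times   -- running max of single values
from (
  select custid, monthid,
         sum(times) as month_sum,
         max(times) as month_max
  from gamedw.visists
  group by custid, monthid
) t;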