hive中的join
建表
: jdbc:hive2://localhost:10000> create database myjoin;
No rows affected (3.78 seconds)
: jdbc:hive2://localhost:10000> use myjoin;
No rows affected (0.419 seconds)
: jdbc:hive2://localhost:10000> create table a(id int,name string) row format delimited fields terminated by ',';
No rows affected (2.08 seconds)
: jdbc:hive2://localhost:10000> create table b(id int,name string) row format delimited fields terminated by ',';
: jdbc:hive2://localhost:10000> select * from a
: jdbc:hive2://localhost:10000> ;
+-------+---------+--+
| a.id | a.name |
+-------+---------+--+
| | qq |
| | ww |
| | ee |
| | rr |
| | tt |
| | yy |
| | aa |
| | ss |
| | zz |
+-------+---------+--+
rows selected (1.881 seconds)
: jdbc:hive2://localhost:10000> select * from b;
+-------+---------+--+
| b.id | b.name |
+-------+---------+--+
| | qq |
| | |
| | dd |
| | rr |
| | fgf |
| | as |
| | |
| | ww |
| | |
| | |
| | |
| | 4r |
+-------+---------+--+
rows selected (0.147 seconds)
inner join 的结果,也就是join
0: jdbc:hive2://localhost:10000> select a.*,b.* from a inner join b on a.id = b.id;
INFO : Execution completed successfully
INFO : MapredLocal task succeeded
INFO : Number of reduce tasks is set to since there's no reduce operator
INFO : number of splits:
INFO : Submitting tokens for job: job_1496277833427_0007
INFO : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0007/
INFO : Starting Job = job_1496277833427_0007, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0007/
INFO : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job -kill job_1496277833427_0007
INFO : Hadoop job information for Stage-: number of mappers: ; number of reducers:
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 5.05 sec
INFO : MapReduce Total cumulative CPU time: seconds msec
INFO : Ended Job = job_1496277833427_0007
+-------+---------+-------+---------+--+
| a.id | a.name | b.id | b.name |
+-------+---------+-------+---------+--+
| | qq | | qq |
| | ww | | |
| | ee | | dd |
| | rr | | rr |
| | yy | | fgf |
| | aa | | as |
+-------+---------+-------+---------+--+
full outer join ,两边的数据都会出来只不过on条件没有对应上的一端会显示为null
: jdbc:hive2://localhost:10000> select a.*,b.* from a full outer join b on a.id = b.id;
INFO : Number of reduce tasks not specified. Estimated from input data size:
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:
INFO : Submitting tokens for job: job_1496277833427_0008
INFO : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0008/
INFO : Starting Job = job_1496277833427_0008, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0008/
INFO : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job -kill job_1496277833427_0008
INFO : Hadoop job information for Stage-: number of mappers: ; number of reducers:
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 6.52 sec
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 9.17 sec
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 12.65 sec
INFO : MapReduce Total cumulative CPU time: seconds msec
INFO : Ended Job = job_1496277833427_0008
+-------+---------+-------+---------+--+
| a.id | a.name | b.id | b.name |
+-------+---------+-------+---------+--+
| | qq | | qq |
| | ww | | |
| | ee | | dd |
| | rr | | rr |
| | tt | NULL | NULL |
| | yy | | fgf |
| | aa | | as |
| | ss | NULL | NULL |
| NULL | NULL | | |
| | zz | NULL | NULL |
| NULL | NULL | | |
| NULL | NULL | | ww |
| NULL | NULL | | |
| NULL | NULL | | 4r |
| NULL | NULL | | |
+-------+---------+-------+---------+--+
rows selected (371.304 seconds)
select a.*from a left semi join b on a.id = b.id; -- from 前不能写b.* 否则会报错( Error while compiling statement: FAILED: SemanticException [Error 10009]: Line 1:11 Invalid table alias 'b' (state=42000,code=10009))
替代exist in 的用法,返回值只是inner join 中左边的一般,
+-------+---------+--+
| a.id | a.name |
+-------+---------+--+
| | qq |
| | ww |
| | ee |
| | rr |
| | yy |
| | aa |
+-------+---------+--+
没有 right semi join
left semi join 是exist in 的高效实现,比inner join 效率高
hive中的join的更多相关文章
- hive中left join、left outer join和left semi join的区别
先说结论,再举例子. hive中,left join与left outer join等价. left semi join与left outer join的区别:left semi join相当 ...
- SQL join中级篇--hive中 mapreduce join方法分析
1. 概述. 本文主要介绍了mapreduce框架上如何实现两表JOIN. 2. 常见的join方法介绍 假设要进行join的数据分别来自File1和File2. 2.1 reduce side jo ...
- 关于Hive中的join和left join的理解
一.join与left join的全称 JOIN是INNER JOIN的简写,LEFT JOIN是LEFT OUTER JOIN的简写. 二.join与left join的应用场景 JOIN一般用于A ...
- Hive中Join的原理和机制
转自:http://lxw1234.com/archives/2015/06/313.htm 笼统的说,Hive中的Join可分为Common Join(Reduce阶段完成join)和Map Joi ...
- Hive中Join的类型和用法
关键字:Hive Join.Hive LEFT|RIGTH|FULL OUTER JOIN.Hive LEFT SEMI JOIN.Hive Cross Join Hive中除了支持和传统数据库中一样 ...
- Hive中JOIN操作
1. 只支持相等JOIN. 2. 多表连接当使用不同的列进行JOIN时,会产生多个MR作业. 3. 最后的表的数据是从流中读取,而前面的会在内存中缓存,因此最好把最大的表放在最后. SELECT /* ...
- hive 配置文件以及join中null值的处理
一.Hive的參数设置 1. 三种设定方式:配置文件 · 用户自己定义配置文件:$HIVE_CONF_DIR/hive-site.xml · 默认配置文件:$HIVE_CONF_DIR/hi ...
- hive中与hbase外部表join时内存溢出(hive处理mapjoin的优化器机制)
与hbase外部表(wizad_mdm_main)进行join出现问题: CREATE TABLE wizad_mdm_dev_lmj_edition_result as select * from ...
- hive中的子查询改join操作(转)
这些子查询在oracle和mysql等数据库中都能执行,但是在hive中却不支持,但是我们可以把这些查询语句改为join操作: -- 1.子查询 select * from A a where a.u ...
随机推荐
- noip2017集训测试赛(三) Problem B: mex [补档]
Description 给你一个无限长的数组,初始的时候都为0,有3种操作: 操作1是把给定区间[l,r][l,r] 设为1, 操作2是把给定区间[l,r][l,r] 设为0, 操作3把给定区间[l, ...
- SweetAlert和MBProgressHUD冲突的解决办法
解决方法 : UIView *view = [UIApplication sharedApplication].keyWindow; NSLog(@" -- &g ...
- VMware虚拟机中为Linux 添加虚拟硬盘(VirtualBox方法类似)
修改1:2014-06-24 11:38:21 Linux添加硬盘是在原来安装的硬盘空间不够或者需要使用其他硬盘上的东西时候的解决办法,因为大多数初学者习惯使用虚拟机,这里以在Vmware虚拟机中实现 ...
- Swift,下标简化方法的调用
在类(class)当中采用subscript的方法直接用下标 class a{ func b(number:Int)->Int{ return number } subscript(number ...
- Laravel系列教程一:安装及环境配置
免费视频教程地址https://laravist.com/series/laravel-5-basic 最近在SF上面看到越来越多的Laravel相关的问题,而作为一个Laravel的脑残粉,本来打算 ...
- java源码阅读LinkedList
1类签名与注释 public class LinkedList<E> extends AbstractSequentialList<E> implements List< ...
- ZOJ 3526 Weekend Party
Weekend Party Time Limit: 2 Seconds Memory Limit: 65536 KB As the only Oni (a kind of fabulous ...
- 基于Maven的SpringBoot项目实现热部署的两种方式
转载:http://blog.csdn.net/tengxing007/article/details/72675168 前言 JRebel是JavaEE中比较流行的热部署插件,可快速实现热部署,节省 ...
- 深入解析淘宝Diamond之客户端架构
转载:http://blog.csdn.net/u013970991/article/details/52088350 一.什么是Diamond diamond是淘宝内部使用的一个管理持久配置的系统, ...
- 倍福TwinCAT(贝福Beckhoff)常见问题(FAQ)-人机界面如何让文本框可以输入,文本框可以编辑
选中一个文本框,然后在属性中双击输入配置的OnMouseDown事件(也可以是别的事件,但都是通过这种方法) 在左侧点击写变量,然后输入类型改成VisuDialos.Numpad(数字键盘方式), ...