Creating the tables

0: jdbc:hive2://localhost:10000> create database myjoin;
No rows affected (3.78 seconds)
0: jdbc:hive2://localhost:10000> use myjoin;
No rows affected (0.419 seconds)
0: jdbc:hive2://localhost:10000> create table a(id int,name string) row format delimited fields terminated by ',';
No rows affected (2.08 seconds)
0: jdbc:hive2://localhost:10000> create table b(id int,name string) row format delimited fields terminated by ',';
0: jdbc:hive2://localhost:10000> select * from a;
+-------+---------+--+
| a.id | a.name |
+-------+---------+--+
| | qq |
| | ww |
| | ee |
| | rr |
| | tt |
| | yy |
| | aa |
| | ss |
| | zz |
+-------+---------+--+
9 rows selected (1.881 seconds)
0: jdbc:hive2://localhost:10000> select * from b;
+-------+---------+--+
| b.id | b.name |
+-------+---------+--+
| | qq |
| | |
| | dd |
| | rr |
| | fgf |
| | as |
| | |
| | ww |
| | |
| | |
| | |
| | 4r |
+-------+---------+--+
12 rows selected (0.147 seconds)
Result of inner join, which is the same as a plain join
0: jdbc:hive2://localhost:10000> select a.*,b.* from a inner join b on a.id = b.id;
INFO : Execution completed successfully
INFO : MapredLocal task succeeded
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:
INFO : Submitting tokens for job: job_1496277833427_0007
INFO : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0007/
INFO : Starting Job = job_1496277833427_0007, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0007/
INFO : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job -kill job_1496277833427_0007
INFO : Hadoop job information for Stage-: number of mappers: ; number of reducers:
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 5.05 sec
INFO : MapReduce Total cumulative CPU time: seconds msec
INFO : Ended Job = job_1496277833427_0007
+-------+---------+-------+---------+--+
| a.id | a.name | b.id | b.name |
+-------+---------+-------+---------+--+
| | qq | | qq |
| | ww | | |
| | ee | | dd |
| | rr | | rr |
| | yy | | fgf |
| | aa | | as |
+-------+---------+-------+---------+--+

full outer join: rows from both tables are returned; where the on condition finds no match, the columns of the other side are shown as NULL

0: jdbc:hive2://localhost:10000> select a.*,b.* from a full outer join b on a.id = b.id;
INFO : Number of reduce tasks not specified. Estimated from input data size:
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:
INFO : Submitting tokens for job: job_1496277833427_0008
INFO : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0008/
INFO : Starting Job = job_1496277833427_0008, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0008/
INFO : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job -kill job_1496277833427_0008
INFO : Hadoop job information for Stage-: number of mappers: ; number of reducers:
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 6.52 sec
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 9.17 sec
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 12.65 sec
INFO : MapReduce Total cumulative CPU time: seconds msec
INFO : Ended Job = job_1496277833427_0008
+-------+---------+-------+---------+--+
| a.id | a.name | b.id | b.name |
+-------+---------+-------+---------+--+
| | qq | | qq |
| | ww | | |
| | ee | | dd |
| | rr | | rr |
| | tt | NULL | NULL |
| | yy | | fgf |
| | aa | | as |
| | ss | NULL | NULL |
| NULL | NULL | | |
| | zz | NULL | NULL |
| NULL | NULL | | |
| NULL | NULL | | ww |
| NULL | NULL | | |
| NULL | NULL | | 4r |
| NULL | NULL | | |
+-------+---------+-------+---------+--+
15 rows selected (371.304 seconds)
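
To make the NULL padding concrete, here is a minimal Python sketch of full outer join semantics. It is not how Hive executes the query, and the ids and names in `a` and `b` are made-up sample data chosen only for illustration.

```python
def full_outer_join(left, right):
    """Join two lists of (id, name) rows on id, keeping unmatched rows from both sides."""
    rows = []
    matched_right = set()  # indexes of right rows that found a partner
    for lid, lname in left:
        hit = False
        for i, (rid, rname) in enumerate(right):
            # As in SQL, a NULL (None) id never equals anything.
            if lid is not None and lid == rid:
                rows.append((lid, lname, rid, rname))
                matched_right.add(i)
                hit = True
        if not hit:
            # Left row without a match: right-hand columns become NULL.
            rows.append((lid, lname, None, None))
    for i, (rid, rname) in enumerate(right):
        if i not in matched_right:
            # Right row without a match: left-hand columns become NULL.
            rows.append((None, None, rid, rname))
    return rows


# Hypothetical sample data, not the contents of the tables above.
a = [(1, "qq"), (2, "ww"), (5, "tt")]
b = [(1, "qq"), (2, ""), (9, "4r")]

result = full_outer_join(a, b)
for row in result:
    print(row)
```

Rows with ids 1 and 2 pair up, id 5 exists only in `a` and id 9 only in `b`, so each of those two rows is padded with `None` on the missing side.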

select a.* from a left semi join b on a.id = b.id; -- b.* must not appear before from, otherwise the statement fails (Error while compiling statement: FAILED: SemanticException [Error 10009]: Line 1:11 Invalid table alias 'b' (state=42000,code=10009))

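This replaces IN / EXISTS subqueries. Here the result is just the left (table a) half of the inner join result; in general, each row of a is returned at most once, even if several rows of b match it.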

+-------+---------+--+
| a.id | a.name |
+-------+---------+--+
| | qq |
| | ww |
| | ee |
| | rr |
| | yy |
| | aa |
+-------+---------+--+

Hive has no right semi join.

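left semi join is an efficient implementation of IN / EXISTS subquery semantics and is usually cheaper than an inner join, because for each row of a it only needs to know whether a match exists in b, not to produce every matching pair.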
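
The difference from an inner join can be sketched in Python. This is only an illustration of the semantics under assumed sample data (the ids and the duplicate row in `b` are hypothetical), not a description of Hive's execution plan.

```python
def inner_join(left, right):
    """Emit one output row for every matching (left, right) pair."""
    return [
        (lid, lname, rid, rname)
        for lid, lname in left
        for rid, rname in right
        if lid is not None and lid == rid
    ]


def left_semi_join(left, right):
    """Keep a left row if at least one right row has the same id.

    Only left-hand columns are returned, which is why b.* cannot be selected.
    """
    right_ids = {rid for rid, _ in right if rid is not None}
    return [(lid, lname) for lid, lname in left if lid in right_ids]


# Hypothetical sample data: id 1 appears twice in b.
a = [(1, "qq"), (2, "ww"), (5, "tt")]
b = [(1, "qq"), (1, "dup"), (2, "")]

inner = inner_join(a, b)
semi = left_semi_join(a, b)
print(inner)
print(semi)
```

With the duplicate id in `b`, the inner join returns the row for id 1 twice, while the semi join returns it once. The membership test in `left_semi_join` plays the role of `WHERE a.id IN (SELECT id FROM b)`.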
