Creating the tables

0: jdbc:hive2://localhost:10000> create database myjoin;
No rows affected (3.78 seconds)
0: jdbc:hive2://localhost:10000> use myjoin;
No rows affected (0.419 seconds)
0: jdbc:hive2://localhost:10000> create table a(id int,name string) row format delimited fields terminated by ',';
No rows affected (2.08 seconds)
0: jdbc:hive2://localhost:10000> create table b(id int,name string) row format delimited fields terminated by ',';
0: jdbc:hive2://localhost:10000> select * from a
0: jdbc:hive2://localhost:10000> ;
+-------+---------+--+
| a.id | a.name |
+-------+---------+--+
| | qq |
| | ww |
| | ee |
| | rr |
| | tt |
| | yy |
| | aa |
| | ss |
| | zz |
+-------+---------+--+
rows selected (1.881 seconds)
0: jdbc:hive2://localhost:10000> select * from b;
+-------+---------+--+
| b.id | b.name |
+-------+---------+--+
| | qq |
| | |
| | dd |
| | rr |
| | fgf |
| | as |
| | |
| | ww |
| | |
| | |
| | |
| | 4r |
+-------+---------+--+
rows selected (0.147 seconds)
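
The transcript does not show how the rows got into a and b. A minimal sketch of one way to do it, assuming two comma-separated text files (one id,name pair per line, matching the delimiter in the create table statements). The file paths are hypothetical:

```sql
-- Hypothetical paths. With HiveServer2, LOCAL paths are usually resolved
-- on the server's file system, not on the machine running beeline.
load data local inpath '/tmp/a.txt' into table a;
load data local inpath '/tmp/b.txt' into table b;
```
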
Result of inner join, which is the same as a plain join
0: jdbc:hive2://localhost:10000> select a.*,b.* from a inner join b on a.id = b.id;
INFO : Execution completed successfully
INFO : MapredLocal task succeeded
INFO : Number of reduce tasks is set to since there's no reduce operator
INFO : number of splits:
INFO : Submitting tokens for job: job_1496277833427_0007
INFO : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0007/
INFO : Starting Job = job_1496277833427_0007, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0007/
INFO : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job -kill job_1496277833427_0007
INFO : Hadoop job information for Stage-: number of mappers: ; number of reducers:
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 5.05 sec
INFO : MapReduce Total cumulative CPU time: seconds msec
INFO : Ended Job = job_1496277833427_0007
+-------+---------+-------+---------+--+
| a.id | a.name | b.id | b.name |
+-------+---------+-------+---------+--+
| | qq | | qq |
| | ww | | |
| | ee | | dd |
| | rr | | rr |
| | yy | | fgf |
| | aa | | as |
+-------+---------+-------+---------+--+
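
Since join without a qualifier is an inner join, the query can also be written in the shorter form below. The second statement is only a suggestion for comparing the two forms: explain prints the query plan without submitting the job.

```sql
-- JOIN without a qualifier is an inner join, so this returns the same rows
-- as the inner join query above.
select a.*, b.* from a join b on a.id = b.id;

-- Print the execution plan instead of running the query.
explain select a.*, b.* from a join b on a.id = b.id;
```
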

full outer join: rows from both tables are returned; wherever the on condition finds no match, the columns of the other side are shown as NULL

0: jdbc:hive2://localhost:10000> select a.*,b.* from a full outer join b on a.id = b.id;
INFO : Number of reduce tasks not specified. Estimated from input data size:
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:
INFO : Submitting tokens for job: job_1496277833427_0008
INFO : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0008/
INFO : Starting Job = job_1496277833427_0008, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0008/
INFO : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job -kill job_1496277833427_0008
INFO : Hadoop job information for Stage-: number of mappers: ; number of reducers:
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 6.52 sec
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 9.17 sec
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 12.65 sec
INFO : MapReduce Total cumulative CPU time: seconds msec
INFO : Ended Job = job_1496277833427_0008
+-------+---------+-------+---------+--+
| a.id | a.name | b.id | b.name |
+-------+---------+-------+---------+--+
| | qq | | qq |
| | ww | | |
| | ee | | dd |
| | rr | | rr |
| | tt | NULL | NULL |
| | yy | | fgf |
| | aa | | as |
| | ss | NULL | NULL |
| NULL | NULL | | |
| | zz | NULL | NULL |
| NULL | NULL | | |
| NULL | NULL | | ww |
| NULL | NULL | | |
| NULL | NULL | | 4r |
| NULL | NULL | | |
+-------+---------+-------+---------+--+
rows selected (371.304 seconds)
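
A common follow-up is to keep only the rows that found no partner on the other side. A minimal sketch using the same tables, which filters the full outer join on the NULL-padded side:

```sql
-- Keep only rows where one side of the join is missing.
-- Note: a source row whose own id is NULL never matches anything,
-- so it also shows up here.
select a.*, b.*
from a full outer join b on a.id = b.id
where a.id is null or b.id is null;
```
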

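select a.* from a left semi join b on a.id = b.id; -- b.* must not appear before from, otherwise the statement fails (Error while compiling statement: FAILED: SemanticException [Error 10009]: Line 1:11 Invalid table alias 'b' (state=42000,code=10009))

left semi join replaces the exists / in usage. The result is only the left half (the columns of a) of the inner join result, and each row of a appears at most once even if several rows of b match it.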

+-------+---------+--+
| a.id | a.name |
+-------+---------+--+
| | qq |
| | ww |
| | ee |
| | rr |
| | yy |
| | aa |
+-------+---------+--+
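
Because the right-hand table cannot be referenced in the select list, any extra filter on b generally has to go into the on clause as well. A sketch under that assumption; the filter value 'qq' is only an illustration:

```sql
-- Conditions on b belong in the ON clause of a left semi join,
-- since b is not visible in SELECT (and, per the Hive documentation, not in WHERE).
select a.*
from a left semi join b
  on (a.id = b.id and b.name = 'qq');
```
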

There is no right semi join.
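
The effect of a right semi join can be obtained by swapping the two tables, as in this sketch:

```sql
-- Rows of b that have at least one match in a:
-- put b on the left side of the left semi join.
select b.* from b left semi join a on b.id = a.id;
```
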

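left semi join is an efficient implementation of exists / in and is usually more efficient than an inner join, since only the existence of a match in b matters and no columns of b are carried into the result.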
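
For comparison, these are the subquery forms that the left semi join stands in for. This is a sketch that assumes a Hive version with subquery support in the where clause (added in Hive 0.13); on older versions only the left semi join form works.

```sql
-- IN with an uncorrelated subquery.
select a.* from a where a.id in (select b.id from b);

-- EXISTS with a correlated subquery.
select a.* from a where exists (select 1 from b where b.id = a.id);
```
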
