Creating the tables

0: jdbc:hive2://localhost:10000> create database myjoin;
No rows affected (3.78 seconds)
0: jdbc:hive2://localhost:10000> use myjoin;
No rows affected (0.419 seconds)
0: jdbc:hive2://localhost:10000> create table a(id int,name string) row format delimited fields terminated by ',';
No rows affected (2.08 seconds)
0: jdbc:hive2://localhost:10000> create table b(id int,name string) row format delimited fields terminated by ',';
0: jdbc:hive2://localhost:10000> select * from a;
+-------+---------+--+
| a.id | a.name |
+-------+---------+--+
| | qq |
| | ww |
| | ee |
| | rr |
| | tt |
| | yy |
| | aa |
| | ss |
| | zz |
+-------+---------+--+
9 rows selected (1.881 seconds)
0: jdbc:hive2://localhost:10000> select * from b;
+-------+---------+--+
| b.id | b.name |
+-------+---------+--+
| | qq |
| | |
| | dd |
| | rr |
| | fgf |
| | as |
| | |
| | ww |
| | |
| | |
| | |
| | 4r |
+-------+---------+--+
12 rows selected (0.147 seconds)

Result of inner join (a plain join means the same thing)
0: jdbc:hive2://localhost:10000> select a.*,b.* from a inner join b on a.id = b.id;
INFO : Execution completed successfully
INFO : MapredLocal task succeeded
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:
INFO : Submitting tokens for job: job_1496277833427_0007
INFO : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0007/
INFO : Starting Job = job_1496277833427_0007, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0007/
INFO : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job -kill job_1496277833427_0007
INFO : Hadoop job information for Stage-: number of mappers: ; number of reducers:
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 5.05 sec
INFO : MapReduce Total cumulative CPU time: seconds msec
INFO : Ended Job = job_1496277833427_0007
+-------+---------+-------+---------+--+
| a.id | a.name | b.id | b.name |
+-------+---------+-------+---------+--+
| | qq | | qq |
| | ww | | |
| | ee | | dd |
| | rr | | rr |
| | yy | | fgf |
| | aa | | as |
+-------+---------+-------+---------+--+

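full outer join: rows from both tables are returned; where a row has no partner satisfying the on condition, the columns of the other table are shown as NULL

0: jdbc:hive2://localhost:10000> select a.*,b.* from a full outer join b on a.id = b.id;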
INFO : Number of reduce tasks not specified. Estimated from input data size:
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:
INFO : Submitting tokens for job: job_1496277833427_0008
INFO : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0008/
INFO : Starting Job = job_1496277833427_0008, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0008/
INFO : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job -kill job_1496277833427_0008
INFO : Hadoop job information for Stage-: number of mappers: ; number of reducers:
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 6.52 sec
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 9.17 sec
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 12.65 sec
INFO : MapReduce Total cumulative CPU time: seconds msec
INFO : Ended Job = job_1496277833427_0008
+-------+---------+-------+---------+--+
| a.id | a.name | b.id | b.name |
+-------+---------+-------+---------+--+
| | qq | | qq |
| | ww | | |
| | ee | | dd |
| | rr | | rr |
| | tt | NULL | NULL |
| | yy | | fgf |
| | aa | | as |
| | ss | NULL | NULL |
| NULL | NULL | | |
| | zz | NULL | NULL |
| NULL | NULL | | |
| NULL | NULL | | ww |
| NULL | NULL | | |
| NULL | NULL | | 4r |
| NULL | NULL | | |
+-------+---------+-------+---------+--+
15 rows selected (371.304 seconds)
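
The NULL padding on both sides can be reproduced on a tiny data set. The following is only a sketch under stated assumptions: SQLite (through Python's sqlite3 module) stands in for Hive so that the example runs anywhere, the table contents are made up and are not the data from the transcript, and the full outer join is emulated with two left joins because older SQLite versions do not accept FULL OUTER JOIN. In Hive itself the single statement shown above is all that is needed.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption, not the post's setup).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a(id INTEGER, name TEXT);
    CREATE TABLE b(id INTEGER, name TEXT);
    -- Hypothetical sample rows: ids 1 and 2 match, 5 exists only in a, 9 only in b.
    INSERT INTO a VALUES (1, 'qq'), (2, 'ww'), (5, 'tt');
    INSERT INTO b VALUES (1, 'qq'), (2, 'xx'), (9, '4r');
""")

# Emulated full outer join:
#   part 1: every row of a, with b's columns NULL when nothing matches;
#   part 2: the rows of b that found no partner in a, with a's columns NULL.
full_outer = conn.execute("""
    SELECT a.id, a.name, b.id, b.name
    FROM a LEFT JOIN b ON a.id = b.id
    UNION ALL
    SELECT a.id, a.name, b.id, b.name
    FROM b LEFT JOIN a ON a.id = b.id
    WHERE a.id IS NULL
""").fetchall()

for row in full_outer:
    print(row)  # None in Python corresponds to NULL in the Hive output
```

Rows that exist on only one side appear exactly once, with None (NULL) in the columns of the other table, which is the same pattern as the tt, ss and zz rows in the result above.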

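select a.* from a left semi join b on a.id = b.id; -- b.* must not appear before from, otherwise compilation fails (Error while compiling statement: FAILED: SemanticException [Error 10009]: Line 1:11 Invalid table alias 'b' (state=42000,code=10009))

left semi join is used in place of an IN/EXISTS subquery. The result contains only the left-hand half of the inner join result, that is, the columns of a. Each row of a is returned at most once, even if several rows of b match it.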

+-------+---------+--+
| a.id | a.name |
+-------+---------+--+
| | qq |
| | ww |
| | ee |
| | rr |
| | yy |
| | aa |
+-------+---------+--+

Hive has no right semi join.

left semi join is an efficient implementation of IN/EXISTS and is usually cheaper than an inner join.
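
The relationship between a semi join, IN/EXISTS and an inner join is easiest to see when the right-hand table contains duplicate keys. This is a minimal sketch, not the setup used in the post: SQLite (through Python's sqlite3 module) stands in for Hive, the rows are made up, and because SQLite has no LEFT SEMI JOIN keyword the semi join is written in its EXISTS and IN forms, which is what Hive's left semi join implements.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a(id INTEGER, name TEXT);
    CREATE TABLE b(id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'qq'), (2, 'ww'), (5, 'tt');
    -- Hypothetical data: id 1 appears twice in b on purpose.
    INSERT INTO b VALUES (1, 'qq'), (1, 'dup'), (2, 'xx'), (9, '4r');
""")

# Inner join: a row of a is repeated once per matching row of b.
inner_rows = conn.execute(
    "SELECT a.id, a.name FROM a JOIN b ON a.id = b.id ORDER BY a.id"
).fetchall()

# Semi join semantics, EXISTS form: a row of a is kept if any match exists.
exists_rows = conn.execute(
    "SELECT a.id, a.name FROM a "
    "WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id) ORDER BY a.id"
).fetchall()

# Semi join semantics, IN form: same result as the EXISTS form.
in_rows = conn.execute(
    "SELECT a.id, a.name FROM a WHERE a.id IN (SELECT id FROM b) ORDER BY a.id"
).fetchall()

print("inner join:", inner_rows)
print("exists    :", exists_rows)
print("in        :", in_rows)
```

The inner join lists the a row with id 1 twice because b holds two matching rows, while both semi join forms list it once. The query only has to establish that a match exists, not produce every matching pair, which is one reason a semi join can be cheaper than an inner join.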
