hive中的join
建表
: jdbc:hive2://localhost:10000> create database myjoin;
No rows affected (3.78 seconds)
: jdbc:hive2://localhost:10000> use myjoin;
No rows affected (0.419 seconds)
: jdbc:hive2://localhost:10000> create table a(id int,name string) row format delimited fields terminated by ',';
No rows affected (2.08 seconds)
: jdbc:hive2://localhost:10000> create table b(id int,name string) row format delimited fields terminated by ',';
: jdbc:hive2://localhost:10000> select * from a
: jdbc:hive2://localhost:10000> ;
+-------+---------+--+
| a.id | a.name |
+-------+---------+--+
| | qq |
| | ww |
| | ee |
| | rr |
| | tt |
| | yy |
| | aa |
| | ss |
| | zz |
+-------+---------+--+
rows selected (1.881 seconds)
: jdbc:hive2://localhost:10000> select * from b;
+-------+---------+--+
| b.id | b.name |
+-------+---------+--+
| | qq |
| | |
| | dd |
| | rr |
| | fgf |
| | as |
| | |
| | ww |
| | |
| | |
| | |
| | 4r |
+-------+---------+--+
rows selected (0.147 seconds)
inner join 的结果,也就是join
0: jdbc:hive2://localhost:10000> select a.*,b.* from a inner join b on a.id = b.id;
INFO : Execution completed successfully
INFO : MapredLocal task succeeded
INFO : Number of reduce tasks is set to since there's no reduce operator
INFO : number of splits:
INFO : Submitting tokens for job: job_1496277833427_0007
INFO : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0007/
INFO : Starting Job = job_1496277833427_0007, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0007/
INFO : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job -kill job_1496277833427_0007
INFO : Hadoop job information for Stage-: number of mappers: ; number of reducers:
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 5.05 sec
INFO : MapReduce Total cumulative CPU time: seconds msec
INFO : Ended Job = job_1496277833427_0007
+-------+---------+-------+---------+--+
| a.id | a.name | b.id | b.name |
+-------+---------+-------+---------+--+
| | qq | | qq |
| | ww | | |
| | ee | | dd |
| | rr | | rr |
| | yy | | fgf |
| | aa | | as |
+-------+---------+-------+---------+--+
full outer join ,两边的数据都会出来只不过on条件没有对应上的一端会显示为null
: jdbc:hive2://localhost:10000> select a.*,b.* from a full outer join b on a.id = b.id;
INFO : Number of reduce tasks not specified. Estimated from input data size:
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:
INFO : Submitting tokens for job: job_1496277833427_0008
INFO : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0008/
INFO : Starting Job = job_1496277833427_0008, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0008/
INFO : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job -kill job_1496277833427_0008
INFO : Hadoop job information for Stage-: number of mappers: ; number of reducers:
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 6.52 sec
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 9.17 sec
INFO : -- ::, Stage- map = %, reduce = %, Cumulative CPU 12.65 sec
INFO : MapReduce Total cumulative CPU time: seconds msec
INFO : Ended Job = job_1496277833427_0008
+-------+---------+-------+---------+--+
| a.id | a.name | b.id | b.name |
+-------+---------+-------+---------+--+
| | qq | | qq |
| | ww | | |
| | ee | | dd |
| | rr | | rr |
| | tt | NULL | NULL |
| | yy | | fgf |
| | aa | | as |
| | ss | NULL | NULL |
| NULL | NULL | | |
| | zz | NULL | NULL |
| NULL | NULL | | |
| NULL | NULL | | ww |
| NULL | NULL | | |
| NULL | NULL | | 4r |
| NULL | NULL | | |
+-------+---------+-------+---------+--+
rows selected (371.304 seconds)
select a.*from a left semi join b on a.id = b.id; -- from 前不能写b.* 否则会报错( Error while compiling statement: FAILED: SemanticException [Error 10009]: Line 1:11 Invalid table alias 'b' (state=42000,code=10009))
替代exist in 的用法,返回值只是inner join 中左边的一般,
+-------+---------+--+
| a.id | a.name |
+-------+---------+--+
| | qq |
| | ww |
| | ee |
| | rr |
| | yy |
| | aa |
+-------+---------+--+
没有 right semi join
left semi join 是exist in 的高效实现,比inner join 效率高
hive中的join的更多相关文章
- hive中left join、left outer join和left semi join的区别
先说结论,再举例子. hive中,left join与left outer join等价. left semi join与left outer join的区别:left semi join相当 ...
- SQL join中级篇--hive中 mapreduce join方法分析
1. 概述. 本文主要介绍了mapreduce框架上如何实现两表JOIN. 2. 常见的join方法介绍 假设要进行join的数据分别来自File1和File2. 2.1 reduce side jo ...
- 关于Hive中的join和left join的理解
一.join与left join的全称 JOIN是INNER JOIN的简写,LEFT JOIN是LEFT OUTER JOIN的简写. 二.join与left join的应用场景 JOIN一般用于A ...
- Hive中Join的原理和机制
转自:http://lxw1234.com/archives/2015/06/313.htm 笼统的说,Hive中的Join可分为Common Join(Reduce阶段完成join)和Map Joi ...
- Hive中Join的类型和用法
关键字:Hive Join.Hive LEFT|RIGTH|FULL OUTER JOIN.Hive LEFT SEMI JOIN.Hive Cross Join Hive中除了支持和传统数据库中一样 ...
- Hive中JOIN操作
1. 只支持相等JOIN. 2. 多表连接当使用不同的列进行JOIN时,会产生多个MR作业. 3. 最后的表的数据是从流中读取,而前面的会在内存中缓存,因此最好把最大的表放在最后. SELECT /* ...
- hive 配置文件以及join中null值的处理
一.Hive的參数设置 1. 三种设定方式:配置文件 · 用户自己定义配置文件:$HIVE_CONF_DIR/hive-site.xml · 默认配置文件:$HIVE_CONF_DIR/hi ...
- hive中与hbase外部表join时内存溢出(hive处理mapjoin的优化器机制)
与hbase外部表(wizad_mdm_main)进行join出现问题: CREATE TABLE wizad_mdm_dev_lmj_edition_result as select * from ...
- hive中的子查询改join操作(转)
这些子查询在oracle和mysql等数据库中都能执行,但是在hive中却不支持,但是我们可以把这些查询语句改为join操作: -- 1.子查询 select * from A a where a.u ...
随机推荐
- 我的vim配置---jeffy-vim-v2.1.tar
http://files.cnblogs.com/pengdonglin137/jeffy-vim-v2.1.rar 使用方法: 在Linux下,解压后,进入解压后的目录,执行./install.sh ...
- SpringMVC——redirect重定向跳转传值
原文:https://my.oschina.net/u/2273085/blog/398374 spring MVC框架controller间跳转,需重定向.有几种情况:不带参数跳转,带参数拼接url ...
- SqlServer_合并多个递归查询数据(CTE)
该方法在数据量过大时,效率过低,可参考hierarchyid字段实现(Sqlserver 2008) 优点:效率较高 缺点:需要不断维护数据,对现有业务有一定影响 参考:http://www.cnbl ...
- 关于BOM UTF8
这三篇可以看下: http://www.zhihu.com/question/20167122 http://www.cnblogs.com/DDark/archive/2011/11/28/2266 ...
- Android Studio打包:“APP_NAME" IS NOT TRANSLATED IN ZH, ZH_CN……..解决办法
开始用Android Studio更新到2.0稳定版,调试的时候没啥问题,在打包的时候出现了"app_name" is not translated in zh, zh_CN….. ...
- Android性能调优篇之探索JVM内存分配
开篇废话 今天我们一起来学习JVM的内存分配,主要目的是为我们Android内存优化打下基础. 一直在想以什么样的方式来呈现这个知识点才能让我们易于理解,最终决定使用方法为:图解+源代码分析. 欢迎访 ...
- Intel haxm安装失败问题解决
在安装Intel haxm为安卓模拟器加速时,会遇到提示VT-X未开启问题,问题提示如下图 工具/原料 Intel haxm 安卓模拟器 方法/步骤 确认你的处理器是否是Intel的,如果是A ...
- 页面自动适应大小&&获取页面的大小
直接上代码: <script type="text/JavaScript"> var size = 1.0; function showheight() { alert ...
- [Python爬虫] 之二十五:Selenium +phantomjs 利用 pyquery抓取今日头条网数据
一.介绍 本例子用Selenium +phantomjs爬取今日头条(http://www.toutiao.com/search/?keyword=电视)的资讯信息,输入给定关键字抓取资讯信息. 给定 ...
- 在基于or1200处理器的SoC上移植linux
经历了前端的艰苦奋斗.SoC前端设计已经调试完毕,如今直接进入uboot移植 首先cd入u-boot-master 找到子文件夹include下得de2_115.h文件进行改动: (下一步计划:加 ...