Joins in Hive

Creating the tables

0: jdbc:hive2://localhost:10000> create database myjoin;

No rows affected (3.78 seconds)

0: jdbc:hive2://localhost:10000> use myjoin;

No rows affected (0.419 seconds)

0: jdbc:hive2://localhost:10000> create table a(id int,name string) row format delimited fields terminated by ',';

No rows affected (2.08 seconds)

0: jdbc:hive2://localhost:10000> create table b(id int,name string) row format delimited fields terminated by ',';

0: jdbc:hive2://localhost:10000> select * from a;

+-------+---------+--+

| a.id  | a.name  |

+-------+---------+--+

|      | qq      |

|      | ww      |

|      | ee      |

|      | rr      |

|      | tt      |

|      | yy      |

|      | aa      |

|      | ss      |

|     | zz      |

+-------+---------+--+

9 rows selected (1.881 seconds)

0: jdbc:hive2://localhost:10000> select * from b;

+-------+---------+--+

| b.id  | b.name  |

+-------+---------+--+

|      | qq      |

|      |       |

|      | dd      |

|      | rr      |

|      | fgf     |

|      | as      |

|      |       |

|     | ww      |

|     |        |

|     |       |

|     |       |

|     | 4r      |

+-------+---------+--+

12 rows selected (0.147 seconds)

Result of inner join (a plain join is the same thing)
0: jdbc:hive2://localhost:10000> select a.*,b.* from a inner join b on a.id = b.id;

INFO  : Execution completed successfully

INFO  : MapredLocal task succeeded

INFO  : Number of reduce tasks is set to 0 since there's no reduce operator

INFO  : number of splits:

INFO  : Submitting tokens for job: job_1496277833427_0007

INFO  : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0007/

INFO  : Starting Job = job_1496277833427_0007, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0007/

INFO  : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job  -kill job_1496277833427_0007

INFO  : Hadoop job information for Stage-: number of mappers: ; number of reducers:

INFO  : -- ::, Stage- map = %,  reduce = %

INFO  : -- ::, Stage- map = %,  reduce = %, Cumulative CPU 5.05 sec

INFO  : MapReduce Total cumulative CPU time:  seconds  msec

INFO  : Ended Job = job_1496277833427_0007

+-------+---------+-------+---------+--+

| a.id  | a.name  | b.id  | b.name  |

+-------+---------+-------+---------+--+

|      | qq      |      | qq      |

|      | ww      |      |       |

|      | ee      |      | dd      |

|      | rr      |      | rr      |

|      | yy      |      | fgf     |

|      | aa      |      | as      |

+-------+---------+-------+---------+--+
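The inner join behaviour shown above can be reproduced outside Hive as a rough sketch. The snippet below uses Python's built-in sqlite3 module rather than Hive, and the ids are made-up sample values (the ids in the listings above were lost in extraction), so treat it only as an illustration of the semantics: rows appear in the result only when the on condition matches on both sides.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; ids are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("create table a(id int, name text)")
conn.execute("create table b(id int, name text)")
conn.executemany("insert into a values (?, ?)", [(1, "qq"), (2, "ww"), (5, "tt")])
conn.executemany("insert into b values (?, ?)", [(1, "qq"), (2, "xx"), (9, "zz")])

# Only ids present in both tables survive the inner join.
inner_rows = conn.execute(
    "select a.*, b.* from a inner join b on a.id = b.id order by a.id"
).fetchall()

for row in inner_rows:
    print(row)
```

Row 5 of a and row 9 of b have no partner on the other side, so neither shows up in the result.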

full outer join: rows from both sides are returned; where the on condition finds no match, the columns of the other side are shown as NULL

0: jdbc:hive2://localhost:10000> select a.*,b.* from a full outer join b on a.id = b.id;

INFO  : Number of reduce tasks not specified. Estimated from input data size:

INFO  : In order to change the average load for a reducer (in bytes):

INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>

INFO  : In order to limit the maximum number of reducers:

INFO  :   set hive.exec.reducers.max=<number>

INFO  : In order to set a constant number of reducers:

INFO  :   set mapreduce.job.reduces=<number>

INFO  : number of splits:

INFO  : Submitting tokens for job: job_1496277833427_0008

INFO  : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0008/

INFO  : Starting Job = job_1496277833427_0008, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0008/

INFO  : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job  -kill job_1496277833427_0008

INFO  : Hadoop job information for Stage-: number of mappers: ; number of reducers:

INFO  : -- ::, Stage- map = %,  reduce = %

INFO  : -- ::, Stage- map = %,  reduce = %

INFO  : -- ::, Stage- map = %,  reduce = %

INFO  : -- ::, Stage- map = %,  reduce = %, Cumulative CPU 6.52 sec

INFO  : -- ::, Stage- map = %,  reduce = %, Cumulative CPU 9.17 sec

INFO  : -- ::, Stage- map = %,  reduce = %, Cumulative CPU 12.65 sec

INFO  : MapReduce Total cumulative CPU time:  seconds  msec

INFO  : Ended Job = job_1496277833427_0008

+-------+---------+-------+---------+--+

| a.id  | a.name  | b.id  | b.name  |

+-------+---------+-------+---------+--+

|      | qq      |      | qq      |

|      | ww      |      |       |

|      | ee      |      | dd      |

|      | rr      |      | rr      |

|      | tt      | NULL  | NULL    |

|      | yy      |      | fgf     |

|      | aa      |      | as      |

|      | ss      | NULL  | NULL    |

| NULL  | NULL    |      |       |

|     | zz      | NULL  | NULL    |

| NULL  | NULL    |     |       |

| NULL  | NULL    |     | ww      |

| NULL  | NULL    |     |       |

| NULL  | NULL    |     | 4r      |

| NULL  | NULL    |     |        |

+-------+---------+-------+---------+--+

15 rows selected (371.304 seconds)
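To make the NULL padding concrete, here is a minimal pure-Python sketch of full outer join semantics. It is not how Hive executes the query; the function name full_outer_join and the sample ids are assumptions made for illustration, and None plays the role of NULL.

```python
def full_outer_join(left, right):
    """Join two lists of (id, name) rows on id, keeping unmatched rows from both sides."""
    result = []
    matched_right = set()
    for l_id, l_name in left:
        # A NULL (None) id never matches anything, mirroring SQL equality semantics.
        hits = [
            i for i, (r_id, _) in enumerate(right)
            if l_id is not None and r_id == l_id
        ]
        if hits:
            for i in hits:
                matched_right.add(i)
                result.append((l_id, l_name, right[i][0], right[i][1]))
        else:
            # Left row without a partner: pad the right-hand columns with None.
            result.append((l_id, l_name, None, None))
    for i, (r_id, r_name) in enumerate(right):
        if i not in matched_right:
            # Right row without a partner: pad the left-hand columns with None.
            result.append((None, None, r_id, r_name))
    return result


# Hypothetical sample data, not the rows from the session above.
a_rows = [(1, "qq"), (2, "ww"), (5, "tt")]
b_rows = [(1, "qq"), (2, "xx"), (9, "zz")]
joined = full_outer_join(a_rows, b_rows)
for row in joined:
    print(row)
```

Every input row appears at least once in the output, which is why the full outer join result is longer than the inner join result.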

select a.* from a left semi join b on a.id = b.id; -- b.* must not appear in the select list, otherwise compilation fails (Error while compiling statement: FAILED: SemanticException [Error 10009]: Line 1:11 Invalid table alias 'b' (state=42000,code=10009))

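left semi join replaces the IN / EXISTS subquery form; the result contains only the left-hand half of the inner join result, i.e. the columns of a.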

+-------+---------+--+

| a.id  | a.name  |

+-------+---------+--+

|      | qq      |

|      | ww      |

|      | ee      |

|      | rr      |

|      | yy      |

|      | aa      |

+-------+---------+--+

There is no right semi join.

left semi join is Hive's efficient implementation of IN / EXISTS, and it is more efficient than an inner join.
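The relationship between a semi join and an EXISTS subquery can be sketched as follows. SQLite has no LEFT SEMI JOIN syntax, so this example uses the equivalent EXISTS form through Python's sqlite3 module; the ids are made-up sample values. It also shows one difference from an inner join: when b contains the same id more than once, the inner join repeats the row of a, while the semi join form returns it once.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; ids are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("create table a(id int, name text)")
conn.execute("create table b(id int, name text)")
conn.executemany("insert into a values (?, ?)", [(1, "qq"), (2, "ww"), (5, "tt")])
# id 1 appears twice in b on purpose.
conn.executemany("insert into b values (?, ?)", [(1, "qq"), (1, "q2"), (2, "xx")])

# Inner join: the row of a with id 1 is emitted once per matching row of b.
inner_rows = conn.execute(
    "select a.* from a inner join b on a.id = b.id order by a.id"
).fetchall()

# EXISTS form, the subquery that left semi join stands in for:
# each row of a is emitted at most once.
semi_rows = conn.execute(
    "select a.* from a where exists (select 1 from b where b.id = a.id) order by a.id"
).fetchall()

print(inner_rows)
print(semi_rows)
```

In this sample the two results differ only in the duplicated row; when ids in b are unique, as in the session above, the semi join returns the same rows as the left-hand columns of the inner join.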
