Joins in Hive

Creating the tables

0: jdbc:hive2://localhost:10000> create database myjoin;
No rows affected (3.78 seconds)
0: jdbc:hive2://localhost:10000> use myjoin;
No rows affected (0.419 seconds)
0: jdbc:hive2://localhost:10000> create table a(id int,name string) row format delimited fields terminated by ',';
No rows affected (2.08 seconds)
0: jdbc:hive2://localhost:10000> create table b(id int,name string) row format delimited fields terminated by ',';
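The `row format delimited fields terminated by ','` clause means that each line of the table's underlying text file is split on commas into the `id` and `name` columns. Below is a minimal Python sketch of that parsing. It assumes Hive's usual text-table behavior, where a value that cannot be read as an int, or a field that is missing, comes back as NULL. The `parse_line` helper and the sample lines are hypothetical and are not the data loaded in this session.

```python
def parse_line(line):
    """Split one comma-delimited line into an (id, name) row.

    Simplified imitation of how a text table declared as
    (id int, name string) reads a line: an unparseable id or a
    missing field becomes NULL, represented here as None.
    """
    fields = line.rstrip("\n").split(",")
    raw_id = fields[0]
    name = fields[1] if len(fields) > 1 else None
    try:
        row_id = int(raw_id)
    except ValueError:
        row_id = None
    return (row_id, name)


# Hypothetical file contents, one table row per line.
sample_lines = ["1,qq", "2,ww", "x,ee", "4"]
rows = [parse_line(line) for line in sample_lines]
for row in rows:
    print(row)
```

Rows with a NULL id matter later on, because a NULL key never satisfies an `a.id = b.id` condition.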

0: jdbc:hive2://localhost:10000> select * from a;

+-------+---------+--+
| a.id  | a.name  |
+-------+---------+--+
|      | qq      |
|      | ww      |
|      | ee      |
|      | rr      |
|      | tt      |
|      | yy      |
|      | aa      |
|      | ss      |
|     | zz      |
+-------+---------+--+
rows selected (1.881 seconds)

0: jdbc:hive2://localhost:10000> select * from b;
+-------+---------+--+
| b.id  | b.name  |
+-------+---------+--+
|      | qq      |
|      |       |
|      | dd      |
|      | rr      |
|      | fgf     |
|      | as      |
|      |       |
|     | ww      |
|     |        |
|     |       |
|     |       |
|     | 4r      |
+-------+---------+--+
rows selected (0.147 seconds)

Result of inner join, which is the same as a plain join
0: jdbc:hive2://localhost:10000> select a.*,b.* from a inner join b on a.id = b.id;
INFO  : Execution completed successfully
INFO  : MapredLocal task succeeded
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:
INFO  : Submitting tokens for job: job_1496277833427_0007
INFO  : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0007/
INFO  : Starting Job = job_1496277833427_0007, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0007/
INFO  : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job  -kill job_1496277833427_0007
INFO  : Hadoop job information for Stage-: number of mappers: ; number of reducers:
INFO  : -- ::, Stage- map = %,  reduce = %
INFO  : -- ::, Stage- map = %,  reduce = %, Cumulative CPU 5.05 sec
INFO  : MapReduce Total cumulative CPU time:  seconds  msec
INFO  : Ended Job = job_1496277833427_0007
+-------+---------+-------+---------+--+
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+--+
|      | qq      |      | qq      |
|      | ww      |      |       |
|      | ee      |      | dd      |
|      | rr      |      | rr      |
|      | yy      |      | fgf     |
|      | aa      |      | as      |
+-------+---------+-------+---------+--+
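To make the inner join semantics concrete, here is a small Python sketch: only pairs of rows whose ids are equal are kept, and rows without a partner are dropped from both sides. The log for this query mentions a MapredLocal task and no reduce operator, which is what Hive typically prints for a map-side join, so the sketch uses a dictionary lookup in the same spirit. It is only an illustration of the result, not Hive's implementation, and the `inner_join` function and the sample rows are hypothetical.

```python
def inner_join(left, right):
    """Return (l_id, l_name, r_id, r_name) for every pair with equal ids."""
    # Index the right table by id. A NULL (None) id is never indexed,
    # so it can never match anything.
    by_id = {}
    for r_id, r_name in right:
        if r_id is not None:
            by_id.setdefault(r_id, []).append((r_id, r_name))

    result = []
    for l_id, l_name in left:
        for r_id, r_name in by_id.get(l_id, []):
            result.append((l_id, l_name, r_id, r_name))
    return result


# Hypothetical sample rows, not the tables from the session above.
a = [(1, "qq"), (2, "ww"), (5, "tt")]
b = [(1, "qq"), (2, "dd"), (7, "zz")]

for row in inner_join(a, b):
    print(row)
```

The row with id 5 from `a` and the row with id 7 from `b` have no partner, so neither appears in the output.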

full outer join: rows from both tables are returned; where the on condition finds no match, the columns of the other side are shown as NULL

0: jdbc:hive2://localhost:10000> select a.*,b.* from a full outer join b on a.id = b.id;
INFO  : Number of reduce tasks not specified. Estimated from input data size:
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:
INFO  : Submitting tokens for job: job_1496277833427_0008
INFO  : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0008/
INFO  : Starting Job = job_1496277833427_0008, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0008/
INFO  : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job  -kill job_1496277833427_0008
INFO  : Hadoop job information for Stage-: number of mappers: ; number of reducers:
INFO  : -- ::, Stage- map = %,  reduce = %
INFO  : -- ::, Stage- map = %,  reduce = %
INFO  : -- ::, Stage- map = %,  reduce = %
INFO  : -- ::, Stage- map = %,  reduce = %, Cumulative CPU 6.52 sec
INFO  : -- ::, Stage- map = %,  reduce = %, Cumulative CPU 9.17 sec
INFO  : -- ::, Stage- map = %,  reduce = %, Cumulative CPU 12.65 sec
INFO  : MapReduce Total cumulative CPU time:  seconds  msec
INFO  : Ended Job = job_1496277833427_0008
+-------+---------+-------+---------+--+
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+--+
|      | qq      |      | qq      |
|      | ww      |      |       |
|      | ee      |      | dd      |
|      | rr      |      | rr      |
|      | tt      | NULL  | NULL    |
|      | yy      |      | fgf     |
|      | aa      |      | as      |
|      | ss      | NULL  | NULL    |
| NULL  | NULL    |      |       |
|     | zz      | NULL  | NULL    |
| NULL  | NULL    |     |       |
| NULL  | NULL    |     | ww      |
| NULL  | NULL    |     |       |
| NULL  | NULL    |     | 4r      |
| NULL  | NULL    |     |        |
+-------+---------+-------+---------+--+
rows selected (371.304 seconds)
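The NULL padding in the result above can be reproduced with a short Python sketch: matching pairs are emitted as in an inner join, then every unmatched row from either side is emitted with None in the other side's columns. The `full_outer_join` function and the sample rows are hypothetical. The output order is only what this sketch happens to produce, and Hive does not guarantee any particular order.

```python
def full_outer_join(left, right):
    """Return joined rows, padding the missing side with None (NULL)."""
    result = []
    matched = [False] * len(right)  # which right rows found a partner

    for l_id, l_name in left:
        found = False
        for i, (r_id, r_name) in enumerate(right):
            # A NULL id never equals anything, not even another NULL.
            if l_id is not None and l_id == r_id:
                result.append((l_id, l_name, r_id, r_name))
                matched[i] = True
                found = True
        if not found:
            result.append((l_id, l_name, None, None))

    # Right rows that never matched appear with NULLs on the left.
    for i, (r_id, r_name) in enumerate(right):
        if not matched[i]:
            result.append((None, None, r_id, r_name))
    return result


# Hypothetical sample rows, not the tables from the session above.
a = [(1, "qq"), (2, "ww"), (5, "tt")]
b = [(1, "qq"), (2, "dd"), (7, "zz")]

for row in full_outer_join(a, b):
    print(row)
```

The nested loop keeps the sketch short. It is fine for a handful of rows but is not how a distributed engine would do it.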

select a.* from a left semi join b on a.id = b.id; -- b.* must not appear before from, otherwise the statement fails (Error while compiling statement: FAILED: SemanticException [Error 10009]: Line 1:11 Invalid table alias 'b' (state=42000,code=10009))

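left semi join replaces the exists/in usage. The result holds only the left-side (table a) columns of the inner join result, and each row of a appears at most once even when several rows of b match.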

+-------+---------+--+
| a.id  | a.name  |
+-------+---------+--+
|      | qq      |
|      | ww      |
|      | ee      |
|      | rr      |
|      | yy      |
|      | aa      |
+-------+---------+--+

There is no right semi join.

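left semi join is an efficient implementation of exists/in, and it is generally more efficient than an inner join used for the same filtering purpose.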
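A semi join only asks whether a matching id exists in b, so a membership test is enough to model it. The Python sketch below shows that idea and contrasts it with taking the left columns of an inner join, which repeats a left row once per matching right row. The `left_semi_join` function and the sample rows are hypothetical. Table b deliberately contains a duplicated id to make the difference visible.

```python
def left_semi_join(left, right):
    """Keep each left row whose id occurs in the right table."""
    # Only the set of right-side ids is needed; NULL ids never match.
    right_ids = {r_id for r_id, _ in right if r_id is not None}
    return [(l_id, l_name) for l_id, l_name in left if l_id in right_ids]


# Hypothetical sample rows; id 1 occurs twice in b.
a = [(1, "qq"), (2, "ww"), (5, "tt")]
b = [(1, "x"), (1, "y"), (2, "z")]

semi = left_semi_join(a, b)

# Left columns of an inner join: one output row per matching pair.
inner_left = [
    (l_id, l_name)
    for l_id, l_name in a
    for r_id, _ in b
    if l_id is not None and l_id == r_id
]

print("semi join:      ", semi)
print("inner join left:", inner_left)
```

Because only the ids of b are consulted, none of b's columns are available for output, which matches the restriction that b.* cannot be selected.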
