hive提前过滤

create table sospdm.tmp_yinfei_test_01
(
id string
)
partitioned by (statis_date string)
; create table sospdm.tmp_yinfei_test_02
(
id string
)
partitioned by (statis_date string)
; select t1.*
from tmp_yinfei_test_01 t1
left join tmp_yinfei_test_02 t2
on t1.id=t2.id
where t1.statis_date='' and t2.statis_date=''
;
select t1.*
from tmp_yinfei_test_01 t1
left join tmp_yinfei_test_02 t2
on t1.id=t2.id and t1.statis_date='' and t2.statis_date=''
;
select t1.*
from
(
select * from tmp_yinfei_test_01 where statis_date=''
) t1
left join
(
select * from tmp_yinfei_test_02 where statis_date=''
) t2
on t1.id=t2.id
;
=========================test1===================================== explain select t1.*
from tmp_yinfei_test_01 t1
left join tmp_yinfei_test_02 t2
on t1.id=t2.id
where t1.statis_date='' and t2.statis_date=''
; hive> explain select t1.*
> from tmp_yinfei_test_01 t1
> left join tmp_yinfei_test_02 t2
> on t1.id=t2.id
> where t1.statis_date='' and t2.statis_date=''
> ;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1 STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: t1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: (statis_date = '') (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: id (type: string)
sort order: +
Map-reduce partition columns: id (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
TableScan
alias: t2
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: id (type: string)
sort order: +
Map-reduce partition columns: id (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
value expressions: statis_date (type: string)
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 id (type: string)
1 id (type: string)
outputColumnNames: _col0, _col6
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: (_col6 = '') (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: _col0 (type: string), '' (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
File Output Operator
compressed: true
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink Time taken: 0.399 seconds, Fetched: 58 row(s) 结论:t2表会扫全表
=========================test2=====================================
explain select t1.*
from tmp_yinfei_test_01 t1
left join tmp_yinfei_test_02 t2
on t1.id=t2.id and t1.statis_date='' and t2.statis_date=''
;
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1 STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: t1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: id (type: string)
sort order: +
Map-reduce partition columns: id (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
value expressions: statis_date (type: string)
TableScan
alias: t2
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: (statis_date = '') (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: id (type: string)
sort order: +
Map-reduce partition columns: id (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
filter predicates:
0 {(VALUE._col0 = '')}
1
keys:
0 id (type: string)
1 id (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
File Output Operator
compressed: true
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
结论:t1表会扫全表
=========================test3=====================================
explain select t1.*
from
(
select * from tmp_yinfei_test_01 where statis_date=''
) t1
left join
(
select * from tmp_yinfei_test_02 where statis_date=''
) t2
on t1.id=t2.id
;
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1 STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: tmp_yinfei_test_01
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: (statis_date = '') (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: id (type: string), '' (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
value expressions: _col1 (type: string)
TableScan
alias: tmp_yinfei_test_02
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: (statis_date = '') (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: id (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col0 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
File Output Operator
compressed: true
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink

hive提前过滤重要性的更多相关文章

  1. hive -- 协同过滤sql语句

    hive -- 协同过滤sql语句 数据: *.3g.qq.com|腾讯应用宝|应用商店 *.91rb.com|91手机助手|应用商店 *.app.qq.com|腾讯应用宝|应用商店 *.haina. ...

  2. STREAMING HIVE流过滤 官网例子 注意中间用的py脚本

    Simple Example Use Cases MovieLens User Ratings First, create a table with tab-delimited text file f ...

  3. hive条件过滤

    where 过滤 %代表任意个字符,_代表一个字符; \\ 转移字符.\\_代表下划线

  4. Hive计算最大连续登陆天数

    目录 一.背景 二.算法 1. 第一步:排序 2. 第二步:第二列与第三列做日期差值 3. 第三步:按第二列分组求和 4. 第四步:求最大次数 三.扩展(股票最大涨停天数) 强哥说他发现了财富密码,最 ...

  5. hadoop 数据倾斜

    数据倾斜是指,map /reduce程序执行时,reduce节点大部分执行完毕,但是有一个或者几个reduce节点运行很慢,导致整个程序的处理时间很长,这是因为某一个key的条数比其他key多很多(有 ...

  6. orcFile split和读数据原理总结(hive0.13)

    http://blog.csdn.net/zhaorongsheng/article/details/72903431 官网关于orcfile的介绍 背景 Hive的rcfile格式已经使用多年,但是 ...

  7. DataSkew 数据倾斜

    date: 2020-04-21 19:38:00 updated: 2020-04-24 10:26:00 DataSkew 数据倾斜 1. Hive 里的数据倾斜 1.1 null值 空值 尽量提 ...

  8. MySQL之谓词下推

    MySQL之谓词下推 什么是谓词 在SQL中,谓词就是返回boolean值即true或者false的函数,或是隐式转换为boolean的函数.SQL中的谓词主要有 LKIE.BETWEEN.IS NU ...

  9. 大数据SQL中的Join谓词下推,真的那么难懂?

    听到谓词下推这个词,是不是觉得很高大上,找点资料看了半天才能搞懂概念和思想,借这个机会好好学习一下吧. 引用范欣欣大佬的博客中写道,以前经常满大街听到谓词下推,然而对谓词下推却总感觉懵懵懂懂,并不明白 ...

随机推荐

  1. RedHat Linux关闭防火墙的命令

    获得root 控制权限.在“#”下操作. 查看防火墙状态. systemctl status firewalld 临时关闭防火墙命令.重启电脑后,防火墙自动起来. systemctl stop fir ...

  2. Confluence 6 安装 PostgreSQL

    如果你的系统中还没有安装 PostgreSQL 数据库,你需要先下载后进行安装. 在安装 PostgreSQL 时候的一些小经验: 在安装的时候提供的 密码(password )是针对  'postg ...

  3. 关于deepin linux15.6-15.9.1系统播放视频卡顿解决办法

    关于deepin linux15.6-15.9.1系统播放视频卡顿解决办法 chrome浏览器 关闭chrome硬件加速模式 设置>高级>使用硬件加速模式 注释:由于视频卡顿是因显卡驱动问 ...

  4. Selenium WebDriver中鼠标事件

    鼠标点击操作  鼠标点击事件有以下几种类型:  清单 1. 鼠标左键点击   Actions action = new Actions(driver);action.click();// 鼠标左键在当 ...

  5. PHP实现网络Socket及IO多路复用

    一直以来,PHP很少用于socket编程,毕竟是一门脚本语言,效率会成为很大的瓶颈,但是不能说PHP就无法用于socket编程,也不能说PHP的socket编程性能就有多么的低,例如知名的一款PHP ...

  6. SpringMVC文件下载与JSON格式

    点击查看上一章 现在JSON这种数据格式是被使用的非常的广泛的,SpringMVC作为目前最受欢迎的框架,它对JSON这种数据格式提供了非常友好的支持,可以说是简单到爆. 在我们SpringMVC中只 ...

  7. JavaScript(JS)之Javascript对象

    简介: 在JavaScript中除了null和undefined以外其他的数据类型都被定义成了对象,也可以用创建对象的方法定义变量,String.Math.Array.Date.RegExp都是Jav ...

  8. yum安装软件内容

    linux  yum源改为阿里yum源 1.备份 mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.back ...

  9. ubuntu18.04进不了桌面

    ubuntu升级18.04进不了桌面 https://chengfeng.site/2018/05/02/ubuntu%E5%8D%87%E7%BA%A718-04%E8%B8%A9%E7%9A%84 ...

  10. 简单的做一个图片上传预览(web前端)

    转载:点击查看原文 在做web项目很多的时候图片都是避免不了的,所以操作图片就成了一个相对比较棘手的问题,其实也不是说很麻烦,只是说上传然后直接预览的过程很恶心,今天简单的做一个处理. 效果预览: & ...