hive提前过滤重要性
hive提前过滤
create table sospdm.tmp_yinfei_test_01
(
id string
)
partitioned by (statis_date string)
; create table sospdm.tmp_yinfei_test_02
(
id string
)
partitioned by (statis_date string)
; select t1.*
from tmp_yinfei_test_01 t1
left join tmp_yinfei_test_02 t2
on t1.id=t2.id
where t1.statis_date='' and t2.statis_date=''
;
select t1.*
from tmp_yinfei_test_01 t1
left join tmp_yinfei_test_02 t2
on t1.id=t2.id and t1.statis_date='' and t2.statis_date=''
;
select t1.*
from
(
select * from tmp_yinfei_test_01 where statis_date=''
) t1
left join
(
select * from tmp_yinfei_test_02 where statis_date=''
) t2
on t1.id=t2.id
;
=========================test1===================================== explain select t1.*
from tmp_yinfei_test_01 t1
left join tmp_yinfei_test_02 t2
on t1.id=t2.id
where t1.statis_date='' and t2.statis_date=''
; hive> explain select t1.*
> from tmp_yinfei_test_01 t1
> left join tmp_yinfei_test_02 t2
> on t1.id=t2.id
> where t1.statis_date='' and t2.statis_date=''
> ;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1 STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: t1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: (statis_date = '') (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: id (type: string)
sort order: +
Map-reduce partition columns: id (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
TableScan
alias: t2
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: id (type: string)
sort order: +
Map-reduce partition columns: id (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
value expressions: statis_date (type: string)
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 id (type: string)
1 id (type: string)
outputColumnNames: _col0, _col6
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: (_col6 = '') (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: _col0 (type: string), '' (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
File Output Operator
compressed: true
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink Time taken: 0.399 seconds, Fetched: 58 row(s) 结论:t2表会扫全表
=========================test2=====================================
explain select t1.*
from tmp_yinfei_test_01 t1
left join tmp_yinfei_test_02 t2
on t1.id=t2.id and t1.statis_date='' and t2.statis_date=''
;
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1 STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: t1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: id (type: string)
sort order: +
Map-reduce partition columns: id (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
value expressions: statis_date (type: string)
TableScan
alias: t2
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: (statis_date = '') (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: id (type: string)
sort order: +
Map-reduce partition columns: id (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
filter predicates:
0 {(VALUE._col0 = '')}
1
keys:
0 id (type: string)
1 id (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
File Output Operator
compressed: true
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
结论:t1表会扫全表
=========================test3=====================================
explain select t1.*
from
(
select * from tmp_yinfei_test_01 where statis_date=''
) t1
left join
(
select * from tmp_yinfei_test_02 where statis_date=''
) t2
on t1.id=t2.id
;
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1 STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: tmp_yinfei_test_01
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: (statis_date = '') (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: id (type: string), '' (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
value expressions: _col1 (type: string)
TableScan
alias: tmp_yinfei_test_02
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: (statis_date = '') (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: id (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col0 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
File Output Operator
compressed: true
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
hive提前过滤重要性的更多相关文章
- hive -- 协同过滤sql语句
hive -- 协同过滤sql语句 数据: *.3g.qq.com|腾讯应用宝|应用商店 *.91rb.com|91手机助手|应用商店 *.app.qq.com|腾讯应用宝|应用商店 *.haina. ...
- STREAMING HIVE流过滤 官网例子 注意中间用的py脚本
Simple Example Use Cases MovieLens User Ratings First, create a table with tab-delimited text file f ...
- hive条件过滤
where 过滤 %代表任意个字符,_代表一个字符; \\ 转移字符.\\_代表下划线
- Hive计算最大连续登陆天数
目录 一.背景 二.算法 1. 第一步:排序 2. 第二步:第二列与第三列做日期差值 3. 第三步:按第二列分组求和 4. 第四步:求最大次数 三.扩展(股票最大涨停天数) 强哥说他发现了财富密码,最 ...
- hadoop 数据倾斜
数据倾斜是指,map /reduce程序执行时,reduce节点大部分执行完毕,但是有一个或者几个reduce节点运行很慢,导致整个程序的处理时间很长,这是因为某一个key的条数比其他key多很多(有 ...
- orcFile split和读数据原理总结(hive0.13)
http://blog.csdn.net/zhaorongsheng/article/details/72903431 官网关于orcfile的介绍 背景 Hive的rcfile格式已经使用多年,但是 ...
- DataSkew 数据倾斜
date: 2020-04-21 19:38:00 updated: 2020-04-24 10:26:00 DataSkew 数据倾斜 1. Hive 里的数据倾斜 1.1 null值 空值 尽量提 ...
- MySQL之谓词下推
MySQL之谓词下推 什么是谓词 在SQL中,谓词就是返回boolean值即true或者false的函数,或是隐式转换为boolean的函数.SQL中的谓词主要有 LKIE.BETWEEN.IS NU ...
- 大数据SQL中的Join谓词下推,真的那么难懂?
听到谓词下推这个词,是不是觉得很高大上,找点资料看了半天才能搞懂概念和思想,借这个机会好好学习一下吧. 引用范欣欣大佬的博客中写道,以前经常满大街听到谓词下推,然而对谓词下推却总感觉懵懵懂懂,并不明白 ...
随机推荐
- 自定义Dialog的详细步骤(实现自定义样式一般原理)
现在很多App的提示对话框都非常有个性,然而你还用系统的对话框样式,是不是觉得很落后呢,今天我就给大家讲讲怎样自定义自己的Dialog,学会了之后,你就会根据自家app的主题,设计出相应的Dialog ...
- 【进阶1-5期】JavaScript深入之4类常见内存泄漏及如何避免(转)
这是我在公众号(高级前端进阶)看到的文章,现在做笔记 https://mp.weixin.qq.com/s/RZ8Lpkyk8lz6z5H8Q8SiEQ 垃圾回收算法 常用垃圾回收算法叫做**标记清除 ...
- 信息摘要算法之五:HMAC算法分析与实现
MAC(Message Authentication Code,消息认证码算法)是含有密钥散列函数算法,兼容了MD和SHA算法的特性,并在此基础上加上了密钥.因此MAC算法也经常被称作HMAC算法. ...
- 处理:“ORA-00257: archiver error. Connect internal only, until freed”的错误问题
注:本文参考了< ORA-00257: archiver error. Connect internal only, until freed 错误的处理方法 > 一:问题背景: 今天在 ...
- 银联支付java版
注:本文来源于:< 银联支付java版 > 银联支付java版 2016年09月18日 15:55:20 阅读数:2431 首先去银联官网注册测试支付账户 下载对应的demo[ ...
- linux之cp命令(转载)
Linux中使用cp命令复制文件(夹),本文就日常工作中常用的cp命令整理如下. 一.复制一个源文件到目标文件(夹). 命令格式为:cp 源文件 目标文件(夹) 这个是使用频率最多的命令,负责把一个源 ...
- vue.js 监听属性的学习/ 千米、米的转换 /时、分、秒 的转换
<!DOCTYPE html><html lang="en"><head> <meta charset="UTF-8" ...
- ShadingJdbc学习
可参考:https://blog.csdn.net/jadebai/article/details/86716082 https://blog.csdn.net/jadebai/article/det ...
- Distance
1191: Distance 时间限制: 1 Sec 内存限制: 32 MB 题目描述 There is a battle field. It is a square with the side l ...
- Linux文件系统及文件类型
Linux文件系统: 根文件系统(rootfs) root filesystem LSB, FHS: (FileSystem... /etc, /usr, /var, /root.... /bo ...