hive提前过滤重要性
hive提前过滤
- create table sospdm.tmp_yinfei_test_01
- (
- id string
- )
- partitioned by (statis_date string)
- ;
- create table sospdm.tmp_yinfei_test_02
- (
- id string
- )
- partitioned by (statis_date string)
- ;
- select t1.*
- from tmp_yinfei_test_01 t1
- left join tmp_yinfei_test_02 t2
- on t1.id=t2.id
- where t1.statis_date='' and t2.statis_date=''
- ;
- select t1.*
- from tmp_yinfei_test_01 t1
- left join tmp_yinfei_test_02 t2
- on t1.id=t2.id and t1.statis_date='' and t2.statis_date=''
- ;
- select t1.*
- from
- (
- select * from tmp_yinfei_test_01 where statis_date=''
- ) t1
- left join
- (
- select * from tmp_yinfei_test_02 where statis_date=''
- ) t2
- on t1.id=t2.id
- ;
- =========================test1=====================================
- explain select t1.*
- from tmp_yinfei_test_01 t1
- left join tmp_yinfei_test_02 t2
- on t1.id=t2.id
- where t1.statis_date='' and t2.statis_date=''
- ;
- hive> explain select t1.*
- > from tmp_yinfei_test_01 t1
- > left join tmp_yinfei_test_02 t2
- > on t1.id=t2.id
- > where t1.statis_date='' and t2.statis_date=''
- > ;
- OK
- STAGE DEPENDENCIES:
- Stage-1 is a root stage
- Stage-0 depends on stages: Stage-1
- STAGE PLANS:
- Stage: Stage-1
- Map Reduce
- Map Operator Tree:
- TableScan
- alias: t1
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- Filter Operator
- predicate: (statis_date = '') (type: boolean)
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- Reduce Output Operator
- key expressions: id (type: string)
- sort order: +
- Map-reduce partition columns: id (type: string)
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- TableScan
- alias: t2
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- Reduce Output Operator
- key expressions: id (type: string)
- sort order: +
- Map-reduce partition columns: id (type: string)
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- value expressions: statis_date (type: string)
- Reduce Operator Tree:
- Join Operator
- condition map:
- Left Outer Join0 to 1
- keys:
- 0 id (type: string)
- 1 id (type: string)
- outputColumnNames: _col0, _col6
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- Filter Operator
- predicate: (_col6 = '') (type: boolean)
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- Select Operator
- expressions: _col0 (type: string), '' (type: string)
- outputColumnNames: _col0, _col1
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- File Output Operator
- compressed: true
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- table:
- input format: org.apache.hadoop.mapred.TextInputFormat
- output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
- serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
- Stage: Stage-0
- Fetch Operator
- limit: -1
- Processor Tree:
- ListSink
- Time taken: 0.399 seconds, Fetched: 58 row(s)
- 结论:t2表会扫全表
- =========================test2=====================================
- explain select t1.*
- from tmp_yinfei_test_01 t1
- left join tmp_yinfei_test_02 t2
- on t1.id=t2.id and t1.statis_date='' and t2.statis_date=''
- ;
- STAGE DEPENDENCIES:
- Stage-1 is a root stage
- Stage-0 depends on stages: Stage-1
- STAGE PLANS:
- Stage: Stage-1
- Map Reduce
- Map Operator Tree:
- TableScan
- alias: t1
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- Reduce Output Operator
- key expressions: id (type: string)
- sort order: +
- Map-reduce partition columns: id (type: string)
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- value expressions: statis_date (type: string)
- TableScan
- alias: t2
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- Filter Operator
- predicate: (statis_date = '') (type: boolean)
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- Reduce Output Operator
- key expressions: id (type: string)
- sort order: +
- Map-reduce partition columns: id (type: string)
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- Reduce Operator Tree:
- Join Operator
- condition map:
- Left Outer Join0 to 1
- filter predicates:
- 0 {(VALUE._col0 = '')}
- 1
- keys:
- 0 id (type: string)
- 1 id (type: string)
- outputColumnNames: _col0, _col1
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- File Output Operator
- compressed: true
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- table:
- input format: org.apache.hadoop.mapred.TextInputFormat
- output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
- serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
- Stage: Stage-0
- Fetch Operator
- limit: -1
- Processor Tree:
- ListSink
- 结论:t1表会扫全表
- =========================test3=====================================
- explain select t1.*
- from
- (
- select * from tmp_yinfei_test_01 where statis_date=''
- ) t1
- left join
- (
- select * from tmp_yinfei_test_02 where statis_date=''
- ) t2
- on t1.id=t2.id
- ;
- STAGE DEPENDENCIES:
- Stage-1 is a root stage
- Stage-0 depends on stages: Stage-1
- STAGE PLANS:
- Stage: Stage-1
- Map Reduce
- Map Operator Tree:
- TableScan
- alias: tmp_yinfei_test_01
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- Filter Operator
- predicate: (statis_date = '') (type: boolean)
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- Select Operator
- expressions: id (type: string), '' (type: string)
- outputColumnNames: _col0, _col1
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- Reduce Output Operator
- key expressions: _col0 (type: string)
- sort order: +
- Map-reduce partition columns: _col0 (type: string)
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- value expressions: _col1 (type: string)
- TableScan
- alias: tmp_yinfei_test_02
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- Filter Operator
- predicate: (statis_date = '') (type: boolean)
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- Select Operator
- expressions: id (type: string)
- outputColumnNames: _col0
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- Reduce Output Operator
- key expressions: _col0 (type: string)
- sort order: +
- Map-reduce partition columns: _col0 (type: string)
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- Reduce Operator Tree:
- Join Operator
- condition map:
- Left Outer Join0 to 1
- keys:
- 0 _col0 (type: string)
- 1 _col0 (type: string)
- outputColumnNames: _col0, _col1
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- File Output Operator
- compressed: true
- Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
- table:
- input format: org.apache.hadoop.mapred.TextInputFormat
- output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
- serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
- Stage: Stage-0
- Fetch Operator
- limit: -1
- Processor Tree:
- ListSink
hive提前过滤重要性的更多相关文章
- hive -- 协同过滤sql语句
hive -- 协同过滤sql语句 数据: *.3g.qq.com|腾讯应用宝|应用商店 *.91rb.com|91手机助手|应用商店 *.app.qq.com|腾讯应用宝|应用商店 *.haina. ...
- STREAMING HIVE流过滤 官网例子 注意中间用的py脚本
Simple Example Use Cases MovieLens User Ratings First, create a table with tab-delimited text file f ...
- hive条件过滤
where 过滤 %代表任意个字符,_代表一个字符; \\ 转移字符.\\_代表下划线
- Hive计算最大连续登陆天数
目录 一.背景 二.算法 1. 第一步:排序 2. 第二步:第二列与第三列做日期差值 3. 第三步:按第二列分组求和 4. 第四步:求最大次数 三.扩展(股票最大涨停天数) 强哥说他发现了财富密码,最 ...
- hadoop 数据倾斜
数据倾斜是指,map /reduce程序执行时,reduce节点大部分执行完毕,但是有一个或者几个reduce节点运行很慢,导致整个程序的处理时间很长,这是因为某一个key的条数比其他key多很多(有 ...
- orcFile split和读数据原理总结(hive0.13)
http://blog.csdn.net/zhaorongsheng/article/details/72903431 官网关于orcfile的介绍 背景 Hive的rcfile格式已经使用多年,但是 ...
- DataSkew 数据倾斜
date: 2020-04-21 19:38:00 updated: 2020-04-24 10:26:00 DataSkew 数据倾斜 1. Hive 里的数据倾斜 1.1 null值 空值 尽量提 ...
- MySQL之谓词下推
MySQL之谓词下推 什么是谓词 在SQL中,谓词就是返回boolean值即true或者false的函数,或是隐式转换为boolean的函数.SQL中的谓词主要有 LKIE.BETWEEN.IS NU ...
- 大数据SQL中的Join谓词下推,真的那么难懂?
听到谓词下推这个词,是不是觉得很高大上,找点资料看了半天才能搞懂概念和思想,借这个机会好好学习一下吧. 引用范欣欣大佬的博客中写道,以前经常满大街听到谓词下推,然而对谓词下推却总感觉懵懵懂懂,并不明白 ...
随机推荐
- web.xml详细选项配置
Web.xml常用元素 <web-app> <display-name></display-name>定义了WEB应用的名字 <description> ...
- spring jdbctemplate调用存储过程,返回list对象
注:本文来源于< spring jdbctemplate调用存储过程,返回list对象 > spring jdbctemplate调用存储过程,返回list对象 方法: /** * 调用 ...
- python 爬虫简化树状图
- linux之cp命令(转载)
Linux中使用cp命令复制文件(夹),本文就日常工作中常用的cp命令整理如下. 一.复制一个源文件到目标文件(夹). 命令格式为:cp 源文件 目标文件(夹) 这个是使用频率最多的命令,负责把一个源 ...
- Exception类的学习与继承总结
日期:2018.11.11 星期日 博客期:023 Exception类的学习与继承总结 说起来我们上课还是说过的!老师提到了报错问题出现主要分Exception和Error两类!第一次遇见这个问题是 ...
- matlab 测试 数字二次混频
% test2 clear; clf; close all Fs=800000;%采样频率800k fz=80000;%载波频率80k fz1=3000;%载波频率3k fj=79000;%基波频率7 ...
- Python基础之re模块(正则表达式)
就其本质而言,正则表达式(或 RE)是一种小型的.高度专业化的编程语言,(在Python中)它内嵌在Python中, 并通过 re 模块实现.正则表达式模式被编译成一系列的字节码,然后由用 C 编写的 ...
- 编辑方法分享之如何编辑PDF文件内容
我们现在在工作中会经常使用到PDF文件,还会有遇到需要编辑PDF文件的时候,PDF文件的编辑问题一直是个大难题.很多朋友在面对PDF文件的时候束手无策,不知道该怎么对它进行编辑.下面小编就教给大家一个 ...
- 小学生都看得懂的C语言入门(1): 基础/判别/循环
c基础入门, 小学生也可以都看得懂!!!! 安装一个编译器, 这方面我不太懂, 安装了DEV-C++ ,体积不大,30M左右吧, 感觉挺好用,初学者够了. 介绍下DEV 的快键键: 恢复 Ctrl+ ...
- cf862d 交互式二分
/* 二分搜索出一个01段或10即可 先用n个0确定1的个数num 然后测试区间[l,mid]是否全是0或全是1 如果是,则l=mid,否则r=mid,直到l+1==r 然后再测试l是1还是r是1 如 ...