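Hive: filter early

Where the partition filter on statis_date is placed in a LEFT JOIN decides which tables Hive can prune. Below are the setup and three query variants, each followed by its EXPLAIN output.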

  create table sospdm.tmp_yinfei_test_01
  (
      id string
  )
  partitioned by (statis_date string)
  ;

  create table sospdm.tmp_yinfei_test_02
  (
      id string
  )
  partitioned by (statis_date string)
  ;

  -- test1: both partition filters in WHERE
  select t1.*
  from tmp_yinfei_test_01 t1
  left join tmp_yinfei_test_02 t2
  on t1.id=t2.id
  where t1.statis_date='' and t2.statis_date=''
  ;

  -- test2: both partition filters in the ON clause
  select t1.*
  from tmp_yinfei_test_01 t1
  left join tmp_yinfei_test_02 t2
  on t1.id=t2.id and t1.statis_date='' and t2.statis_date=''
  ;

  -- test3: each table filtered in its own subquery before the join
  select t1.*
  from
  (
      select * from tmp_yinfei_test_01 where statis_date=''
  ) t1
  left join
  (
      select * from tmp_yinfei_test_02 where statis_date=''
  ) t2
  on t1.id=t2.id
  ;

=========================test1=====================================

  explain select t1.*
  from tmp_yinfei_test_01 t1
  left join tmp_yinfei_test_02 t2
  on t1.id=t2.id
  where t1.statis_date='' and t2.statis_date=''
  ;

  OK
  STAGE DEPENDENCIES:
    Stage-1 is a root stage
    Stage-0 depends on stages: Stage-1

  STAGE PLANS:
    Stage: Stage-1
      Map Reduce
        Map Operator Tree:
            TableScan
              alias: t1
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Filter Operator
                predicate: (statis_date = '') (type: boolean)
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Reduce Output Operator
                  key expressions: id (type: string)
                  sort order: +
                  Map-reduce partition columns: id (type: string)
                  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            TableScan
              alias: t2
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Reduce Output Operator
                key expressions: id (type: string)
                sort order: +
                Map-reduce partition columns: id (type: string)
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                value expressions: statis_date (type: string)
        Reduce Operator Tree:
          Join Operator
            condition map:
                 Left Outer Join0 to 1
            keys:
              0 id (type: string)
              1 id (type: string)
            outputColumnNames: _col0, _col6
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Filter Operator
              predicate: (_col6 = '') (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Select Operator
                expressions: _col0 (type: string), '' (type: string)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                File Output Operator
                  compressed: true
                  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

    Stage: Stage-0
      Fetch Operator
        limit: -1
        Processor Tree:
          ListSink

  Time taken: 0.399 seconds, Fetched: 58 row(s)

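Conclusion: t2 is scanned in full. The t1 predicate reaches t1's TableScan (a Filter Operator in the map stage), but the t2 predicate stays in WHERE and is only evaluated after the join (the Filter Operator with predicate (_col6 = '') in the Reduce Operator Tree), so no partition of t2 is pruned. Because that filter also discards the NULL-extended rows, the LEFT JOIN effectively behaves like an inner join.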
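A minimal sketch of the result semantics behind test1, using Python's built-in sqlite3 as a stand-in for Hive. Assumptions: sqlite3 has no partitions, so statis_date is an ordinary column, and the date values and rows are hypothetical. It says nothing about which partitions are scanned; it only checks that with t2.statis_date in WHERE, the query returns exactly what an inner join of the two pre-filtered tables returns.

```python
import sqlite3

# Hypothetical partition values; the article itself uses '' as a placeholder.
DAY, OTHER = "20200101", "20200102"

# In-memory stand-ins for the two Hive tables (no real partitioning in sqlite3).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp_yinfei_test_01 (id TEXT, statis_date TEXT)")
conn.execute("CREATE TABLE tmp_yinfei_test_02 (id TEXT, statis_date TEXT)")
conn.executemany("INSERT INTO tmp_yinfei_test_01 VALUES (?, ?)",
                 [("a", DAY), ("b", DAY), ("c", OTHER)])
conn.executemany("INSERT INTO tmp_yinfei_test_02 VALUES (?, ?)",
                 [("a", DAY), ("b", OTHER), ("c", OTHER)])

# test1: the t2 predicate is in WHERE, so it is applied after the LEFT JOIN.
test1 = conn.execute("""
    SELECT t1.id
    FROM tmp_yinfei_test_01 t1
    LEFT JOIN tmp_yinfei_test_02 t2 ON t1.id = t2.id
    WHERE t1.statis_date = ? AND t2.statis_date = ?
    ORDER BY t1.id""", (DAY, DAY)).fetchall()

# Inner join of the two inputs, each filtered before the join.
inner = conn.execute("""
    SELECT t1.id
    FROM (SELECT * FROM tmp_yinfei_test_01 WHERE statis_date = ?) t1
    JOIN (SELECT * FROM tmp_yinfei_test_02 WHERE statis_date = ?) t2
      ON t1.id = t2.id
    ORDER BY t1.id""", (DAY, DAY)).fetchall()

print(test1)           # [('a',)]  ('b' is dropped: its t2 row is not in DAY)
print(test1 == inner)  # True
```

Row 'b' exists in t1 for the requested date but its t2 match lives in another date, so the WHERE filter removes it; a true LEFT JOIN on the requested date would have kept it.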

=========================test2=====================================

  explain select t1.*
  from tmp_yinfei_test_01 t1
  left join tmp_yinfei_test_02 t2
  on t1.id=t2.id and t1.statis_date='' and t2.statis_date=''
  ;

  STAGE DEPENDENCIES:
    Stage-1 is a root stage
    Stage-0 depends on stages: Stage-1

  STAGE PLANS:
    Stage: Stage-1
      Map Reduce
        Map Operator Tree:
            TableScan
              alias: t1
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Reduce Output Operator
                key expressions: id (type: string)
                sort order: +
                Map-reduce partition columns: id (type: string)
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                value expressions: statis_date (type: string)
            TableScan
              alias: t2
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Filter Operator
                predicate: (statis_date = '') (type: boolean)
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Reduce Output Operator
                  key expressions: id (type: string)
                  sort order: +
                  Map-reduce partition columns: id (type: string)
                  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
        Reduce Operator Tree:
          Join Operator
            condition map:
                 Left Outer Join0 to 1
            filter predicates:
              0 {(VALUE._col0 = '')}
              1
            keys:
              0 id (type: string)
              1 id (type: string)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            File Output Operator
              compressed: true
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

    Stage: Stage-0
      Fetch Operator
        limit: -1
        Processor Tree:
          ListSink

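Conclusion: t1 is scanned in full. The t2 predicate in ON is pushed down to t2's TableScan, but t1 is the preserved side of the LEFT JOIN: a condition on it in ON cannot remove t1 rows, it only decides whether a row finds a match. Hive therefore reads all of t1 and evaluates that predicate inside the Join Operator (filter predicates: 0 {(VALUE._col0 = '')}), and the result contains t1 rows from every partition.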
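A sketch of the result semantics behind test2, again with Python's sqlite3 standing in for Hive (assumptions: statis_date is a plain column, the rows and dates are hypothetical). With both predicates in ON and none in WHERE, every t1 row is returned, including the one from the other date, which is why the t1 predicate cannot be pushed below the join without changing the result.

```python
import sqlite3

DAY, OTHER = "20200101", "20200102"  # hypothetical partition values

# In-memory stand-ins for the two Hive tables (no real partitioning in sqlite3).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp_yinfei_test_01 (id TEXT, statis_date TEXT)")
conn.execute("CREATE TABLE tmp_yinfei_test_02 (id TEXT, statis_date TEXT)")
conn.executemany("INSERT INTO tmp_yinfei_test_01 VALUES (?, ?)",
                 [("a", DAY), ("b", DAY), ("c", OTHER)])
conn.executemany("INSERT INTO tmp_yinfei_test_02 VALUES (?, ?)",
                 [("a", DAY), ("b", OTHER), ("c", OTHER)])

# test2: both predicates are part of the ON clause, none in WHERE.
test2 = conn.execute("""
    SELECT t1.id, t1.statis_date, t2.id
    FROM tmp_yinfei_test_01 t1
    LEFT JOIN tmp_yinfei_test_02 t2
      ON t1.id = t2.id AND t1.statis_date = ? AND t2.statis_date = ?
    ORDER BY t1.id""", (DAY, DAY)).fetchall()

for row in test2:
    print(row)
# ('a', '20200101', 'a')
# ('b', '20200101', None)   t2 row for 'b' is in OTHER, so no match
# ('c', '20200102', None)   t1 row from OTHER is still returned
```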

=========================test3=====================================

  explain select t1.*
  from
  (
      select * from tmp_yinfei_test_01 where statis_date=''
  ) t1
  left join
  (
      select * from tmp_yinfei_test_02 where statis_date=''
  ) t2
  on t1.id=t2.id
  ;

  STAGE DEPENDENCIES:
    Stage-1 is a root stage
    Stage-0 depends on stages: Stage-1

  STAGE PLANS:
    Stage: Stage-1
      Map Reduce
        Map Operator Tree:
            TableScan
              alias: tmp_yinfei_test_01
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Filter Operator
                predicate: (statis_date = '') (type: boolean)
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Select Operator
                  expressions: id (type: string), '' (type: string)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: string)
                    sort order: +
                    Map-reduce partition columns: _col0 (type: string)
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    value expressions: _col1 (type: string)
            TableScan
              alias: tmp_yinfei_test_02
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Filter Operator
                predicate: (statis_date = '') (type: boolean)
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Select Operator
                  expressions: id (type: string)
                  outputColumnNames: _col0
                  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: string)
                    sort order: +
                    Map-reduce partition columns: _col0 (type: string)
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
        Reduce Operator Tree:
          Join Operator
            condition map:
                 Left Outer Join0 to 1
            keys:
              0 _col0 (type: string)
              1 _col0 (type: string)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            File Output Operator
              compressed: true
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

    Stage: Stage-0
      Fetch Operator
        limit: -1
        Processor Tree:
          ListSink

Conclusion: both tables are filtered right after their TableScan (a Filter Operator on statis_date in the map stage), so both can be pruned to the requested partition, and the LEFT JOIN keeps its meaning: t1 rows without a match in t2 are still returned.
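A sketch of the result semantics behind test3, with Python's sqlite3 as a stand-in for Hive (assumptions: statis_date is a plain column, the rows and dates are hypothetical). Filtering inside the subqueries keeps the LEFT JOIN semantics for the requested date: 'b' survives with a NULL t2 side, which the WHERE filter of test1 would drop, and the row from the other date does not appear, which the ON-only filter of test2 would let through.

```python
import sqlite3

DAY, OTHER = "20200101", "20200102"  # hypothetical partition values

# In-memory stand-ins for the two Hive tables (no real partitioning in sqlite3).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp_yinfei_test_01 (id TEXT, statis_date TEXT)")
conn.execute("CREATE TABLE tmp_yinfei_test_02 (id TEXT, statis_date TEXT)")
conn.executemany("INSERT INTO tmp_yinfei_test_01 VALUES (?, ?)",
                 [("a", DAY), ("b", DAY), ("c", OTHER)])
conn.executemany("INSERT INTO tmp_yinfei_test_02 VALUES (?, ?)",
                 [("a", DAY), ("b", OTHER), ("c", OTHER)])

# test3: each side is filtered in its own subquery before the LEFT JOIN.
test3 = conn.execute("""
    SELECT t1.id, t2.id
    FROM (SELECT * FROM tmp_yinfei_test_01 WHERE statis_date = ?) t1
    LEFT JOIN (SELECT * FROM tmp_yinfei_test_02 WHERE statis_date = ?) t2
      ON t1.id = t2.id
    ORDER BY t1.id""", (DAY, DAY)).fetchall()

print(test3)  # [('a', 'a'), ('b', None)]
```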
