[Hive - LanguageManual] Explain (pending)

EXPLAIN Syntax

Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows:

EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION] query

AUTHORIZATION is supported starting with Hive 0.14.0 (HIVE-5961).

The use of EXTENDED in the EXPLAIN statement produces extra information about the operators in the plan. This is typically physical information like file names.

A Hive query is converted into a sequence of stages (more precisely, a directed acyclic graph of stages). These stages may be map/reduce stages, or they may be stages that perform metastore or file system operations such as move and rename. The EXPLAIN output consists of three parts:

The Abstract Syntax Tree for the query
The dependencies between the different stages of the plan
The description of each of the stages

The description of each stage shows a sequence of operators together with the metadata associated with them. The metadata may include things such as the filter expressions for the FilterOperator, the select expressions for the SelectOperator, or the output file names for the FileSinkOperator.

As an example, consider the following EXPLAIN query:

EXPLAIN

FROM src INSERT OVERWRITE TABLE dest_g1 SELECT src.key, sum(substr(src.value,4)) GROUP BY src.key;

The output of this statement contains the following parts:

The Abstract Syntax Tree

ABSTRACT SYNTAX TREE:

(TOK_QUERY (TOK_FROM (TOK_TABREF src)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB dest_g1)) (TOK_SELECT (TOK_SELEXPR (TOK_COLREF src key)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_FUNCTION substr (TOK_COLREF src value) 4)))) (TOK_GROUPBY (TOK_COLREF src key))))

The Dependency Graph

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2

This shows that Stage-1 is the root stage, Stage-2 is executed after Stage-1 is done and Stage-0 is executed after Stage-2 is done.
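Because the stages form a directed acyclic graph, the dependency lines can be turned into an execution order with a topological sort. The sketch below is a hypothetical Python helper (parse_stage_dependencies and execution_order are illustrative names, not part of Hive); it assumes only the two line shapes shown in this example, and more complex plans may print other forms that it does not handle.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# The STAGE DEPENDENCIES section from the EXPLAIN output above.
DEPENDENCIES = """\
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
"""


def parse_stage_dependencies(text):
    """Map each stage to the set of stages it depends on.

    Assumption: only "X is a root stage" and "X depends on stages: A, B"
    lines are recognized; anything else is ignored.
    """
    graph = {}
    for line in text.splitlines():
        line = line.strip()
        root = re.fullmatch(r"(\S+) is a root stage", line)
        if root:
            graph.setdefault(root.group(1), set())
            continue
        dep = re.fullmatch(r"(\S+) depends on stages: (.+)", line)
        if dep:
            parents = {s.strip() for s in dep.group(2).split(",")}
            graph.setdefault(dep.group(1), set()).update(parents)
    return graph


def execution_order(text):
    """Return one valid order in which every stage follows its parents."""
    # TopologicalSorter expects {node: predecessors}, which is what we built.
    return list(TopologicalSorter(parse_stage_dependencies(text)).static_order())


print(execution_order(DEPENDENCIES))  # ['Stage-1', 'Stage-2', 'Stage-0']
```

For this linear chain there is exactly one valid order; when several stages share a parent, any order returned by the sort respects the dependencies.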

The Plans of Each Stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        src
            Reduce Output Operator
              key expressions:
                    expr: key
                    type: string
              sort order: +
              Map-reduce partition columns:
                    expr: rand()
                    type: double
              tag: -1
              value expressions:
                    expr: substr(value, 4)
                    type: string
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(UDFToDouble(VALUE.0))
          keys:
                expr: KEY.0
                type: string
          mode: partial1
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
                name: binary_table

  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        /tmp/hive-zshao/67494501/106593589.10001
          Reduce Output Operator
            key expressions:
                  expr: 0
                  type: string
            sort order: +
            Map-reduce partition columns:
                  expr: 0
                  type: string
            tag: -1
            value expressions:
                  expr: 1
                  type: double
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(VALUE.0)
          keys:
                expr: KEY.0
                type: string
          mode: final
          Select Operator
            expressions:
                  expr: 0
                  type: string
                  expr: 1
                  type: double
            Select Operator
              expressions:
                    expr: UDFToInteger(0)
                    type: int
                    expr: 1
                    type: double
              File Output Operator
                compressed: false
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                    name: dest_g1

  Stage: Stage-0
    Move Operator
      tables:
            replace: true
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                name: dest_g1

In this example there are two map/reduce stages (Stage-1 and Stage-2) and one file-system-related stage (Stage-0). Stage-0 moves the results from a temporary directory to the directory corresponding to the table dest_g1.
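To get a quick overview of a long plan, one can list the kind of each stage and the operators it contains. The following is a minimal sketch over the text layout shown above, not a Hive API: summarize_plan is a hypothetical helper, and it relies on line-shape heuristics (a "Stage: <name>" header, the first line after it naming the stage kind, and operator lines ending in "Operator"). The sample is an abbreviated copy of the plan above that keeps only headers and operator names.

```python
# Abbreviated copy of the STAGE PLANS section above: only stage headers,
# operator-tree headers and operator names are kept.
PLAN = """\
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        src
            Reduce Output Operator
      Reduce Operator Tree:
        Group By Operator
          File Output Operator
  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        /tmp/hive-zshao/67494501/106593589.10001
          Reduce Output Operator
      Reduce Operator Tree:
        Group By Operator
          Select Operator
            Select Operator
              File Output Operator
  Stage: Stage-0
    Move Operator
"""


def summarize_plan(plan_text):
    """Return {stage: (kind, [operator names])} using line-shape heuristics.

    Indentation is ignored, so this also works on output whose indentation
    was lost. Tree headers such as "Reduce Operator Tree:" end in "Tree:"
    and are therefore not counted as operators.
    """
    stages = {}
    current = None
    for raw in plan_text.splitlines():
        line = raw.strip()
        if line.startswith("Stage: "):
            current = line[len("Stage: "):]
            stages[current] = [None, []]
        elif current is None or not line:
            continue
        else:
            if stages[current][0] is None:
                stages[current][0] = line  # first line names the stage kind
            if line.endswith("Operator"):
                stages[current][1].append(line)
    return {name: (kind, ops) for name, (kind, ops) in stages.items()}


for name, (kind, ops) in summarize_plan(PLAN).items():
    print(f"{name}: {kind}; operators: {', '.join(ops)}")
```

Counting stages whose kind is "Map Reduce" gives the two map/reduce stages described above, and Stage-0 shows up as a single Move Operator.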

A map/reduce stage itself consists of two parts:

A mapping from table alias to Map Operator Tree - This mapping tells the mappers which operator tree to call in order to process the rows from a particular table or from the result of a previous map/reduce stage. In Stage-1 of the example above, the rows from the src table are processed by the operator tree rooted at a Reduce Output Operator. Similarly, in Stage-2 the rows of the results of Stage-1 are processed by another operator tree rooted at another Reduce Output Operator. Each of these Reduce Output Operators partitions the data to the reducers according to the criteria shown in the metadata.
A Reduce Operator Tree - This is the operator tree that processes all the rows on the reducers of the map/reduce job. In Stage-1, for example, the Reduce Operator Tree carries out a partial aggregation, whereas the Reduce Operator Tree in Stage-2 computes the final aggregation from the partial aggregates computed in Stage-1. Because Stage-1 partitions rows by rand(), rows with the same key can reach different reducers; Stage-2 therefore repartitions by the key column so that all partial aggregates for a key are merged on one reducer.
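The two-stage aggregation can be illustrated with a toy, single-process simulation. This is only a sketch of the idea, not Hive's implementation: stage1_partial and stage2_final are hypothetical names, the rand() partitioning is modeled with random.Random, and plain numeric values stand in for the substr(...) expression.

```python
import random
from collections import defaultdict


def stage1_partial(rows, num_reducers, rng):
    """Model of Stage-1: rows are routed to reducers at random (the plan's
    partition column is rand()), so one key can reach several reducers.
    Each reducer sums what it receives (mode: partial1)."""
    reducers = [defaultdict(float) for _ in range(num_reducers)]
    for key, value in rows:
        reducers[rng.randrange(num_reducers)][key] += value
    # Every reducer writes its (key, partial_sum) pairs to intermediate output.
    return [pair for reducer in reducers for pair in reducer.items()]


def stage2_final(partials, num_reducers):
    """Model of Stage-2: partial sums are routed by key (partition column 0,
    the key), so all partials for a key meet on one reducer, which merges
    them (mode: final)."""
    reducers = [defaultdict(float) for _ in range(num_reducers)]
    for key, partial in partials:
        reducers[hash(key) % num_reducers][key] += partial
    result = {}
    for reducer in reducers:
        result.update(reducer)  # keys are disjoint across reducers
    return result


rows = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0), ("b", 5.0)]
partials = stage1_partial(rows, num_reducers=3, rng=random.Random(0))
print(sorted(stage2_final(partials, num_reducers=2).items()))
# [('a', 4.0), ('b', 7.0), ('c', 4.0)]
```

Whatever reducer each row lands on in the first stage, the final sums are the same, which is why the random partitioning in Stage-1 is safe as long as Stage-2 regroups by key.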

The use of DEPENDENCY in the EXPLAIN statement produces extra information about the inputs in the plan. It shows various attributes for the inputs. For example, for a query like:

EXPLAIN DEPENDENCY

SELECT key, count(1) FROM srcpart WHERE ds IS NOT NULL GROUP BY key

the following output is produced:

{"input_partitions":[{"partitionName":"default@srcpart@ds=2008-04-08/hr=11"},{"partitionName":"default@srcpart@ds=2008-04-08/hr=12"},{"partitionName":"default@srcpart@ds=2008-04-09/hr=11"},{"partitionName":"default@srcpart@ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE"}]}

The inputs contain both the tables and the partitions. Note that the table is present even if none of the partitions is accessed in the query.
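Since the DEPENDENCY output is a single JSON object, it is easy to post-process. As a minimal sketch (partitions_per_table is a hypothetical helper, and the `<db>@<table>@<partition spec>` name layout is assumed from the example output), the following counts the partitions read from each input table; tables are seeded with zero because, as noted above, a table is listed even when none of its partitions is accessed.

```python
import json

# The EXPLAIN DEPENDENCY output for the srcpart query above.
OUTPUT = (
    '{"input_partitions":['
    '{"partitionName":"default@srcpart@ds=2008-04-08/hr=11"},'
    '{"partitionName":"default@srcpart@ds=2008-04-08/hr=12"},'
    '{"partitionName":"default@srcpart@ds=2008-04-09/hr=11"},'
    '{"partitionName":"default@srcpart@ds=2008-04-09/hr=12"}],'
    '"input_tables":[{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE"}]}'
)


def partitions_per_table(explain_dependency_json):
    """Count the accessed partitions of every input table."""
    doc = json.loads(explain_dependency_json)
    # Every listed table starts at 0, even if no partition of it is read.
    counts = {t["tablename"]: 0 for t in doc["input_tables"]}
    for part in doc["input_partitions"]:
        # Assumed name layout: <db>@<table>@<partition spec>
        db, table, _spec = part["partitionName"].split("@", 2)
        key = f"{db}@{table}"
        counts[key] = counts.get(key, 0) + 1
    return counts


print(partitions_per_table(OUTPUT))  # {'default@srcpart': 4}
```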

When a table is accessed via a view, the dependencies also show the table's parents. Consider the following queries:

CREATE VIEW V1 AS SELECT key, value from src;

EXPLAIN DEPENDENCY SELECT * FROM V1;

The following output is produced:

{"input_partitions":[],"input_tables":[{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v1]"}]}

As shown above, the inputs contain the view V1 and the table 'src' that V1 refers to.

If a table is accessed via multiple parents, all of its parents are shown.

CREATE VIEW V2 AS SELECT ds, key, value FROM srcpart WHERE ds IS NOT NULL;

CREATE VIEW V4 AS

SELECT src1.key, src2.value as value1, src3.value as value2

FROM V1 src1 JOIN V2 src2 on src1.key = src2.key JOIN src src3 ON src2.key = src3.key;

EXPLAIN DEPENDENCY SELECT * FROM V4;

The following output is produced.

{"input_partitions":[{"partitionParents":"[default@v2]","partitionName":"default@srcpart@ds=2008-04-08/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default@srcpart@ds=2008-04-08/hr=12"},{"partitionParents":"[default@v2]","partitionName":"default@srcpart@ds=2008-04-09/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default@srcpart@ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@v4","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@v2","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v4, default@v1]"},{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE","tableParents":"[default@v2]"}]}

As can be seen, src is being accessed via parents v1 and v4.
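In the output above, tableParents is serialized as a bracketed string rather than a JSON array, so it needs one extra parsing step. The sketch below (parents_of and multiply_referenced are hypothetical helper names) builds a table-to-parents map from the input_tables part of the V4 output and keeps only the inputs reached through more than one parent.

```python
import json

# "input_tables" part of the EXPLAIN DEPENDENCY SELECT * FROM V4 output above.
OUTPUT = (
    '{"input_tables":['
    '{"tablename":"default@v4","tabletype":"VIRTUAL_VIEW"},'
    '{"tablename":"default@v2","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},'
    '{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},'
    '{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v4, default@v1]"},'
    '{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE","tableParents":"[default@v2]"}]}'
)


def parents_of(explain_dependency_json):
    """Map each input table or view to the list of views it is reached through."""
    doc = json.loads(explain_dependency_json)
    parents = {}
    for table in doc["input_tables"]:
        # tableParents is a string such as "[default@v4, default@v1]";
        # inputs without it (top-level views) get an empty list.
        inner = table.get("tableParents", "[]").strip()[1:-1]
        parents[table["tablename"]] = [p.strip() for p in inner.split(",") if p.strip()]
    return parents


def multiply_referenced(parents):
    """Keep only the inputs that are reached through more than one parent."""
    return {name: ps for name, ps in parents.items() if len(ps) > 1}


print(multiply_referenced(parents_of(OUTPUT)))
# {'default@src': ['default@v4', 'default@v1']}
```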

The use of AUTHORIZATION in the EXPLAIN statement shows all entities that must be authorized to execute the query, along with any authorization failures. For example, for a query like:

EXPLAIN AUTHORIZATION

SELECT * FROM src JOIN srcpart;

the following output is produced:

INPUTS:
  default@srcpart
  default@src
  default@srcpart@ds=2008-04-08/hr=11
  default@srcpart@ds=2008-04-08/hr=12
  default@srcpart@ds=2008-04-09/hr=11
  default@srcpart@ds=2008-04-09/hr=12
OUTPUTS:
  hdfs://localhost:9000/tmp/.../-mr-10000
CURRENT_USER:
  navis
OPERATION:
  QUERY
AUTHORIZATION_FAILURES:
  Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]

With the FORMATTED keyword, the same information is returned in JSON format:

{"OUTPUTS":["hdfs://localhost:9000/tmp/.../-mr-10000"],"INPUTS":["default@srcpart","default@src","default@srcpart@ds=2008-04-08/hr=11","default@srcpart@ds=2008-04-08/hr=12","default@srcpart@ds=2008-04-09/hr=11","default@srcpart@ds=2008-04-09/hr=12"],"OPERATION":"QUERY","CURRENT_USER":"navis","AUTHORIZATION_FAILURES":["Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]"]}
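The JSON form is convenient for tooling, for example to report which privileges are missing before a query is submitted. The sketch below is a hypothetical helper (missing_privileges is an illustrative name); the structured fields come straight from the JSON, but the privilege/object pairs are extracted with a regular expression from the human-readable failure message shown above, which is an assumption about its wording and may not hold for other Hive versions or authorizers.

```python
import json
import re

# The FORMATTED (JSON) authorization output shown above.
FORMATTED = (
    '{"OUTPUTS":["hdfs://localhost:9000/tmp/.../-mr-10000"],'
    '"INPUTS":["default@srcpart","default@src",'
    '"default@srcpart@ds=2008-04-08/hr=11","default@srcpart@ds=2008-04-08/hr=12",'
    '"default@srcpart@ds=2008-04-09/hr=11","default@srcpart@ds=2008-04-09/hr=12"],'
    '"OPERATION":"QUERY","CURRENT_USER":"navis",'
    '"AUTHORIZATION_FAILURES":["Permission denied: Principal [name=navis, type=USER] '
    'does not have following privileges for operation QUERY '
    '[[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], '
    '[SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]"]}'
)

# Matches fragments like "[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src]".
MISSING = re.compile(r"\[(\w+)\] on Object \[type=\w+, name=([^\]]+)\]")


def missing_privileges(formatted_json):
    """Return (user, operation, [(privilege, object), ...]) from the JSON output."""
    doc = json.loads(formatted_json)
    failures = doc.get("AUTHORIZATION_FAILURES", [])
    missing = [m for msg in failures for m in MISSING.findall(msg)]
    return doc["CURRENT_USER"], doc["OPERATION"], missing


user, operation, missing = missing_privileges(FORMATTED)
print(user, operation, missing)
```

An empty list means no failures were reported for the current user.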
