[Hive - LanguageManual] Explain (pending)
EXPLAIN Syntax
Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows:
EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION] query
AUTHORIZATION is supported from Hive 0.14.0 via HIVE-5961.
The use of EXTENDED in the EXPLAIN statement produces extra information about the operators in the plan. This is typically physical information like file names.
A Hive query is converted into a sequence of stages (more precisely, a Directed Acyclic Graph of stages). These stages may be map/reduce stages, or they may even be stages that perform metastore or file system operations such as move and rename. The explain output has three parts:
- The Abstract Syntax Tree for the query
- The dependencies between the different stages of the plan
- The description of each of the stages
The description of each stage shows a sequence of operators together with the metadata associated with them. The metadata may include things like the filter expressions for the FilterOperator, the select expressions for the SelectOperator, or the output file names for the FileSinkOperator.
As an example, consider the following EXPLAIN query:
EXPLAIN FROM src INSERT OVERWRITE TABLE dest_g1 SELECT src.key, sum(substr(src.value, 4)) GROUP BY src.key;
The output of this statement contains the following parts:
The Abstract Syntax Tree
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF src)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB dest_g1)) (TOK_SELECT (TOK_SELEXPR (TOK_COLREF src key)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_FUNCTION substr (TOK_COLREF src value) 4)))) (TOK_GROUPBY (TOK_COLREF src key))))
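The AST is printed as a parenthesized S-expression, which is hard to read on one line. The following is a minimal Python sketch, assuming the AST text is available exactly as shown above. It parses the expression into nested lists so the tree shape (a TOK_QUERY root with TOK_FROM and TOK_INSERT children) becomes visible. The helpers `tokenize`, `parse` and `show` are hypothetical and not part of Hive.

```python
import re

# The AST text exactly as printed in the EXPLAIN output above.
AST = (
    "(TOK_QUERY (TOK_FROM (TOK_TABREF src)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB dest_g1)) "
    "(TOK_SELECT (TOK_SELEXPR (TOK_COLREF src key)) (TOK_SELEXPR (TOK_FUNCTION sum "
    "(TOK_FUNCTION substr (TOK_COLREF src value) 4)))) (TOK_GROUPBY (TOK_COLREF src key))))"
)


def tokenize(text):
    """Split the S-expression into '(', ')' and bare atoms."""
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(tokens):
    """Recursive-descent parse: every '(...)' group becomes a Python list."""
    def node_at(pos):
        if tokens[pos] != "(":
            return tokens[pos], pos + 1  # atom: token name, identifier or literal
        children = []
        pos += 1
        while tokens[pos] != ")":
            child, pos = node_at(pos)
            children.append(child)
        return children, pos + 1  # skip the closing ')'

    tree, end = node_at(0)
    if end != len(tokens):
        raise ValueError("unexpected tokens after the root node")
    return tree


def show(node, depth=0):
    """Print one node per line, indenting children under their parent."""
    if isinstance(node, list):
        print("  " * depth + node[0])
        for child in node[1:]:
            show(child, depth + 1)
    else:
        print("  " * depth + node)


tree = parse(tokenize(AST))
show(tree)
```

The first element of each list is the token type, for example TOK_GROUPBY, and the remaining elements are its operands. This mirrors how the printed AST nests them.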
The Dependency Graph
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
This shows that Stage-1 is the root stage, Stage-2 is executed after Stage-1 is done and Stage-0 is executed after Stage-2 is done.
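Because the stages form a DAG, an execution order can be derived from the STAGE DEPENDENCIES section with a topological sort. The following Python sketch assumes Python 3.9+ for `graphlib`. It only handles the two line forms shown above ("is a root stage" and "depends on stages: ..."). It further assumes that several predecessors would be listed comma-separated. `parse_stage_dependencies` is a hypothetical helper, not a Hive API.

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

# The STAGE DEPENDENCIES section from the example above.
STAGE_DEPENDENCIES = """\
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
"""

ROOT = re.compile(r"(Stage-\d+) is a root stage$")
CHILD = re.compile(r"(Stage-\d+) depends on stages: (.+)$")


def parse_stage_dependencies(text):
    """Map each stage to the set of stages it depends on (its predecessors)."""
    deps = {}
    for line in text.splitlines():
        line = line.strip()
        if m := ROOT.match(line):
            deps[m.group(1)] = set()
        elif m := CHILD.match(line):
            deps[m.group(1)] = {s.strip() for s in m.group(2).split(",")}
    return deps


deps = parse_stage_dependencies(STAGE_DEPENDENCIES)
# static_order() yields every stage only after all the stages it depends on.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['Stage-1', 'Stage-2', 'Stage-0']
```

For this plan the graph is a simple chain, so the order is unique. When two stages have no dependency between them, a topological sort may place either one first.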
The plans of each Stage
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        src
            Reduce Output Operator
              key expressions:
                    expr: key
                    type: string
              sort order: +
              Map-reduce partition columns:
                    expr: rand()
                    type: double
              tag: -1
              value expressions:
                    expr: substr(value, 4)
                    type: string
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(UDFToDouble(VALUE.0))
          keys:
                expr: KEY.0
                type: string
          mode: partial1
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
                name: binary_table
  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        /tmp/hive-zshao/67494501/106593589.10001
          Reduce Output Operator
            key expressions:
                  expr: 0
                  type: string
            sort order: +
            Map-reduce partition columns:
                  expr: 0
                  type: string
            tag: -1
            value expressions:
                  expr: 1
                  type: double
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(VALUE.0)
          keys:
                expr: KEY.0
                type: string
          mode: final
          Select Operator
            expressions:
                  expr: 0
                  type: string
                  expr: 1
                  type: double
            Select Operator
              expressions:
                    expr: UDFToInteger(0)
                    type: int
                    expr: 1
                    type: double
              File Output Operator
                compressed: false
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                    name: dest_g1
  Stage: Stage-0
    Move Operator
      tables:
            replace: true
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                name: dest_g1
In this example there are two map/reduce stages (Stage-1 and Stage-2) and one file-system-related stage (Stage-0). Stage-0 moves the results from a temporary directory to the directory corresponding to the table dest_g1.
A map/reduce stage itself consists of two parts:
- A mapping from table alias to Map Operator Tree - This mapping tells the mappers which operator tree to call in order to process the rows from a particular table or from the result of a previous map/reduce stage. In Stage-1 of the above example, the rows from the src table are processed by the operator tree rooted at a Reduce Output Operator. Similarly, in Stage-2 the rows of the results of Stage-1 are processed by another operator tree rooted at another Reduce Output Operator. Each of these Reduce Output Operators partitions the data to the reducers according to the criteria shown in its metadata: rand() in Stage-1 and the grouping key (column 0) in Stage-2.
- A Reduce Operator Tree - This is the operator tree that processes all the rows on the reducers of the map/reduce job. In Stage-1, for example, the Reduce Operator Tree carries out a partial aggregation, whereas the Reduce Operator Tree in Stage-2 computes the final aggregation from the partial aggregates computed in Stage-1.
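The following small in-memory Python simulation illustrates why the plan needs two aggregation stages. It is a sketch under stated assumptions, not Hive's implementation. The input rows, the reducer count and the functions `stage1`/`stage2` are hypothetical. `random.Random` stands in for Hive's `rand()`, and Python's `hash()` stands in for Hive's key partitioner. Because Stage-1 partitions by rand(), one key can reach several reducers and produce several partial sums. Stage-2 repartitions by key, so all partials for a key meet on a single reducer for the final sum.

```python
import random
from collections import defaultdict

# Hypothetical input rows (key, text); text stands in for substr(value, 4)
# and is assumed to already be a numeric string.
ROWS = [("238", "238"), ("86", "86"), ("238", "238"),
        ("311", "311"), ("86", "86"), ("311", "311")]
NUM_REDUCERS = 3


def stage1(rows, num_reducers, rng):
    """Map side: route each row to a reducer chosen at random (like rand()).
    Reduce side (mode: partial1): sum per key within each reducer."""
    buckets = defaultdict(list)
    for key, text in rows:
        buckets[rng.randrange(num_reducers)].append((key, float(text)))  # UDFToDouble
    partials = []
    for reducer_rows in buckets.values():
        sums = defaultdict(float)
        for key, value in reducer_rows:
            sums[key] += value
        partials.extend(sums.items())  # the same key may appear once per reducer
    return partials


def stage2(partials, num_reducers):
    """Map side: route partial rows by key. Reduce side (mode: final): add partials."""
    buckets = defaultdict(list)
    for key, partial in partials:
        buckets[hash(key) % num_reducers].append((key, partial))
    final = {}
    for reducer_rows in buckets.values():
        sums = defaultdict(float)
        for key, partial in reducer_rows:
            sums[key] += partial
        # Partitioning by key guarantees that no key is split across reducers.
        assert final.keys().isdisjoint(sums.keys())
        final.update(sums)
    # Second Select Operator: UDFToInteger on the key column.
    return {int(key): total for key, total in final.items()}


partials = stage1(ROWS, NUM_REDUCERS, random.Random(42))
result = stage2(partials, NUM_REDUCERS)
print(sorted(result.items()))  # [(86, 172.0), (238, 476.0), (311, 622.0)]
```

The final result does not depend on how rand() spreads the rows. Only the number of partial rows handed from Stage-1 to Stage-2 changes.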
The use of DEPENDENCY in the EXPLAIN statement produces extra information about the inputs in the plan. It shows various attributes for the inputs. For example, for a query like:
EXPLAIN DEPENDENCY SELECT key, count(1) FROM srcpart WHERE ds IS NOT NULL GROUP BY key
the following output is produced:
{"input_partitions":[{"partitionName":"default@srcpart@ds=2008-04-08/hr=11"},{"partitionName":"default@srcpart@ds=2008-04-08/hr=12"},{"partitionName":"default@srcpart@ds=2008-04-09/hr=11"},{"partitionName":"default@srcpart@ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE"}]}
The inputs contain both the tables and the partitions. Note that the table is present even if none of the partitions is accessed in the query.
The dependencies show the parents in case a table is accessed via a view. Consider the following queries:
CREATE VIEW V1 AS SELECT key, value FROM src;
EXPLAIN DEPENDENCY SELECT * FROM V1;
The following output is produced:
{"input_partitions":[],"input_tables":[{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v1]"}]}
As above, the inputs contain the view V1 and the table 'src' that the view V1 refers to.
All the parents are shown if a table is accessed via multiple parents.
CREATE VIEW V2 AS SELECT ds, key, value FROM srcpart WHERE ds IS NOT NULL;
CREATE VIEW V4 AS SELECT src1.key, src2.value AS value1, src3.value AS value2 FROM V1 src1 JOIN V2 src2 ON src1.key = src2.key JOIN src src3 ON src2.key = src3.key;
EXPLAIN DEPENDENCY SELECT * FROM V4;
The following output is produced.
{"input_partitions":[{"partitionParents":"[default@v2]","partitionName":"default@srcpart@ds=2008-04-08/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default@srcpart@ds=2008-04-08/hr=12"},{"partitionParents":"[default@v2]","partitionName":"default@srcpart@ds=2008-04-09/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default@srcpart@ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@v4","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@v2","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v4, default@v1]"},{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE","tableParents":"[default@v2]"}]}
As can be seen, src is being accessed via parents v1 and v4.
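In the outputs above, `tableParents` is a string that looks like a list (for example "[default@v4, default@v1]"), not a JSON array. A consumer therefore has to split it explicitly. The following Python sketch builds a table-to-parents map from the `input_tables` part of the V4 example and finds tables reached through more than one parent. The bracket-and-comma format is assumed from these examples only, and the helper names are hypothetical.

```python
import json

# Only the "input_tables" array of the V4 output above.
V4_TABLES_JSON = """
{"input_tables": [
  {"tablename": "default@v4", "tabletype": "VIRTUAL_VIEW"},
  {"tablename": "default@v2", "tabletype": "VIRTUAL_VIEW", "tableParents": "[default@v4]"},
  {"tablename": "default@v1", "tabletype": "VIRTUAL_VIEW", "tableParents": "[default@v4]"},
  {"tablename": "default@src", "tabletype": "MANAGED_TABLE", "tableParents": "[default@v4, default@v1]"},
  {"tablename": "default@srcpart", "tabletype": "MANAGED_TABLE", "tableParents": "[default@v2]"}]}
"""


def parse_parent_list(value):
    """Turn a string such as "[default@v4, default@v1]" into a Python list."""
    inner = value.strip().strip("[]")
    return [name.strip() for name in inner.split(",") if name.strip()]


def parents_by_table(raw):
    """Map each input table or view to the views it was reached through."""
    doc = json.loads(raw)
    return {t["tablename"]: parse_parent_list(t.get("tableParents", "[]"))
            for t in doc["input_tables"]}


parents = parents_by_table(V4_TABLES_JSON)
shared = [name for name, ps in parents.items() if len(ps) > 1]
print(shared)  # ['default@src']
```

Top-level objects such as default@v4 carry no `tableParents` key, so they map to an empty list.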
The use of AUTHORIZATION in the EXPLAIN statement shows all entities that need to be authorized to execute the query, along with any authorization failures. For example, for a query like:
EXPLAIN AUTHORIZATION SELECT * FROM src JOIN srcpart;
the following output is produced:
INPUTS:
  default@srcpart
  default@src
  default@srcpart@ds=2008-04-08/hr=11
  default@srcpart@ds=2008-04-08/hr=12
  default@srcpart@ds=2008-04-09/hr=11
  default@srcpart@ds=2008-04-09/hr=12
OUTPUTS:
  hdfs://localhost:9000/tmp/.../-mr-10000
CURRENT_USER:
  navis
OPERATION:
  QUERY
AUTHORIZATION_FAILURES:
  Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]
With the FORMATTED keyword, the same information is returned in JSON format:
{"OUTPUTS":["hdfs://localhost:9000/tmp/.../-mr-10000"],"INPUTS":["default@srcpart","default@src","default@srcpart@ds=2008-04-08/hr=11","default@srcpart@ds=2008-04-08/hr=12","default@srcpart@ds=2008-04-09/hr=11","default@srcpart@ds=2008-04-09/hr=12"],"OPERATION":"QUERY","CURRENT_USER":"navis","AUTHORIZATION_FAILURES":["Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]"]}
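The JSON form makes it possible to check privileges before running a query. The following Python sketch reads the output above and extracts the user, the operation and the objects named in the failure messages. The regular expression assumes the "on Object [type=..., name=...]" message shape seen in this single example. Hive does not document that shape as a stable format, and `authorization_report` is a hypothetical helper.

```python
import json
import re

# FORMATTED output of EXPLAIN AUTHORIZATION for the src JOIN srcpart query above.
AUTH_JSON = """
{"OUTPUTS": ["hdfs://localhost:9000/tmp/.../-mr-10000"],
 "INPUTS": ["default@srcpart", "default@src",
            "default@srcpart@ds=2008-04-08/hr=11", "default@srcpart@ds=2008-04-08/hr=12",
            "default@srcpart@ds=2008-04-09/hr=11", "default@srcpart@ds=2008-04-09/hr=12"],
 "OPERATION": "QUERY",
 "CURRENT_USER": "navis",
 "AUTHORIZATION_FAILURES": ["Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]"]}
"""

# Assumed message shape: "... on Object [type=SOME_TYPE, name=db.object] ..."
DENIED_OBJECT = re.compile(r"on Object \[type=[A-Z_]+, name=([^\]]+)\]")


def authorization_report(raw):
    """Return the user, the operation and the objects named in any failure messages."""
    doc = json.loads(raw)
    denied = []
    for message in doc.get("AUTHORIZATION_FAILURES", []):
        denied += DENIED_OBJECT.findall(message)
    return doc["CURRENT_USER"], doc["OPERATION"], denied


user, operation, denied = authorization_report(AUTH_JSON)
print(user, operation, denied)  # navis QUERY ['default.src', 'default.srcpart']
```

An empty `denied` list would indicate that no failures were reported for the query.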