[Hive - LanguageManual] Explain (pending)
EXPLAIN Syntax
Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows:
EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION] query
AUTHORIZATION is supported from Hive 0.14.0 onward via HIVE-5961.
The use of EXTENDED in the EXPLAIN statement produces extra information about the operators in the plan. This is typically physical information like file names.
A Hive query gets converted into a sequence of stages (more precisely, a Directed Acyclic Graph of stages). These stages may be map/reduce stages, or they may be stages that do metastore or file system operations such as move and rename. The explain output consists of three parts:
- The Abstract Syntax Tree for the query
- The dependencies between the different stages of the plan
- The description of each of the stages
The description of each stage shows a sequence of operators together with the metadata associated with each operator. The metadata may include things like the filter expressions for the FilterOperator, the select expressions for the SelectOperator, or the output file names for the FileSinkOperator.
As an example, consider the following EXPLAIN query:
EXPLAIN
FROM src INSERT OVERWRITE TABLE dest_g1 SELECT src.key, sum(substr(src.value,4)) GROUP BY src.key;
The output of this statement contains the following parts:
The Abstract Syntax Tree
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF src)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB dest_g1)) (TOK_SELECT (TOK_SELEXPR (TOK_COLREF src key)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_FUNCTION substr (TOK_COLREF src value) 4)))) (TOK_GROUPBY (TOK_COLREF src key))))
The Dependency Graph
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
This shows that Stage-1 is the root stage, Stage-2 is executed after Stage-1 is done, and Stage-0 is executed after Stage-2 is done.
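The dependency section can also be read programmatically. The following is a minimal sketch, not part of Hive: it assumes the section is laid out one stage per line and handles only the two line forms that appear in this example ("is a root stage" and "depends on stages: ..."). The function name `parse_stage_dependencies` is hypothetical, and `graphlib` requires Python 3.9 or later.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# The STAGE DEPENDENCIES section of the example, one stage per line.
STAGE_DEPENDENCIES = """\
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
"""


def parse_stage_dependencies(text):
    """Return {stage: set of parent stages}; hypothetical helper, not a Hive API."""
    graph = {}
    for line in text.splitlines():
        line = line.strip()
        root = re.fullmatch(r"(Stage-\d+) is a root stage", line)
        if root:
            graph[root.group(1)] = set()
            continue
        dep = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", line)
        if dep:
            parents = {s.strip() for s in dep.group(2).split(",")}
            graph[dep.group(1)] = parents
    return graph


graph = parse_stage_dependencies(STAGE_DEPENDENCIES)

# A topological order of the DAG is a valid execution order for the stages.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

For this plan the stages form a simple chain, so the only valid order is Stage-1, Stage-2, Stage-0. Plans with other line forms would need additional patterns.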
The plans of each Stage
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        src
            Reduce Output Operator
              key expressions:
                    expr: key
                    type: string
              sort order: +
              Map-reduce partition columns:
                    expr: rand()
                    type: double
              tag: -1
              value expressions:
                    expr: substr(value,4)
                    type: string
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(UDFToDouble(VALUE.0))
          keys:
                expr: KEY.0
                type: string
          mode: partial1
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
                name: binary_table

  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        /tmp/hive-zshao/67494501/106593589.10001
          Reduce Output Operator
            key expressions:
                  expr: 0
                  type: string
            sort order: +
            Map-reduce partition columns:
                  expr: 0
                  type: string
            tag: -1
            value expressions:
                  expr: 1
                  type: double
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(VALUE.0)
          keys:
                expr: KEY.0
                type: string
          mode: final
          Select Operator
            expressions:
                  expr: 0
                  type: string
                  expr: 1
                  type: double
            Select Operator
              expressions:
                    expr: UDFToInteger(0)
                    type: int
                    expr: 1
                    type: double
              File Output Operator
                compressed: false
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                    name: dest_g1

  Stage: Stage-0
    Move Operator
      tables:
            replace: true
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                name: dest_g1
In this example there are 2 map/reduce stages (Stage-1 and Stage-2) and 1 file system related stage (Stage-0). Stage-0 moves the results from a temporary directory to the directory corresponding to the table dest_g1.
A map/reduce stage itself consists of 2 parts:
- A mapping from table alias to Map Operator Tree - This mapping tells the mappers which operator tree to call in order to process the rows from a particular table or from the result of a previous map/reduce stage. In Stage-1 in the above example, the rows from the src table are processed by the operator tree rooted at a Reduce Output Operator. Similarly, in Stage-2 the rows of the results of Stage-1 are processed by another operator tree rooted at another Reduce Output Operator. Each of these Reduce Output Operators partitions the data to the reducers according to the criteria shown in the metadata.
- A Reduce Operator Tree - This is the operator tree that processes all the rows on the reducer of the map/reduce job. In Stage-1, for example, the Reduce Operator Tree carries out a partial aggregation, whereas the Reduce Operator Tree in Stage-2 computes the final aggregation from the partial aggregates computed in Stage-1.
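The two-stage aggregation described above can be illustrated with a small simulation. This is a conceptual sketch only, not how Hive implements its operators: the rows are hypothetical stand-ins for the src table, the function names `stage1` and `stage2` are made up, and reducers are modeled as plain dictionaries. It mirrors the plan in that Stage-1 partitions rows by rand() and emits partial sums, while Stage-2 partitions by key and adds the partial sums together.

```python
import random
from collections import defaultdict

# Hypothetical rows standing in for the src table: (key, value).
rows = [("a", "val10"), ("b", "val5"), ("a", "val7"), ("b", "val1"), ("a", "val3")]


def hive_substr(s, start):
    """Hive's substr is 1-based, so substr(value, 4) drops the first 3 characters."""
    return s[start - 1:]


def stage1(rows, num_reducers, rng):
    """Spread rows over reducers at random, then sum per key on each reducer."""
    partials = [defaultdict(float) for _ in range(num_reducers)]
    for key, value in rows:
        reducer = rng.randrange(num_reducers)  # partition column: rand()
        partials[reducer][key] += float(hive_substr(value, 4))
    # Each reducer emits its (key, partial_sum) pairs as intermediate output.
    return [(key, total) for p in partials for key, total in p.items()]


def stage2(partial_rows):
    """Bring all partial sums for a key together and add them up (mode: final)."""
    totals = defaultdict(float)
    for key, partial_sum in partial_rows:
        totals[key] += partial_sum
    return dict(totals)


result = stage2(stage1(rows, num_reducers=3, rng=random.Random(0)))
print(result)
```

Whatever the random partitioning in the first stage, the final totals are the same (20.0 for key a and 6.0 for key b here), which is why a random first-stage partition is safe for an aggregation like sum.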
The use of DEPENDENCY in the EXPLAIN statement produces extra information about the inputs in the plan. It shows various attributes for the inputs. For example, for a query like:
EXPLAIN DEPENDENCY SELECT key, count(1) FROM srcpart WHERE ds IS NOT NULL GROUP BY key
the following output is produced:
{"input_partitions":[{"partitionName":"default@srcpart@ds=2008-04-08/hr=11"},{"partitionName":"default@srcpart@ds=2008-04-08/hr=12"},{"partitionName":"default@srcpart@ds=2008-04-09/hr=11"},{"partitionName":"default@srcpart@ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE"}]}
The inputs contain both the tables and the partitions. Note that the table is present even if none of the partitions is accessed in the query.
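Because the output is JSON, it is easy to post-process. The sketch below groups the input partitions by table. It assumes partition names of the form database@table@partition-spec, as in this example's output; the variable names are illustrative only.

```python
import json

# The EXPLAIN DEPENDENCY output from the example, reflowed for readability.
explain_output = """
{"input_partitions":[
   {"partitionName":"default@srcpart@ds=2008-04-08/hr=11"},
   {"partitionName":"default@srcpart@ds=2008-04-08/hr=12"},
   {"partitionName":"default@srcpart@ds=2008-04-09/hr=11"},
   {"partitionName":"default@srcpart@ds=2008-04-09/hr=12"}],
 "input_tables":[{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE"}]}
"""

plan = json.loads(explain_output)

# Start from input_tables so a table with no accessed partitions still appears.
partitions_by_table = {t["tablename"]: [] for t in plan["input_tables"]}
for p in plan["input_partitions"]:
    # Assumed layout: <database>@<table>@<partition spec>
    database, table, spec = p["partitionName"].split("@", 2)
    partitions_by_table.setdefault(f"{database}@{table}", []).append(spec)

for table, specs in partitions_by_table.items():
    print(table, specs)
```

Seeding the dictionary from input_tables reflects the note above: a table is listed even when none of its partitions is read.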
The dependencies show the parents in case a table is accessed via a view. Consider the following queries:
CREATE VIEW V1 AS SELECT key, value from src;
EXPLAIN DEPENDENCY SELECT * FROM V1;
The following output is produced:
{"input_partitions":[],"input_tables":[{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v1]"}]}
As above, the inputs contain the view V1 and the table 'src' that the view V1 refers to.
If a table is accessed via multiple parents, all of the parents are shown.
CREATE VIEW V2 AS SELECT ds, key, value FROM srcpart WHERE ds IS NOT NULL;
CREATE VIEW V4 AS SELECT src1.key, src2.value as value1, src3.value as value2 FROM V1 src1 JOIN V2 src2 on src1.key = src2.key JOIN src src3 ON src2.key = src3.key;
EXPLAIN DEPENDENCY SELECT * FROM V4;
The following output is produced.
{"input_partitions":[{"partitionParents":"[default@v2]","partitionName":"default@srcpart@ds=2008-04-08/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default@srcpart@ds=2008-04-08/hr=12"},{"partitionParents":"[default@v2]","partitionName":"default@srcpart@ds=2008-04-09/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default@srcpart@ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@v4","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@v2","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v4, default@v1]"},{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE","tableParents":"[default@v2]"}]}
As can be seen, src is being accessed via parents v1 and v4.
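Note that in this output tableParents is a string such as "[default@v4, default@v1]", not a JSON array, so it needs a second parsing step. The sketch below builds a table-to-parents map from the input_tables of the V4 example; the input_partitions list is left empty here for brevity, and the helper name `parse_parents` is hypothetical.

```python
import json

# input_tables from the V4 example; input_partitions omitted for brevity.
explain_output = """
{"input_partitions":[],
 "input_tables":[
   {"tablename":"default@v4","tabletype":"VIRTUAL_VIEW"},
   {"tablename":"default@v2","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},
   {"tablename":"default@v1","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},
   {"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v4, default@v1]"},
   {"tablename":"default@srcpart","tabletype":"MANAGED_TABLE","tableParents":"[default@v2]"}]}
"""


def parse_parents(raw):
    """Turn a string like '[default@v4, default@v1]' into a list of names."""
    inner = raw.strip().strip("[]")
    return [name.strip() for name in inner.split(",") if name.strip()]


plan = json.loads(explain_output)

# Entries without tableParents (such as the top-level view) get an empty list.
parents = {
    t["tablename"]: parse_parents(t.get("tableParents", "[]"))
    for t in plan["input_tables"]
}

for table, via in parents.items():
    print(table, "<-", via)
```

The entry for default@src lists both default@v4 and default@v1, matching the observation that src is reached through two parents.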
The use of AUTHORIZATION in the EXPLAIN statement shows all entities that need to be authorized to execute the query, along with any authorization failures. For example, for a query like:
EXPLAIN AUTHORIZATION SELECT * FROM src JOIN srcpart;
the following output is produced:
INPUTS:
  default@srcpart
  default@src
  default@srcpart@ds=2008-04-08/hr=11
  default@srcpart@ds=2008-04-08/hr=12
  default@srcpart@ds=2008-04-09/hr=11
  default@srcpart@ds=2008-04-09/hr=12
OUTPUTS:
  hdfs://localhost:9000/tmp/.../-mr-10000
CURRENT_USER:
  navis
OPERATION:
  QUERY
AUTHORIZATION_FAILURES:
  Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]
With the FORMATTED keyword, the output is returned in JSON format.
{"OUTPUTS":["hdfs://localhost:9000/tmp/.../-mr-10000"],"INPUTS":["default@srcpart","default@src","default@srcpart@ds=2008-04-08/hr=11","default@srcpart@ds=2008-04-08/hr=12","default@srcpart@ds=2008-04-09/hr=11","default@srcpart@ds=2008-04-09/hr=12"],"OPERATION":"QUERY","CURRENT_USER":"navis","AUTHORIZATION_FAILURES":["Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]"]}
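The JSON form is convenient for checking authorization before running a query. The following sketch assumes the output is a single JSON object (with its opening brace) and that failure messages follow the wording shown in this example; the regular expression is derived from that one message and may not match other Hive versions or authorization providers.

```python
import json
import re

# The FORMATTED authorization output from the example, reflowed for readability.
formatted_output = """
{"OUTPUTS":["hdfs://localhost:9000/tmp/.../-mr-10000"],
 "INPUTS":["default@srcpart","default@src",
           "default@srcpart@ds=2008-04-08/hr=11","default@srcpart@ds=2008-04-08/hr=12",
           "default@srcpart@ds=2008-04-09/hr=11","default@srcpart@ds=2008-04-09/hr=12"],
 "OPERATION":"QUERY",
 "CURRENT_USER":"navis",
 "AUTHORIZATION_FAILURES":["Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]"]}
"""

report = json.loads(formatted_output)

# Pattern taken from the example message: "[PRIVILEGE] on Object [type=..., name=...]".
MISSING = re.compile(r"\[(\w+)\] on Object \[type=(\w+), name=([\w.]+)\]")

missing = []
for failure in report.get("AUTHORIZATION_FAILURES", []):
    missing.extend(MISSING.findall(failure))

if missing:
    print(f"user {report['CURRENT_USER']} cannot run {report['OPERATION']}:")
    for privilege, object_type, name in missing:
        print(f"  missing {privilege} on {object_type} {name}")
```

Each tuple in `missing` holds the privilege, the object type, and the object name, which is enough to tell the user which grants to request.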