EXPLAIN Syntax

Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows:

EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION] query
AUTHORIZATION is supported from HIVE 0.14.0 via HIVE-5961.

The use of EXTENDED in the EXPLAIN statement produces extra information about the operators in the plan. This is typically physical information like file names.

A Hive query gets converted into a sequence (it is more an Directed Acyclic Graph) of stages. These stages may be map/reduce stages or they may even be stages that do metastore or file system operations like move and rename. The explain output comprises of three parts:

  • The Abstract Syntax Tree for the query
  • The dependencies between the different stages of the plan
  • The description of each of the stages

The description of the stages itself shows a sequence of operators with the metadata associated with the operators. The metadata may comprise of things like filter expressions for the FilterOperator or the select expressions for the SelectOperator or the output file names for the FileSinkOperator.

As an example, consider the following EXPLAIN query:

EXPLAIN
FROM src INSERT OVERWRITE TABLE dest_g1 SELECT src.key, sum(substr(src.value,4)) GROUP BY src.key;

The output of this statement contains the following parts:

  • The Abstract Syntax Tree

    ABSTRACT SYNTAX TREE:
      (TOK_QUERY (TOK_FROM (TOK_TABREF src)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB dest_g1)) (TOK_SELECT (TOK_SELEXPR (TOK_COLREF src key)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_FUNCTION substr (TOK_COLREF src value) 4)))) (TOK_GROUPBY (TOK_COLREF src key))))
  • The Dependency Graph

    STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-2 depends on stages: Stage-1
      Stage-0 depends on stages: Stage-2

    This shows that Stage-1 is the root stage, Stage-2 is executed after Stage-1 is done and Stage-0 is executed after Stage-2 is done.

  • The plans of each Stage

    STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Alias -> Map Operator Tree:
            src
                Reduce Output Operator
                  key expressions:
                        expr: key
                        type: string
                  sort order: +
                  Map-reduce partition columns:
                        expr: rand()
                        type: double
                  tag: -1
                  value expressions:
                        expr: substr(value, 4)
                        type: string
          Reduce Operator Tree:
            Group By Operator
              aggregations:
                    expr: sum(UDFToDouble(VALUE.0))
              keys:
                    expr: KEY.0
                    type: string
              mode: partial1
              File Output Operator
                compressed: false
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
                    name: binary_table
     
      Stage: Stage-2
        Map Reduce
          Alias -> Map Operator Tree:
            /tmp/hive-zshao/67494501/106593589.10001
              Reduce Output Operator
                key expressions:
                      expr: 0
                      type: string
                sort order: +
                Map-reduce partition columns:
                      expr: 0
                      type: string
                tag: -1
                value expressions:
                      expr: 1
                      type: double
          Reduce Operator Tree:
            Group By Operator
              aggregations:
                    expr: sum(VALUE.0)
              keys:
                    expr: KEY.0
                    type: string
              mode: final
              Select Operator
                expressions:
                      expr: 0
                      type: string
                      expr: 1
                      type: double
                Select Operator
                  expressions:
                        expr: UDFToInteger(0)
                        type: int
                        expr: 1
                        type: double
                  File Output Operator
                    compressed: false
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                        name: dest_g1
     
      Stage: Stage-0
        Move Operator
          tables:
                replace: true
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                    name: dest_g1

    In this example there are 2 map/reduce stages (Stage-1 and Stage-2) and 1 File System related stage (Stage-0). Stage-0 basically moves the results from a temporary directory to the directory corresponding to the table dest_g1.

A map/reduce stage itself comprises of 2 parts:

  • A mapping from table alias to Map Operator Tree - This mapping tells the mappers which operator tree to call in order to process the rows from a particular table or result of a previous map/reduce stage. In Stage-1 in the above example, the rows from src table are processed by the operator tree rooted at a Reduce Output Operator. Similarly, in Stage-2 the rows of the results of Stage-1 are processed by another operator tree rooted at another Reduce Output Operator. Each of these Reduce Output Operators partitions the data to the reducers according to the criteria shown in the metadata.
  • A Reduce Operator Tree - This is the operator tree which processes all the rows on the reducer of the map/reduce job. In Stage-1 for example, the Reducer Operator Tree is carrying out a partial aggregation where as the Reducer Operator Tree in Stage-2 computes the final aggregation from the partial aggregates computed in Stage-1

The use of DEPENDENCY in the EXPLAIN statement produces extra information about the inputs in the plan. It shows various attributes for the inputs. For example, for a query like:

EXPLAIN DEPENDENCY
  SELECT key, count(1) FROM srcpart WHERE ds IS NOT NULL GROUP BY key

the following output is produced:

{"input_partitions":[{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=11"},{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=12"},{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=11"},{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE"}]}

The inputs contain both the tables and the partitions. Note that the table is present even if none of the partitions is accessed in the query.

The dependencies show the parents in case a table is accessed via a view. Consider the following queries:

CREATE VIEW V1 AS SELECT key, value from src;
EXPLAIN DEPENDENCY SELECT * FROM V1;

The following output is produced:

{"input_partitions":[],"input_tables":[{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v1]"}]}

As above, the inputs contain the view V1 and the table 'src' that the view V1 refers to.

All the outputs are shown if a table is being accessed via multiple parents.

CREATE VIEW V2 AS SELECT ds, key, value FROM srcpart WHERE ds IS NOT NULL;
CREATE VIEW V4 AS
  SELECT src1.key, src2.value as value1, src3.value as value2
  FROM V1 src1 JOIN V2 src2 on src1.key = src2.key JOIN src src3 ON src2.key = src3.key;
EXPLAIN DEPENDENCY SELECT * FROM V4;

The following output is produced.

{"input_partitions":[{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=12"},{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@v4","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@v2","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v4, default@v1]"},{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE","tableParents":"[default@v2]"}]}

As can be seen, src is being accessed via parents v1 and v4.

The use of AUTHORIZATION in the EXPLAIN statement shows all entities needed to be authorized to execute the query and authorization failures if exists. For example, for a query like:

EXPLAIN AUTHORIZATION
  SELECT * FROM src JOIN srcpart;

the following output is produced:

INPUTS:
  default@srcpart
  default@src
  default@srcpart@ds=2008-04-08/hr=11
  default@srcpart@ds=2008-04-08/hr=12
  default@srcpart@ds=2008-04-09/hr=11
  default@srcpart@ds=2008-04-09/hr=12
OUTPUTS:
  hdfs://localhost:9000/tmp/.../-mr-10000
CURRENT_USER:
  navis
OPERATION:
  QUERY
AUTHORIZATION_FAILURES:
  Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]

With the FORMATTED keyword, it will be returned in JSON format.

"OUTPUTS":["hdfs://localhost:9000/tmp/.../-mr-10000"],"INPUTS":["default@srcpart","default@src","default@srcpart@ds=2008-04-08/hr=11","default@srcpart@ds=2008-04-08/hr=12","default@srcpart@ds=2008-04-09/hr=11","default@srcpart@ds=2008-04-09/hr=12"],"OPERATION":"QUERY","CURRENT_USER":"navis","AUTHORIZATION_FAILURES":["Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]"]}
 

[Hive - LanguageManual ] Explain (待)的更多相关文章

  1. Hive的Explain命令

    Hive的Explain命令,用于显示SQL查询的执行计划. Hive查询被转化成序列阶段(这是一个有向无环图).这些阶段可能是mapper/reducer阶段,或者是Metastore或文件系统的操 ...

  2. [Hive - LanguageManual ] ]SQL Standard Based Hive Authorization

    Status of Hive Authorization before Hive 0.13 SQL Standards Based Hive Authorization (New in Hive 0. ...

  3. [Hive - LanguageManual ] Windowing and Analytics Functions (待)

    LanguageManual WindowingAndAnalytics     Skip to end of metadata   Added by Lefty Leverenz, last edi ...

  4. [HIve - LanguageManual] Hive Operators and User-Defined Functions (UDFs)

    Hive Operators and User-Defined Functions (UDFs) Hive Operators and User-Defined Functions (UDFs) Bu ...

  5. [Hive - LanguageManual] Import/Export

    LanguageManual ImportExport     Skip to end of metadata   Added by Carl Steinbach, last edited by Le ...

  6. [Hive - LanguageManual] DML: Load, Insert, Update, Delete

    LanguageManual DML Hive Data Manipulation Language Hive Data Manipulation Language Loading files int ...

  7. [Hive - LanguageManual] Alter Table/Partition/Column

    Alter Table/Partition/Column Alter Table Rename Table Alter Table Properties Alter Table Comment Add ...

  8. Hive LanguageManual DDL

    hive语法规则LanguageManual DDL SQL DML 和 DDL 数据操作语言 (DML) 和 数据定义语言 (DDL) 一.数据库 增删改都在文档里说得也很明白,不重复造车轮 二.表 ...

  9. 【Hive】explain command throw ClassCastException in 2.3.4

    参考:https://issues.apache.org/jira/browse/HIVE-21489 (一)问题描述: Hive-2.3.4 执行  explain select * from sr ...

随机推荐

  1. iOS 用命令实现简单的打包过程

    `xcode-select --print-path`/Platforms/iPhoneOS.platform/Developer/usr/bin/PackageApplication // 获得打包 ...

  2. 项目SVN的IP地址发生变化时修改SVN为新的IP地址

    在eclipse或者Myeclipse自带的svn:subclipse中修改ip地址 项目开发中有可能要修改SVN的IP地址,entries文件里面包含svn服务器的地址信息.每个文件夹都会产生一个e ...

  3. C++:对象数组

    对象数组 对象数组:每一个数组元素都是对象的数组,也就是说,若一个类有若干个对象,我们把这 一系列的对象用一个数组来存放.对应数组元素是对象,不仅具有的数据成员,而且还有函数 成员. @定义一个一维数 ...

  4. win7进入不了系统故障修复

    问题: 由于电脑关机比较慢,等得不耐烦了,就强制关机了,以前都没事,直到昨晚打开电脑,提示windows错误恢复,试了好久,提示windows无法修复此计算机,看来是没办法了.后来进入系统还原后,总算 ...

  5. UVa 1453 - Squares 旋转卡壳求凸包直径

    旋转卡壳求凸包直径. 参考:http://www.cppblog.com/staryjy/archive/2010/09/25/101412.html #include <cstdio> ...

  6. [HDOJ2586]How far away?(最近公共祖先, 离线tarjan, 并查集)

    题目链接:http://acm.hdu.edu.cn/showproblem.php?pid=2586 这题以前做过…现在用tarjan搞一发…竟然比以前暴力过的慢………… 由于是离线算法,需要Que ...

  7. grunt + compass retina sprites

    https://github.com/AdamBrodzinski/Retina-Sprites-for-Compass

  8. SPRING STS Virgo OSGI 开发一 : bundle 项目的创建

    1. Spring STS 下载地址  (spring 最近改了站点 暂时不是太熟悉)     http://spring.io/tools/sts/all 2. 下载 Virgo 插件    htt ...

  9. BootStrap基本样式

    文本对齐风格:.text-left:左对齐.text-center:居中对齐.text-right:右对齐.text-justify:两端对齐 取消列表符号:.list-unstyled内联列表:.l ...

  10. pinyin4j使用示例

    pinyin4j的主页:http://pinyin4j.sourceforge.net/pinyin4j能够根据中文字符获取其对应的拼音,而且拼音的格式可以定制pinyin4j是一个支持将中文转换到拼 ...