Hive Study Notes: parse

How does Hive parse SQL? Let's start with Hive's CREATE TABLE statement, for example the one below:

create table test(id int, name string) row format delimited fields terminated by '\t';

Then use Hive's show create table statement to view the structure of the table that was created. This is a text table:

CREATE TABLE `test`(

  `id` int,

  `name` string)

ROW FORMAT SERDE

  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

WITH SERDEPROPERTIES (

  'field.delim'='\t',

  'serialization.format'='\t')

STORED AS INPUTFORMAT

  'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

  'hdfs://master:8020/user/hive/warehouse/test'

TBLPROPERTIES (

  'transient_lastDdlTime'='1568561230')

There are of course many other kinds of CREATE TABLE statements, for example:

CSV table

CREATE EXTERNAL TABLE `default.test_1`(

	  `key` string COMMENT 'from deserializer',

	  `value` string COMMENT 'from deserializer')

	ROW FORMAT SERDE

	  'org.apache.hadoop.hive.serde2.OpenCSVSerde'

	WITH SERDEPROPERTIES (

	  'escapeChar'='\\',

	  'quoteChar'='\'',

	  'separatorChar'='\t')

	STORED AS INPUTFORMAT

	  'org.apache.hadoop.mapred.TextInputFormat'

	OUTPUTFORMAT

	  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

	LOCATION

	  'hdfs://master:8020/user/hive/warehouse/test'

	TBLPROPERTIES (

	  'COLUMN_STATS_ACCURATE'='false',

	  'numFiles'='0',

	  'numRows'='-1',

	  'rawDataSize'='-1',

	  'totalSize'='0',

	  'transient_lastDdlTime'='xxxx')

Parquet table

CREATE TABLE `default.test`(

	  `time` string,

	  `server` int,

	  `id` bigint)

	PARTITIONED BY (

	  `ds` string)

	ROW FORMAT SERDE

	  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'

	WITH SERDEPROPERTIES (

	  'field.delim'='\t',

	  'serialization.format'='\t')

	STORED AS INPUTFORMAT

	  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'

	OUTPUTFORMAT

	  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'

	LOCATION

	  'hdfs://master:8020/user/hive/warehouse/test'

	TBLPROPERTIES (

  'transient_lastDdlTime'='xxxx')

JSON table

CREATE EXTERNAL TABLE `default.test`(

	  `titleid` string COMMENT 'from deserializer',

	  `timestamp` string COMMENT 'from deserializer')

	ROW FORMAT SERDE

	  'org.openx.data.jsonserde.JsonSerDe'

	STORED AS INPUTFORMAT

	  'org.apache.hadoop.mapred.TextInputFormat'

	OUTPUTFORMAT

	  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

	LOCATION

	  'hdfs://master:8020/user/hive/warehouse/test'

	TBLPROPERTIES (

	  'COLUMN_STATS_ACCURATE'='false',

	  'numFiles'='0',

	  'numRows'='-1',

	  'rawDataSize'='-1',

	  'totalSize'='0')

Elasticsearch (ES) table

CREATE EXTERNAL TABLE `default.test`(

	  `id` string COMMENT 'from deserializer',

	  `ts` string COMMENT 'from deserializer')

	PARTITIONED BY (

	  `ds` string)

	ROW FORMAT SERDE

	  'org.elasticsearch.hadoop.hive.EsSerDe'

	STORED BY

	  'org.elasticsearch.hadoop.hive.EsStorageHandler'

	WITH SERDEPROPERTIES (

	  'serialization.format'='1')

	LOCATION

	  'hdfs://master:8020/user/hive/warehouse/test'

	TBLPROPERTIES (

	  'es.index.auto.create'='yes',

	  'es.index.read.missing.as.empty'='yes',

	  'es.nodes'='host1,host2',

	  'es.port'='9200',

	  'es.resource'='index1/type1')

Binary table using Thrift

CREATE EXTERNAL TABLE `default.test`(

	  `bbb` string COMMENT 'from deserializer',

	  `aaa` string COMMENT 'from deserializer')

	COMMENT 'aas'

	PARTITIONED BY (

	  `ds` string COMMENT '日期分区')

	ROW FORMAT SERDE

	  'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer'

	WITH SERDEPROPERTIES (

	  'serialization.class'='com.xxx.xxx.xxx.tables.v1.XXXX',

	  'serialization.format'='org.apache.thrift.protocol.TCompactProtocol')

	STORED AS INPUTFORMAT

	  'org.apache.hadoop.mapred.SequenceFileInputFormat'

	OUTPUTFORMAT

	  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'

	LOCATION

	  'hdfs://master:8020/user/hive/warehouse/test'

	TBLPROPERTIES (

	  'transient_lastDdlTime'='xxxxxx')

And so on.

The Hive source code behind show create table is here:

https://github.com/apache/hive/blob/68ae4a5cd1b916098dc1deb2bcede5f862afd80e/ql/src/java/org/apache/hadoop/hive/ql/ddl/table/creation/ShowCreateTableOperation.java

From it you can see the basic pieces of information that make up a Hive table definition:

private static final String CREATE_TABLE_TEMPLATE =

      "CREATE <" + TEMPORARY + "><" + EXTERNAL + ">TABLE `<" + NAME + ">`(\n" +

      "<" + LIST_COLUMNS + ">)\n" +

      "<" + COMMENT + ">\n" +

      "<" + PARTITIONS + ">\n" +

      "<" + BUCKETS + ">\n" +

      "<" + SKEWED + ">\n" +

      "<" + ROW_FORMAT + ">\n" +

      "<" + LOCATION_BLOCK + ">" +

      "TBLPROPERTIES (\n" +

      "<" + PROPERTIES + ">)\n";

  private String getCreateTableCommand(Table table) {

    ST command = new ST(CREATE_TABLE_TEMPLATE);

    command.add(NAME, desc.getTableName());

    command.add(TEMPORARY, getTemporary(table));

    command.add(EXTERNAL, getExternal(table));

    command.add(LIST_COLUMNS, getColumns(table));

    command.add(COMMENT, getComment(table));

    command.add(PARTITIONS, getPartitions(table));

    command.add(BUCKETS, getBuckets(table));

    command.add(SKEWED, getSkewed(table));

    command.add(ROW_FORMAT, getRowFormat(table));

    command.add(LOCATION_BLOCK, getLocationBlock(table));

    command.add(PROPERTIES, getProperties(table));

    return command.render();

  }
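To make the template mechanism concrete, here is a minimal sketch of the same idea in plain Java. It is not Hive's code: Hive uses the StringTemplate library (the ST class above), while this stand-in fills `<key>` placeholders with simple string replacement. The class name, the shortened template, and the placeholder names are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for StringTemplate: fills <key> placeholders
// by plain string replacement. Not the Hive implementation.
class CreateTableTemplateSketch {
    // Shortened, illustrative template; placeholder names are assumptions.
    static final String TEMPLATE =
        "CREATE <external>TABLE `<name>`(\n" +
        "<columns>)\n" +
        "<row_format>\n" +
        "TBLPROPERTIES (\n" +
        "<properties>)\n";

    // Replace every <key> in the template with its value.
    static String render(String template, Map<String, String> values) {
        String out = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            out = out.replace("<" + e.getKey() + ">", e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("external", "");  // empty for a managed table
        values.put("name", "test");
        values.put("columns", "  `id` int,\n  `name` string");
        values.put("row_format", "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'");
        values.put("properties", "  'transient_lastDdlTime'='1568561230'");
        System.out.println(render(TEMPLATE, values));
    }
}
```

Each getXxx(table) call in getCreateTableCommand plays the role of one values.put(...) here: it turns one aspect of the table metadata into the text for one placeholder.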

To see what happens when a user enters a create table statement, look at this source:

https://github.com/apache/hive/blob/ff98efa7c6f2b241d8fddd0ac8dc55e817ecb234/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java

Meituan-Dianping: The compilation process of Hive SQL

https://tech.meituan.com/2014/02/12/hive-sql-to-mapreduce.html

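There you can see that the CREATE TABLE statement is first converted into a syntax tree by ANTLR (Hive's grammar is built on ANTLR 3, not ANTLR 4):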

public static ASTNode parse(String command) throws ParseException {

    return parse(command, null);

  }

Then getTable can be used to extract the database name and the table name from the tree:

https://github.com/apache/hive/blob/f37c5de6c32b9395d1b34fa3c02ed06d1bfbf6eb/ql/src/java/org/apache/hadoop/hive/ql/parse/AnalyzeCommandUtils.java

Source:

public static Table getTable(ASTNode tree, BaseSemanticAnalyzer sa) throws SemanticException {

    String tableName = ColumnStatsSemanticAnalyzer.getUnescapedName((ASTNode) tree.getChild(0).getChild(0));

    String currentDb = SessionState.get().getCurrentDatabase();

    String [] names = Utilities.getDbTableName(currentDb, tableName);

    return sa.getTable(names[0], names[1], true);

  }
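The name-splitting step in getTable can be sketched without any Hive dependency. The helper below is a hypothetical simplification of what Utilities.getDbTableName is used for above: it returns a two-element array of database name and table name, falling back to the current database when the name is unqualified. The class and method names are my own, and the real Hive method may treat edge cases differently.

```java
// Hypothetical sketch: split "db.table" or "table" into {db, table}.
// Not the Hive implementation; edge cases may differ.
class DbTableNameSketch {
    static String[] splitDbTableName(String currentDb, String name) {
        String[] parts = name.split("\\.");
        if (parts.length == 2) {
            return parts;                               // already qualified: {db, table}
        }
        if (parts.length == 1) {
            return new String[] {currentDb, parts[0]};  // use the session's current database
        }
        throw new IllegalArgumentException("Invalid table name: " + name);
    }

    public static void main(String[] args) {
        String[] a = splitDbTableName("default", "test");
        String[] b = splitDbTableName("default", "db1.test");
        System.out.println(a[0] + " / " + a[1]);
        System.out.println(b[0] + " / " + b[1]);
    }
}
```

In the sketch, names[0] is the database and names[1] is the table, matching how the two elements are passed to sa.getTable(names[0], names[1], true) above.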

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java

public ASTNode parse(String command) throws ParseException {

    return parse(command, null);

  }

Next, to extract, for example, the inputformat, outputformat, serde, and storageHandler:

https://github.com/apache/hive/blob/f37c5de6c32b9395d1b34fa3c02ed06d1bfbf6eb/ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java

Source: see the StorageFormat class at the link above.

To extract the column information, SkewedValue, table name, and row format:

https://github.com/apache/hive/blob/f37c5de6c32b9395d1b34fa3c02ed06d1bfbf6eb/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java

Source:

public static List<FieldSchema> getColumns(

      ASTNode ast, boolean lowerCase, TokenRewriteStream tokenRewriteStream,

      List<SQLPrimaryKey> primaryKeys, List<SQLForeignKey> foreignKeys,

      List<SQLUniqueConstraint> uniqueConstraints, List<SQLNotNullConstraint> notNullConstraints,

      List<SQLDefaultConstraint> defaultConstraints, List<SQLCheckConstraint> checkConstraints,

      Configuration conf) throws SemanticException {

    // ... method body omitted in the original post

}

Source:

 /**

   * Get the unqualified name from a table node.

   *

   * This method works for table names qualified with their schema (e.g., "db.table")

   * and table names without schema qualification. In both cases, it returns

   * the table name without the schema.

   *

   * @param node the table node

   * @return the table name without schema qualification

   *         (i.e., if name is "db.table" or "table", returns "table")

   */

  public static String getUnescapedUnqualifiedTableName(ASTNode node) {

    assert node.getChildCount() <= 2;

    if (node.getChildCount() == 2) {

      node = (ASTNode) node.getChild(1);

    }

    return getUnescapedName(node);

  }

Source:

  protected void analyzeRowFormat(ASTNode child) throws SemanticException {

      child = (ASTNode) child.getChild(0);

      int numChildRowFormat = child.getChildCount();

      for (int numC = 0; numC < numChildRowFormat; numC++) {

        ASTNode rowChild = (ASTNode) child.getChild(numC);

        switch (rowChild.getToken().getType()) {

        case HiveParser.TOK_TABLEROWFORMATFIELD:

          fieldDelim = unescapeSQLString(rowChild.getChild(0)

              .getText());

          if (rowChild.getChildCount() >= 2) {

            fieldEscape = unescapeSQLString(rowChild

                .getChild(1).getText());

          }

          break;

        case HiveParser.TOK_TABLEROWFORMATCOLLITEMS:

          collItemDelim = unescapeSQLString(rowChild

              .getChild(0).getText());

          break;

        case HiveParser.TOK_TABLEROWFORMATMAPKEYS:

          mapKeyDelim = unescapeSQLString(rowChild.getChild(0)

              .getText());

          break;

        case HiveParser.TOK_TABLEROWFORMATLINES:

          lineDelim = unescapeSQLString(rowChild.getChild(0)

              .getText());

          if (!lineDelim.equals("\n")

              && !lineDelim.equals("10")) {

            throw new SemanticException(SemanticAnalyzer.generateErrorMessage(rowChild,

                ErrorMsg.LINES_TERMINATED_BY_NON_NEWLINE.getMsg()));

          }

          break;

        case HiveParser.TOK_TABLEROWFORMATNULL:

          nullFormat = unescapeSQLString(rowChild.getChild(0)

                    .getText());

          break;

        default:

          throw new AssertionError("Unkown Token: " + rowChild);

        }

      }

    }
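Every branch of analyzeRowFormat passes the raw token text through unescapeSQLString, which is how a literal such as '\t' in the DDL becomes a real tab character. The sketch below is a simplified, hypothetical version of that step: it strips the enclosing quotes and handles a few common backslash escapes. The class and method names are assumptions, and Hive's actual unescapeSQLString handles more cases (for example unicode and octal escapes).

```java
// Hypothetical, simplified version of delimiter unescaping.
// Hive's real unescapeSQLString covers more escape forms.
class RowFormatDelimiterSketch {
    static String unescape(String literal) {
        String body = literal;
        // Strip matching enclosing single or double quotes, if present.
        if (body.length() >= 2) {
            char q = body.charAt(0);
            if ((q == '\'' || q == '"') && body.charAt(body.length() - 1) == q) {
                body = body.substring(1, body.length() - 1);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                char next = body.charAt(++i);
                switch (next) {
                    case 't': sb.append('\t'); break;
                    case 'n': sb.append('\n'); break;
                    case 'r': sb.append('\r'); break;
                    default:  sb.append(next);   // e.g. \\ or \' keep the escaped char
                }
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The token text for: fields terminated by '\t'
        String fieldDelim = unescape("'\\t'");
        System.out.println(fieldDelim.equals("\t"));
    }
}
```

This also explains the check in the TOK_TABLEROWFORMATLINES branch: after unescaping, the line delimiter must be a newline (or the string "10", its character code), otherwise Hive raises LINES_TERMINATED_BY_NON_NEWLINE.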

For partition information, first obtain a Map object:

https://github.com/apache/hive/blob/6f18bbbc2e030ce7d446b2475037203cbd4f860d/ql/src/java/org/apache/hadoop/hive/ql/parse/AnalyzeCommandUtils.java

Source:

  public static Map<String,String> getPartKeyValuePairsFromAST(Table tbl, ASTNode tree,

      HiveConf hiveConf) throws SemanticException {

    ASTNode child = ((ASTNode) tree.getChild(0).getChild(1));

    Map<String,String> partSpec = new HashMap<String, String>();

    if (child != null) {

      partSpec = DDLSemanticAnalyzer.getValidatedPartSpec(tbl, child, hiveConf, false);

    } //otherwise, it is the case of analyze table T compute statistics for columns;

    return partSpec;

  }

Then convert it into a List<Partition> object:

https://github.com/apache/hive/blob/556531182dc989e12fd491d951b353b4df13fd47/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java

Source:

public Map<String, String> partSpec; // has to use LinkedHashMap to enforce order
public List<Partition> partitions; // involved partitions in TableScanOperator/FileSinkOperator

partitions = db.getPartitions(table, partSpec);
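The comment on partSpec says it has to be a LinkedHashMap to enforce order. One reason order matters is that Hive lays out partition directories as key=value path segments in partition-column order. The sketch below illustrates that with a hypothetical helper; the class and method names are assumptions, and it skips the escaping of special characters that Hive applies to partition values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: build a partition path from an ordered partition spec.
// Real Hive also escapes special characters in values; omitted here.
class PartitionSpecSketch {
    static String toPartitionPath(Map<String, String> partSpec) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : partSpec.entrySet()) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so ds comes before hr.
        Map<String, String> partSpec = new LinkedHashMap<>();
        partSpec.put("ds", "2019-09-15");
        partSpec.put("hr", "01");
        System.out.println(toPartitionPath(partSpec)); // prints ds=2019-09-15/hr=01
    }
}
```

With a plain HashMap the iteration order is not guaranteed, so the same spec could produce hr=01/ds=2019-09-15, which would point at a different directory.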

For location information, see parsedLocation:

https://github.com/apache/hive/blob/0213afb8a31af1f48d009edd41cec9e6c8942354/ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
