Hive 2.1

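1 Problem

Recently I had a scenario that required writing data into multiple partitions of one table. To shorten the run time, the writes were done concurrently: several SQL statements ran at the same time, each writing a different partition, with dynamic partitioning enabled: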

set hive.exec.dynamic.partition=true;

insert overwrite table test_table partition(dt) select * from test_table_another where dt = 1;

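The result: only one statement ran, and all the others hung.
The thread stacks of the Hive thrift server showed the requests blocked in DbTxnManager. The relevant Hive configuration is: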

hive.support.concurrency=true
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

The default values and descriptions of these settings:

org.apache.hadoop.hive.conf.HiveConf

    HIVE_SUPPORT_CONCURRENCY("hive.support.concurrency", false,
        "Whether Hive supports concurrency control or not. \n" +
        "A ZooKeeper instance must be up and running when using zookeeper Hive lock manager "),
    HIVE_TXN_MANAGER("hive.txn.manager",
        "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager",
        "Set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as part of turning on Hive\n" +
        "transactions, which also requires appropriate settings for hive.compactor.initiator.on,\n" +
        "hive.compactor.worker.threads, hive.support.concurrency (true), hive.enforce.bucketing\n" +
        "(true), and hive.exec.dynamic.partition.mode (nonstrict).\n" +
        "The default DummyTxnManager replicates pre-Hive-0.13 behavior and provides\n" +
        "no transactions."),

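2 Code analysis

For the full path a SQL statement takes through Hive, see: https://www.cnblogs.com/barneywill/p/10185168.html

Every SQL statement executed in Hive eventually reaches Driver.run, which calls runInternal. Here is the relevant part of runInternal and the methods it uses: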

org.apache.hadoop.hive.ql.Driver

  private CommandProcessorResponse runInternal(String command, boolean alreadyCompiled)
      throws CommandNeedRetryException {
    ...
    if (requiresLock()) {
      // a checkpoint to see if the thread is interrupted or not before an expensive operation
      if (isInterrupted()) {
        ret = handleInterruption("at acquiring the lock.");
      } else {
        ret = acquireLocksAndOpenTxn(startTxnImplicitly);
      }
    ...

  private boolean requiresLock() {
    if (!checkConcurrency()) {
      return false;
    }
    // Lock operations themselves don't require the lock.
    if (isExplicitLockOperation()){
      return false;
    }
    if (!HiveConf.getBoolVar(conf, ConfVars.HIVE_LOCK_MAPRED_ONLY)) {
      return true;
    }
    Queue<Task<? extends Serializable>> taskQueue = new LinkedList<Task<? extends Serializable>>();
    taskQueue.addAll(plan.getRootTasks());
    while (taskQueue.peek() != null) {
      Task<? extends Serializable> tsk = taskQueue.remove();
      if (tsk.requireLock()) {
        return true;
      }
    ...

  private boolean checkConcurrency() {
    boolean supportConcurrency = conf.getBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY);
    if (!supportConcurrency) {
      LOG.info("Concurrency mode is disabled, not creating a lock manager");
      return false;
    }
    return true;
  }

  private int acquireLocksAndOpenTxn(boolean startTxnImplicitly) {
    ...
    txnMgr.acquireLocks(plan, ctx, userFromUGI);
    ...

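runInternal calls requiresLock to decide whether a lock is needed. requiresLock makes two main checks:

  • It calls checkConcurrency, which allows locking only when hive.support.concurrency=true;
  • It calls Task.requireLock, since only some tasks need a lock. As the code shows, this per-task check is reached only when ConfVars.HIVE_LOCK_MAPRED_ONLY is true; otherwise requiresLock returns true right away.

If a lock is needed, runInternal calls acquireLocksAndOpenTxn, which calls HiveTxnManager.acquireLocks to obtain it.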
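The decision flow described above can be condensed into a small standalone model. This is only a sketch for illustration, not Hive code: the class LockDecision and its parameters are hypothetical, the task tree is flattened into a list of Task.requireLock() results, and the final "no task needs a lock, so return false" branch is an assumption because the excerpt elides the end of the method.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of Driver.requiresLock(); not the real Hive class.
final class LockDecision {

    // supportConcurrency    : value of hive.support.concurrency
    // explicitLockOperation : true for explicit lock statements
    // lockMapredOnly        : value of ConfVars.HIVE_LOCK_MAPRED_ONLY
    // taskRequiresLock      : Task.requireLock() for every task in the plan (flattened)
    static boolean requiresLock(boolean supportConcurrency,
                                boolean explicitLockOperation,
                                boolean lockMapredOnly,
                                List<Boolean> taskRequiresLock) {
        if (!supportConcurrency) {
            return false;               // checkConcurrency() failed
        }
        if (explicitLockOperation) {
            return false;               // lock operations do not lock themselves
        }
        if (!lockMapredOnly) {
            return true;                // every other statement takes a lock
        }
        // Assumption: when no task asks for a lock, the method returns false.
        return taskRequiresLock.contains(Boolean.TRUE);
    }

    public static void main(String[] args) {
        // Concurrency on, per-task check off: locking regardless of the tasks.
        System.out.println(requiresLock(true, false, false, Collections.<Boolean>emptyList()));
        // Concurrency off: never lock.
        System.out.println(requiresLock(false, false, false, Arrays.asList(true)));
    }
}
```

In this model the configuration from the problem description (hive.support.concurrency=true, with the per-task check switched off) always ends in a lock request, which matches the observed behaviour.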

1) First, which tasks require a lock:

org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer

  private void analyzeAlterTablePartMergeFiles(ASTNode ast,
      String tableName, HashMap<String, String> partSpec)
      throws SemanticException {
    ...
    DDLWork ddlWork = new DDLWork(getInputs(), getOutputs(), mergeDesc);
    ddlWork.setNeedLock(true);
    ...

So DDL work such as this one is marked as requiring a lock.

2) Next, how the lock is acquired:

org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

  public void acquireLocks(QueryPlan plan, Context ctx, String username) throws LockException {
    try {
      acquireLocksWithHeartbeatDelay(plan, ctx, username, 0);
    ...

  void acquireLocksWithHeartbeatDelay(QueryPlan plan, Context ctx, String username, long delay) throws LockException {
    LockState ls = acquireLocks(plan, ctx, username, true);
    ...

  LockState acquireLocks(QueryPlan plan, Context ctx, String username, boolean isBlocking) throws LockException {
    ...
      switch (output.getType()) {
        case DATABASE:
          compBuilder.setDbName(output.getDatabase().getName());
          break;
        case TABLE:
        case DUMMYPARTITION: // in case of dynamic partitioning lock the table
          t = output.getTable();
          compBuilder.setDbName(t.getDbName());
          compBuilder.setTableName(t.getTableName());
          break;
        case PARTITION:
          compBuilder.setPartitionName(output.getPartition().getName());
          t = output.getPartition().getTable();
          compBuilder.setDbName(t.getDbName());
          compBuilder.setTableName(t.getTableName());
          break;
        default:
          // This is a file or something we don't hold locks for.
          continue;
      }
    ...
    LockState lockState = lockMgr.lock(rqstBuilder.build(), queryId, isBlocking, locks);
    ctx.setHiveLocks(locks);
    return lockState;
  }

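So for a dynamic-partition write (output type DUMMYPARTITION), the lock granularity is DbName+TableName. As a result only one of the concurrent statements can obtain the lock, and the rest have to wait.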
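To make the granularity difference concrete, here is a toy model of the switch above. It is a sketch under loud assumptions: LockGranularity, lockKey and tryLock are hypothetical names, the database name "default" is made up, and the lock table treats every key as an independent exclusive lock, which is far simpler than the real lock manager (no shared modes, no table/partition hierarchy).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model of lock granularity; not the real DbTxnManager.
final class LockGranularity {

    enum OutputType { TABLE, DUMMYPARTITION, PARTITION }

    // Mirrors which fields the switch sets on compBuilder for each output type.
    static String lockKey(OutputType type, String db, String table, String partition) {
        switch (type) {
            case TABLE:
            case DUMMYPARTITION:        // dynamic partitioning: the whole table is locked
                return db + "." + table;
            case PARTITION:             // static partition: only that partition is locked
                return db + "." + table + "/" + partition;
            default:
                throw new IllegalArgumentException("unsupported type: " + type);
        }
    }

    // Simplified exclusive lock: succeeds only if nobody holds the same key.
    static boolean tryLock(Set<String> held, String key) {
        return held.add(key);
    }

    public static void main(String[] args) {
        Set<String> held = new HashSet<>();
        String first = lockKey(OutputType.DUMMYPARTITION, "default", "test_table", null);
        String second = lockKey(OutputType.DUMMYPARTITION, "default", "test_table", null);
        System.out.println(tryLock(held, first));   // first dynamic-partition writer
        System.out.println(tryLock(held, second));  // second writer asks for the same key
    }
}
```

In this model two dynamic-partition writers compete for the same key, while writers of type PARTITION with different partition names get different keys and do not collide.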

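3 Summary

There are several ways to solve the problem:

  1. Disable dynamic partitioning: set hive.exec.dynamic.partition=false
  2. Disable concurrency support: set hive.support.concurrency=false
  3. Disable transactions: set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager

Any one of the three works. The first is recommended, because the scenario above does not need dynamic partitioning: each statement writes exactly one known partition, so it can name that partition statically, e.g. partition(dt=1), in which case the select list must supply only the non-partition columns.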
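As a rough sketch of the first option, the helper below builds one static-partition statement per dt value, so that each concurrent statement targets a single named partition. The class StaticPartitionSql and the column names col_a and col_b are hypothetical; the real column list of test_table is not given in this post, and dt is assumed to be numeric as in the original query.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: generates static-partition inserts, one per dt value.
final class StaticPartitionSql {

    // columns must list the non-partition columns only (dt is fixed in the partition spec).
    static String insertFor(String dt, List<String> columns) {
        return "insert overwrite table test_table partition(dt=" + dt + ") select "
                + String.join(", ", columns)
                + " from test_table_another where dt = " + dt;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("col_a", "col_b"); // made-up column names
        for (String dt : Arrays.asList("1", "2")) {
            System.out.println(insertFor(dt, columns) + ";");
        }
    }
}
```

Each generated statement can then be submitted on its own connection, as in the original concurrent setup.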
