Hive Concurrency Model

Use Cases

Concurrency support (http://issues.apache.org/jira/browse/HIVE-1293) is a must in databases and their use cases are well understood. At a minimum, we want to support concurrent readers and writers whenever possible. It would be useful to add a mechanism to discover the current locks which have been acquired. There is no immediate requirement to add an API to explicitly acquire any locks, so all locks would be acquired implicitly.

The following lock modes will be defined in hive (Note that Intent lock is not needed).

  • Shared (S)
  • Exclusive (X)

As the name suggests, multiple shared locks can be acquired at the same time, whereas X lock blocks all other locks.

The compatibility matrix is as follows:

Lock
Compatibility

Existing Lock

S

X

Requested 
Lock

S

True

False

X

False

False

For some operations, locks are hierarchical in nature -- for example for some partition operations, the table is also locked (to make sure that the table cannot be dropped while a new partition is being created).

The rational behind the lock mode to acquire is as follows:

For a non-partitioned table, the lock modes are pretty intuitive. When the table is being read, a S lock is acquired, whereas an X lock is acquired for all other operations (insert into the table, alter table of any kind etc.)

For a partitioned table, the idea is as follows:

A 'S' lock on table and relevant partition is acquired when a read is being performed. For all other operations, an 'X' lock is taken on the partition. However, if the change is only applicable to the newer partitions, a 'S' lock is acquired on the table, whereas if the change is applicable to all partitions, a 'X' lock is acquired on the table. Thus, older partitions can be read and written into, while the newer partitions are being converted to RCFile. Whenever a partition is being locked in any mode, all its parents are locked in 'S' mode.

Based on this, the lock acquired for an operation is as follows:

Hive Command

Locks Acquired

select .. T1 partition P1

S on T1, T1.P1

insert into T2(partition P2) select .. T1 partition P1

S on T2, T1, T1.P1 and X on T2.P2

insert into T2(partition P.Q) select .. T1 partition P1

S on T2, T2.P, T1, T1.P1 and X on T2.P.Q

alter table T1 rename T2

X on T1

alter table T1 add cols

X on T1

alter table T1 replace cols

X on T1

alter table T1 change cols

X on T1

alter table T1 add partition P1

S on T1, X on T1.P1

alter table T1 drop partition P1

S on T1, X on T1.P1

alter table T1 touch partition P1

S on T1, X on T1.P1

alter table T1 set serdeproperties

S on T1

alter table T1 set serializer

S on T1

alter table T1 set file format

S on T1

alter table T1 set tblproperties

X on T1

drop table T1

X on T1

In order to avoid deadlocks, a very simple scheme is proposed here. All the objects to be locked are sorted lexicographically, and the required mode lock is acquired. Note that in some cases, the list of objects may not be known -- for example in case of dynamic partitions, the list of partitions being modified is not known at compile time -- so, the list is generated conservatively. Since the number of partitions may not be known, an exclusive lock is taken on the table, or the prefix that is known.

Two new configurable parameters will be added to decide the number of retries for the lock and the wait time between each retry. If the number of retries are really high, it can lead to a live lock. Look at ZooKeeper recipes (http://hadoop.apache.org/zookeeper/docs/r3.1.2/recipes.html#sc_recipes_Locks) to see how read/write locks can be implemented using the zookeeper apis. Note that instead of waiting, the lock request will be denied. The existing locks will be released, and all of them will be retried after the retry interval.

The recipe listed above will not work as specified, because of the hierarchical nature of locks.

The 'S' lock for table T is specified as follows:

  • Call create( ) to create a node with pathname "/warehouse/T/read-". This is the lock node used later in the protocol. Make sure to set the sequence and ephemeral flag.
  • Call getChildren( ) on the lock node without setting the watch flag.
  • If there is a child with a pathname starting with "write-" and a lower sequence number than the one obtained, the lock cannot be acquired. Delete the node created in the first step and return.
  • Otherwise the lock is granted.

The 'X' lock for table T is specified as follows:

  • Call create( ) to create a node with pathname "/warehouse/T/write-". This is the lock node used later in the protocol. Make sure to set the sequence and ephemeral flag.
  • Call getChildren( ) on the lock node without setting the watch flag.
  • If there is a child with a pathname starting with "read-" or "write-" and a lower sequence number than the one obtained, the lock cannot be acquired. Delete the node created in the first step and return.
  • Otherwise the lock is granted.

The proposed scheme starves the writers for readers. In case of long readers, it may lead to starvation for writers.

The default Hive behavior will not be changed, and concurrency will not be supported.

Turn Off Concurrency

You can turn off concurrency by setting the following variable to false: hive.support.concurrency.

Debugging

You can see the locks on a table by issuing the following command:

  • SHOW LOCKS <TABLE_NAME>;
  • SHOW LOCKS <TABLE_NAME> EXTENDED;
  • SHOW LOCKS <TABLE_NAME> PARTITION (<PARTITION_DESC>);
  • SHOW LOCKS <TABLE_NAME> PARTITION (<PARTITION_DESC>) EXTENDED;

Configuration

Configuration properties for Hive locking are described in Locking.

Locking in Hive Transactions

Hive 0.13.0 adds transactions with row-level ACID semantics, using a new lock manager. For more information, see:

 

[Hive - LanguageManual] Hive Concurrency Model (待)的更多相关文章

  1. [HIve - LanguageManual] Hive Operators and User-Defined Functions (UDFs)

    Hive Operators and User-Defined Functions (UDFs) Hive Operators and User-Defined Functions (UDFs) Bu ...

  2. [Hive - LanguageManual] Hive Default Authorization - Legacy Mode

    Disclaimer Prerequisites Users, Groups, and Roles Names of Users and Roles Creating/Dropping/Using R ...

  3. [Hive - LanguageManual] Create/Drop/Grant/Revoke Roles and Privileges / Show Use

    Create/Drop/Grant/Revoke Roles and Privileges Hive Default Authorization - Legacy Mode has informati ...

  4. [Hive - LanguageManual ] ]SQL Standard Based Hive Authorization

    Status of Hive Authorization before Hive 0.13 SQL Standards Based Hive Authorization (New in Hive 0. ...

  5. [Hive - LanguageManual] Archiving for File Count Reduction

    Archiving for File Count Reduction Note: Archiving should be considered an advanced command due to t ...

  6. [HIve - LanguageManual] Joins

    Hive Joins Hive Joins Join Syntax Examples MapJoin Restrictions Join Optimization Predicate Pushdown ...

  7. 【hive】——Hive sql语法详解

    Hive 是基于Hadoop 构建的一套数据仓库分析系统,它提供了丰富的SQL查询方式来分析存储在Hadoop 分布式文件系统中的数据,可以将结构 化的数据文件映射为一张数据库表,并提供完整的SQL查 ...

  8. Hive 文件格式 & Hive操作(外部表、内部表、区、桶、视图、索引、join用法、内置操作符与函数、复合类型、用户自定义函数UDF、查询优化和权限控制)

    本博文的主要内容如下: Hive文件存储格式 Hive 操作之表操作:创建外.内部表 Hive操作之表操作:表查询 Hive操作之表操作:数据加载 Hive操作之表操作:插入单表.插入多表 Hive语 ...

  9. 【hive】——Hive四种数据导入方式

    Hive的几种常见的数据导入方式这里介绍四种:(1).从本地文件系统中导入数据到Hive表:(2).从HDFS上导入数据到Hive表:(3).从别的表中查询出相应的数据并导入到Hive表中:(4).在 ...

随机推荐

  1. 确认某端口占用情况并结束相应进程(Windows)

    (1)确认某端口是否被占用 (2)通过查找对应的PID号,定位是哪一个进程在使用该端口 (3)通过PID号结束该进程 # 查找端口2000是否被占用C:\Users\tdcqma>netstat ...

  2. 7、单向一对多的关联关系(1的一方有n的一方的集合属性,n的一方却没有1的一方的引用)

    单向一对多的关联关系 具体体现:1的一方有n的一方的集合的引用,n的一方却没有1的一方的引用 举个例子:顾客Customer对订单Order是一个单向一对多的关联关系.Customer一方有对Orde ...

  3. SAP 物料基本单位与BOM单位

    比如:物料的基本单位是G,可该物料放到BOM中的单位却是PC,该如何实现呢? 1. 首先要弄清楚BOM单位优先取的是发货单位(工厂数据视图1),当发货单位为空时,取基本单位: 2. 然后再建立单位G ...

  4. BZOJ 2005 能量采集(容斥原理)

    题目链接:http://61.187.179.132/JudgeOnline/problem.php?id=2005 题意:给定n和m,求 思路:本题主要是解决对于给定的t,有多少对(i,j)满足x= ...

  5. IDEA 创建Web项目并在Tomcat中部署运行

    IDEA 14.0.5 apache-tomcat-8.0.32 步骤:File->New Project,在Java列表中勾选Web Application(3.1),点击Next 建立web ...

  6. bzoj1562

    很明显是二分图匹配,关键是怎么求字典序最小 想到两种做法,首先是直接匹配,然后从第一位贪心调整 第二种是从最后一个倒着匹配,每次匹配都尽量选小的,这样一定能保证字典序最小 type node=reco ...

  7. js风格技巧

    1.一个页面的所有js都可以写成这样,比如:   var index ={};   index.User = ****;   index.Init = function(){ $("$tes ...

  8. BZOJ3681: Arietta

    题解: 数据结构来优化网络流,貌似都是用一段区间来表示一个点,然后各种乱搞... 发现主席树好吊...在树上建主席树貌似有三种方法: 1.建每个点到根节点这条链上的主席树,可以回答和两点间的路径的XX ...

  9. javascript中的关键字和保留字

    javascript中关键字的问题,将名称替换了下,确实就没有问题了.现在将它的关键字和保留字贴出来,便于日后查看和避免在次出现类似的问题. 1 关键字breakcasecatchcontinuede ...

  10. android中的ellipsize设置(省略号的问题)

    textview中有个内容过长加省略号的属性,即ellipsize,可以较偷懒地解决这个问题,哈哈~ 用法如下: 在xml中 android:ellipsize = "end"   ...