The ACID properties of HBase
http://hbase.apache.org/acid-semantics.html
Apache HBase (TM) is not an ACID compliant database. However, it does guarantee certain specific properties.
This specification enumerates the ACID properties of HBase.
Definitions
For the sake of common vocabulary, we define the following terms:
- Atomicity
- an operation is atomic if it either completes entirely or not at all
- Consistency
- all actions cause the table to transition from one valid state directly to another (e.g. a row will not disappear during an update)
- Isolation
- an operation is isolated if it appears to complete independently of any other concurrent transaction
- Durability
- any update that reports "successful" to the client will not be lost
- Visibility
- an update is considered visible if any subsequent read will see the update as having been committed
The terms "must" and "may" are used as specified by RFC 2119. In short, the word "must" implies that, if some case exists where the statement is not true, it is a bug. The word "may" implies that, even if the guarantee is provided in a current release, users should not rely on it.
APIs to consider
- Read APIs
- get
- scan
- Write APIs
- put
- batch put
- delete
- Combination (read-modify-write) APIs
- incrementColumnValue
- checkAndPut
Guarantees Provided
Atomicity
- All mutations are atomic within a row. Any put will either wholly succeed or wholly fail.[3]
- An operation that returns a "success" code has completely succeeded.
- An operation that returns a "failure" code has completely failed.
- An operation that times out may have succeeded and may have failed. However, it will not have partially succeeded or failed.
- This is true even if the mutation crosses multiple column families within a row.
- APIs that mutate several rows will _not_ be atomic across the multiple rows. For example, a multiput that operates on rows 'a', 'b', and 'c' may return having mutated some but not all of the rows. In such cases, these APIs will return a list of status codes, each of which may indicate success, failure, or a timeout, as described above.
- The checkAndPut API executes atomically, like the compare-and-set (CAS) operation found in many hardware architectures.
- Mutations to a row are applied in a well-defined order, with no interleaving. For example, if one writer issues the mutation "a=1,b=1,c=1" and another writer issues the mutation "a=2,b=2,c=2", the row must either be "a=1,b=1,c=1" or "a=2,b=2,c=2" and must not be something like "a=1,b=2,c=1".
- Please note that this is not true _across rows_ for multirow batch mutations.
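To make the row-level guarantees concrete, here is a minimal in-memory sketch in Java. It assumes a toy model, not HBase's implementation or client API: each row is an immutable map of column to value, and every mutation replaces the whole row inside one `ConcurrentHashMap.compute` call, which the JDK documents as atomic per key. The class and method names (`ToyRowStore`, `put`, `checkAndPut`, `get`) are hypothetical stand-ins chosen to echo the API names listed earlier.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical toy model, NOT the HBase client API. Each row is an immutable
// map (column -> value); every mutation swaps the whole row inside
// ConcurrentHashMap.compute, which is performed atomically per key.
class ToyRowStore {
    private final ConcurrentHashMap<String, Map<String, String>> rows = new ConcurrentHashMap<>();

    // Applies all cells to one row in a single atomic step (cf. a multi-column put).
    void put(String row, Map<String, String> cells) {
        rows.compute(row, (k, old) -> merge(old, cells));
    }

    // Compare-and-set on one cell: apply `cells` only if `column` currently
    // equals `expected` (null means "absent"). Returns whether the put happened.
    boolean checkAndPut(String row, String column, String expected, Map<String, String> cells) {
        boolean[] applied = {false};
        rows.compute(row, (k, old) -> {
            String current = old == null ? null : old.get(column);
            applied[0] = Objects.equals(current, expected);
            // On a failed check the row is returned unchanged.
            return applied[0] ? merge(old, cells) : old;
        });
        return applied[0];
    }

    // Returns a complete, immutable snapshot of the row (empty if absent).
    Map<String, String> get(String row) {
        return rows.getOrDefault(row, Collections.emptyMap());
    }

    private static Map<String, String> merge(Map<String, String> old, Map<String, String> cells) {
        Map<String, String> next = old == null ? new HashMap<>() : new HashMap<>(old);
        next.putAll(cells);
        return Collections.unmodifiableMap(next);
    }
}
```

Because a multi-column put swaps the entire row in one step, two writers issuing "a=1,b=1,c=1" and "a=2,b=2,c=2" can only leave one of those two rows behind. A multi-row batch in this model would be a loop of independent `compute` calls, so, as with HBase, it would not be atomic across rows.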
Consistency and Isolation
- All rows returned via any access API will consist of a complete row that existed at some point in the table's history.
- This is true across column families, i.e. a get of a full row that occurs concurrently with some mutations 1,2,3,4,5 will return a complete row that existed at some point in time between mutation i and i+1 for some i between 1 and 5.
- The state of a row will only move forward through the history of edits to it.
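One way to picture these two properties is a row that is published as a single immutable version carrying a sequence number: readers see either a whole version or none of it, and the sequence number only increases. This is a hypothetical sketch (`VersionedRow`, `Version`, and `seqId` are illustrative names, not HBase types); the `cf1:`/`cf2:` prefixes merely stand in for two column families.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model, not HBase code: a row is published as one immutable
// Version whose cells may span several column families ("cf1:a", "cf2:b").
class VersionedRow {
    record Version(long seqId, Map<String, String> cells) {}

    private final AtomicReference<Version> current =
            new AtomicReference<>(new Version(0, Map.of()));

    // Builds the next complete row and publishes it in a single atomic swap.
    // updateAndGet may re-run the lambda under contention, so it has no side effects.
    Version mutate(Map<String, String> cells) {
        return current.updateAndGet(old -> {
            Map<String, String> next = new HashMap<>(old.cells());
            next.putAll(cells);
            return new Version(old.seqId() + 1, Map.copyOf(next));
        });
    }

    // A reader gets exactly one published version, never half of one mutation
    // and half of another; successive reads by one client never go backwards.
    Version read() {
        return current.get();
    }
}
```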
Consistency of Scans
A scan is not a consistent view of a table. Scans do not exhibit snapshot isolation.
Rather, scans have the following properties:
- Any row returned by the scan will be a consistent view (i.e. that version of the complete row existed at some point in time) [1]
- A scan will always reflect a view of the data at least as new as the beginning of the scan. This satisfies the visibility guarantees enumerated below.
- For example, if client A writes data X and then communicates via a side channel to client B, any scans started by client B will contain data at least as new as X.
- A scan _must_ reflect all mutations committed prior to the construction of the scanner, and _may_ reflect some mutations committed subsequent to the construction of the scanner.
- Scans must include all data written prior to the scan (except in the case where data is subsequently mutated, in which case it _may_ reflect the mutation)
Those familiar with relational databases will recognize this isolation level as "read committed".
Please note that the guarantees listed above regarding scanner consistency are referring to "transaction commit time", not the "timestamp" field of each cell. That is to say, a scanner started at time t may see edits with a timestamp value greater than t, if those edits were committed with a "forward dated" timestamp before the scanner was constructed.
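The read-committed behaviour can be approximated with the JDK's `ConcurrentSkipListMap`, whose iterators are weakly consistent: absent removals, they traverse the entries that existed when the iterator was created, and may or may not reflect later changes. The sketch below assumes that mapping (rows as sorted keys with immutable values, a scanner as an iterator) and is not how HBase's scanner is built; `ToyScanTable`, `Cell`, `openScanner`, and `drainKeys` are hypothetical. The cell `timestamp` is carried along but never consulted, mirroring the point that visibility depends on commit order, not on the timestamp field.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical model, not HBase's scanner: rows live in a sorted concurrent map
// whose values are immutable, and a "scanner" is the map's weakly consistent iterator.
class ToyScanTable {
    // One cell version; the timestamp is user-supplied and plays no part in
    // deciding what a scanner can see.
    record Cell(String value, long timestamp) {}

    private final ConcurrentSkipListMap<String, Map<String, Cell>> rows =
            new ConcurrentSkipListMap<>();

    // Replaces the whole row in one step, so each returned row is internally consistent.
    void putRow(String row, Map<String, Cell> cells) {
        rows.put(row, Map.copyOf(cells));
    }

    // Rows committed before this call are always returned; rows committed while
    // the scan is in progress may or may not appear (read committed, no snapshot).
    Iterator<Map.Entry<String, Map<String, Cell>>> openScanner() {
        return rows.entrySet().iterator();
    }

    static List<String> drainKeys(Iterator<Map.Entry<String, Map<String, Cell>>> scanner) {
        List<String> keys = new ArrayList<>();
        while (scanner.hasNext()) {
            keys.add(scanner.next().getKey());
        }
        return keys;
    }
}
```

A row written after `openScanner()` returns may or may not show up in the drained keys, while rows written before it, including one with a far-future timestamp, always do; code relying on this model should only depend on the latter.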
Visibility
- When a client receives a "success" response for any mutation, that mutation is immediately visible to both that client and any client with whom it later communicates through side channels. [3]
- A row must never exhibit so-called "time-travel" properties. That is to say, if a series of mutations moves a row sequentially through a series of states, any sequence of concurrent reads will return a subsequence of those states.
- For example, if a row's cells are mutated using the "incrementColumnValue" API, a client must never see the value of any cell decrease.
- This is true regardless of which read API is used to read back the mutation.
- Any version of a cell that has been returned to a read operation is guaranteed to be durably stored.
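A client can check the "no time travel" property for itself: with an `incrementColumnValue`-style counter and non-negative deltas, any sequence of reads must be non-decreasing. The sketch below models the counter cell with an `AtomicLong`; `ToyCounterCell`, `increment`, and `isMonotonic` are hypothetical names, not HBase API.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of a counter cell mutated incrementColumnValue-style.
class ToyCounterCell {
    private final AtomicLong value = new AtomicLong();

    // Atomically adds delta and returns the new value. The new value is
    // published before the call returns, so any later read observes it.
    long increment(long delta) {
        return value.addAndGet(delta);
    }

    long read() {
        return value.get();
    }

    // Client-side check for "time travel": with non-negative increments,
    // successive reads of the cell must never decrease.
    static boolean isMonotonic(List<Long> reads) {
        for (int i = 1; i < reads.size(); i++) {
            if (reads.get(i) < reads.get(i - 1)) {
                return false;
            }
        }
        return true;
    }
}
```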
Durability
- All visible data is also durable data. That is to say, a read will never return data that has not been made durable on disk.[2]
- Any operation that returns a "success" code (e.g. does not throw an exception) will be made durable.[3]
- Any operation that returns a "failure" code will not be made durable (subject to the Atomicity guarantees above).
- No reasonable failure scenario will violate any of the guarantees in this document.
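The durability rules follow from the order of the write path: an edit is made durable in the log before it is applied where readers can see it, and success is reported only after both. The following is a simplified, hypothetical model of that ordering (`ToyWalRegion` is not an HBase class). The `durableLog` list stands in for a write-ahead log that has been hflush()ed as described in footnote [2], and the `memstore` map stands in for in-memory state that a crash wipes out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a region's write path, not HBase code.
class ToyWalRegion {
    private final List<String[]> durableLog = new ArrayList<>(); // survives crashes
    private Map<String, String> memstore = new HashMap<>();       // lost on crash

    // 1) make the edit durable in the log, 2) apply it where reads can see it,
    // 3) only then report success to the caller.
    synchronized boolean put(String key, String value) {
        durableLog.add(new String[] {key, value}); // stands in for append + hflush()
        memstore.put(key, value);
        return true;
    }

    // Reads are served only from the memstore, which never holds an edit
    // that is not already in the durable log.
    synchronized String get(String key) {
        return memstore.get(key);
    }

    // Simulated crash: memory is wiped and rebuilt by replaying the log in order.
    synchronized void crashAndRecover() {
        memstore = new HashMap<>();
        for (String[] edit : durableLog) {
            memstore.put(edit[0], edit[1]);
        }
    }
}
```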
Tunability
All of the above guarantees must be possible within Apache HBase. For users who would like to trade off some guarantees for performance, HBase may offer several tuning options. For example:
- Visibility may be tuned on a per-read basis to allow stale reads or time travel.
- Durability may be tuned to only flush data to disk on a periodic basis
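Periodic flushing trades away the guarantee stated in the Durability section: a write can be acknowledged before it is durable, so a crash can lose it. The hypothetical `ToyDeferredLog` below acknowledges each put immediately and only syncs its buffer to the durable log every `syncInterval` writes. For reference, newer HBase clients expose a per-mutation setting of this kind (`Mutation#setDurability` with values such as `ASYNC_WAL`), but this sketch does not model that API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of deferred durability, not HBase code: edits are
// acknowledged once buffered and synced every `syncInterval` writes.
class ToyDeferredLog {
    private final int syncInterval;
    private final List<String[]> pending = new ArrayList<>(); // acked, not yet durable
    private final List<String[]> durable = new ArrayList<>();

    ToyDeferredLog(int syncInterval) {
        this.syncInterval = syncInterval;
    }

    // Reports success immediately; the edit becomes durable only at the next sync.
    synchronized boolean put(String key, String value) {
        pending.add(new String[] {key, value});
        if (pending.size() >= syncInterval) {
            sync();
        }
        return true;
    }

    synchronized void sync() {
        durable.addAll(pending);
        pending.clear();
    }

    // Simulated crash: pending edits are lost; returns the state recovery rebuilds.
    synchronized Map<String, String> crashAndRecover() {
        pending.clear();
        Map<String, String> state = new HashMap<>();
        for (String[] edit : durable) {
            state.put(edit[0], edit[1]);
        }
        return state;
    }
}
```

With `syncInterval` set to 2, three acknowledged puts followed by a crash leave only the first two: the third was reported as successful but never reached the durable log.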
More Information
For more information, see the client architecture or data model sections in the Apache HBase Reference Guide.
Footnotes
[1] A consistent view is not guaranteed for intra-row scanning, i.e. fetching a portion of a row in one RPC and then going back to fetch another portion of the row in a subsequent RPC. Intra-row scanning happens when you set a limit on how many values to return per Scan#next (see Scan#setBatch(int)).
[2] In the context of Apache HBase, "durably on disk" implies an hflush() call on the transaction log. This does not actually imply an fsync() to magnetic media, but rather just that the data has been written to the OS cache on all replicas of the log. In the case of a full datacenter power loss, it is possible that the edits are not truly durable.
[3] Puts will either wholly succeed or wholly fail, provided that they are actually sent to the RegionServer. If the write buffer is used, Puts will not be sent until the write buffer is filled or is explicitly flushed.
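Footnote [3] means that a buffered put has not yet entered the atomicity and durability guarantees at all; it is still sitting in client memory. The hypothetical `ToyWriteBuffer` below shows the effect: puts accumulate until the buffer reaches `capacity` or `flush()` is called, and only then are they "sent". In the HBase 1.x and later client the comparable class is `BufferedMutator`, which has `mutate` and `flush` methods; this sketch does not use it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side write buffer, not HBase's BufferedMutator.
// `sent` stands in for what the RegionServer has actually received.
class ToyWriteBuffer {
    private final int capacity;
    private final List<String> buffered = new ArrayList<>();
    private final List<String> sent = new ArrayList<>();

    ToyWriteBuffer(int capacity) {
        this.capacity = capacity;
    }

    // Queues a put; nothing reaches the server until the buffer is full.
    void mutate(String put) {
        buffered.add(put);
        if (buffered.size() >= capacity) {
            flush();
        }
    }

    // Sends every buffered put; only from here on do the server-side guarantees apply.
    void flush() {
        sent.addAll(buffered);
        buffered.clear();
    }

    List<String> sentToServer() {
        return List.copyOf(sent);
    }
}
```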