Overview

  • A discussion of consistency in some (distributed) (storage) systems.

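The CAP Principle

  • With the rise of distributed transactions, the traditional single-machine transaction model (ACID) is no longer adequate, especially for high-traffic, high-concurrency distributed Internet systems.
  • How to build a distributed system that balances availability and consistency has become a problem that countless engineers have wrestled with.
  • CAP theorem: a distributed system cannot simultaneously satisfy the three basic requirements of consistency (C: Consistency), availability (A: Availability), and partition tolerance (P: Partition tolerance); it can satisfy at most two of them at once.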

Consistency

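  • In a distributed environment, consistency is the property that data stays identical across multiple replicas.
  • Put simply: all the servers in the system will have the same data, so anyone using the system will get the same copy regardless of which server answers their request.
  • Consistency in CAP and consistency in ACID are entirely different definitions.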

Availability

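  • Availability means the service the system provides must always be in a usable state: every user request always returns a result within a bounded time.

    [“Within a bounded time” is an operating target fixed when the system is first designed; it usually differs widely from one system to another.

    “Returns a result” requires the system, once it has finished processing the user's request, to return a normal response.]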

  • The system will always respond to a request (even if it's not the latest data or consistent across the system or just a message saying the system isn't working).

Partition Tolerance

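  • A “network partition” occurs when the nodes of a distributed system sit in different subnetworks (data centers, remote networks, etc.) and, for some reason, connectivity between those subnetworks is lost while the network inside each subnetwork keeps working, so the system's overall network is split into several isolated regions.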
  • The system continues to operate as a whole even if individual servers fail or can't be reached.

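Trade-offs

  • First, be clear that a distributed system cannot give up Partition Tolerance; otherwise it is no longer distributed. [Giving up partition tolerance means giving up distribution, and with it the system's scalability.] [No distributed system is safe from network failures, thus network partitioning generally has to be tolerated.]
  • Giving up Availability: when the system hits a network partition or another failure, the affected services have to wait for some time, during which the system cannot serve requests normally, i.e. it is unavailable.
  • Giving up Consistency: giving up C means giving up strong (real-time) consistency while keeping eventual consistency. This introduces the notion of a time window [within the window, the data is inconsistent]. How long it takes for the data to become consistent depends on the system's design, chiefly on how long it takes to replicate the primary copy to the other nodes.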

Consistency VS Availability

  • Web-scale companies such as LinkedIn, Netflix, Google, Facebook, etc. have several requirements of their database systems around scalability, availability, and performance.
  • For performance first: A company will deploy a few data centers in different parts of the world and partition their users using something like IP Anycast so that all of their users experience the fewest possible hops by being routed to the closest data center.
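  • Tables are therefore partitioned by user, but to allow failover between all data centers, every data center keeps a full copy of the data. In practice, a given user's writes land in one data center and then have to be replicated to the others.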
  • This is where CAP comes in.

Consistency over Availability

  • If the replication is synchronous, then you can achieve "consistency".
  • The problem is that these protocols reduce the throughput of transactions. They take longer to run and hence less work gets done. As a result, transactions back up, connection pools are drained, and your scalability (number of concurrent transactions) drops.
  • You are now prone to hitting a scalability bottleneck, which will cause intermittent outages of sorts (e.g. every other click on your site will time out or fail fast). Hence, for the sake of consistency, you have compromised availability.
  • e.g., HBase is CP.
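
The cost of synchronous replication can be sketched in a few lines. This is a toy model, not any real database's protocol: the `Replica` class, the `sync_write` function, and the latency figures are all made up for illustration, and acks are assumed to be collected in parallel.

```python
class Replica:
    """A toy replica; `latency_ms` is a made-up cost of one write."""

    def __init__(self, name, latency_ms, reachable=True):
        self.name = name
        self.latency_ms = latency_ms
        self.reachable = reachable
        self.data = {}

    def apply(self, key, value):
        if not self.reachable:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value
        return self.latency_ms


def sync_write(replicas, key, value):
    """Commit only if every replica acknowledges; return the simulated latency.

    Because the write waits for all acks, it is as slow as the slowest
    replica. If any replica is unreachable, the write is rejected and
    nothing is applied (a real system needs a commit protocol for this).
    """
    unreachable = [r.name for r in replicas if not r.reachable]
    if unreachable:
        raise ConnectionError(f"write rejected, unreachable: {unreachable}")
    return max(r.apply(key, value) for r in replicas)
```

With replicas at 2 ms, 80 ms, and 150 ms, every write costs 150 ms, and a single unreachable replica makes every write fail. That is the throughput and availability price paid for all replicas agreeing.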

Availability over Consistency

  • The alternative is to replicate data asynchronously. In this model, every application writes to its local database and returns immediately.
  • All transactions remain fast and the transaction throughput remains high. Availability is not impacted.
  • However, views are not consistent between data centers because data is delayed by definition, though this window of inconsistency can be kept to a few minutes or better on average. This model is also called eventual consistency.
  • e.g., Cassandra is an AP system.
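
The window of inconsistency is easy to see in a toy model of asynchronous replication. The `DataCenter` class below is hypothetical and ignores conflict resolution entirely; it assumes each key is only ever written in one data center.

```python
from collections import deque


class DataCenter:
    """Toy data center that replicates its writes to peers in the background."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.outbox = deque()  # writes accepted locally but not yet shipped
        self.peers = []

    def write(self, key, value):
        """Apply locally, queue for replication, and return immediately."""
        self.data[key] = value
        self.outbox.append((key, value))

    def read(self, key):
        return self.data.get(key)

    def replicate(self):
        """Ship queued writes to every peer (a background job in a real system)."""
        while self.outbox:
            key, value = self.outbox.popleft()
            for peer in self.peers:
                peer.data[key] = value


us, eu = DataCenter("us"), DataCenter("eu")
us.peers = [eu]

us.write("user:42", "v1")
stale = eu.read("user:42")   # None: the write has not reached eu yet
us.replicate()
fresh = eu.read("user:42")   # "v1": the data centers have converged
```

The time between `write` and `replicate` is the inconsistency window; shortening it means running replication more often, not making writes wait.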

Summary:

  • When there is a partition, you can only choose availability or consistency.

    • When choosing consistency over availability, the system will return an error or a time-out if particular information cannot be guaranteed to be up to date due to network partitioning.
    • When choosing availability over consistency, the system will always process the query and try to return the most recent available version of the information, even if it cannot guarantee it is up to date due to network partitioning.
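  • If there is no network failure, i.e. the distributed system keeps running normally, both availability and consistency can be satisfied. [This is not something you can rely on in practice.]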
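
The two choices in the summary can be put side by side in a small simulation. Everything here is an illustrative assumption: `TwoReplicaStore`, its `mode` flag, and the `Unavailable` exception are invented names, and the store holds a single value on a primary and a secondary copy.

```python
class Unavailable(Exception):
    """Raised when a CP store cannot guarantee up-to-date data."""


class TwoReplicaStore:
    """Toy store with a primary and a secondary copy of one value."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.primary = None
        self.secondary = None
        self.partitioned = False

    def write(self, value):
        """Write to the primary; copy to the secondary if it is reachable."""
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot replicate during a partition")
        self.primary = value
        if not self.partitioned:
            self.secondary = value

    def read_secondary(self):
        """Read from the secondary, which may have missed recent writes."""
        if self.partitioned and self.mode == "CP":
            raise Unavailable("secondary cannot confirm it is up to date")
        return self.secondary

    def heal(self):
        """End the partition and let the secondary catch up."""
        self.partitioned = False
        self.secondary = self.primary
```

In "CP" mode a partition turns reads and writes into errors; in "AP" mode the same requests succeed, but `read_secondary()` can return the value from before the partition until `heal()` runs. Without a partition the two modes behave identically.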

HBase

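  • HBase is a CP system with strong consistency.
  • This is because each region is served by exactly one region server at any given time; every request against a region is answered by that one region server, so strong consistency follows naturally. [This has nothing to do with the replica storage in the HDFS layer underneath HBase.]
  • When that region server fails and its regions fail over to other region servers, the edits have to be redone from the WAL. A region being redone should be unavailable during that time, so HBase lowers availability in exchange for consistency.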
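
The failover described above can be sketched as a toy write-ahead log. This is not the HBase API: the `Region` class and its methods are hypothetical, and the log is simply assumed to survive the crash.

```python
class Region:
    """Toy region: in-memory state plus a write-ahead log (WAL)."""

    def __init__(self):
        self.wal = []        # assumed durable: survives a server crash
        self.memstore = {}   # in-memory state: lost on a crash
        self.online = True

    def put(self, row, value):
        if not self.online:
            raise RuntimeError("region is offline")
        self.wal.append((row, value))  # log the edit first
        self.memstore[row] = value     # then apply it in memory

    def get(self, row):
        if not self.online:
            raise RuntimeError("region is offline")
        return self.memstore.get(row)

    def crash(self):
        """The serving region server dies: memory is gone, region goes offline."""
        self.memstore = {}
        self.online = False

    def recover(self):
        """A new server replays the WAL in order, then brings the region online."""
        for row, value in self.wal:
            self.memstore[row] = value
        self.online = True
```

Between `crash()` and the end of `recover()` every `get` and `put` fails, which is the availability given up. After recovery, every acknowledged write is visible again, which is the consistency kept.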

HDFS

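Under normal conditions, the client waits for the acks of all data packets (minimum number of replicas).

When the number of replicas falls below the required number of replicas (which is different from the minimum number of replicas), the block is marked as under-replicated, and the NameNode then re-replicates it asynchronously.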

dfs.namenode.replication.min (default 1)

dfs.replication (default 3)
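
How the two settings interact can be sketched as a small classifier. This is a reading of the notes above, not NameNode code: the function name and the returned labels are illustrative, and the defaults mirror the two values listed.

```python
def block_state(live_replicas, min_replicas=1, target_replicas=3):
    """Classify a block by its live replica count.

    min_replicas stands in for dfs.namenode.replication.min and
    target_replicas for dfs.replication; the labels are illustrative.
    """
    if live_replicas < min_replicas:
        return "below minimum: the write is not yet acknowledged"
    if live_replicas < target_replicas:
        return "under-replicated: re-replicated asynchronously"
    return "fully replicated"
```

With the defaults, a block with one or two live replicas is readable but under-replicated, and only reaches the target of three through background copying.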

HDFS Availability: the NameNode is a single point of failure. Even though 2.x introduced High Availability, that is still not complete availability.

[The above is a bit messy, TBD... skip it for now.]

FYI

ACID

  • Atomicity - Everything in a transaction must happen successfully or none of the changes are committed. This avoids a transaction that changes multiple pieces of data from failing halfway and only making a few changes.
  • Consistency - The data will only be committed if it passes all the rules in place in the database (i.e., data types, triggers, constraints, etc.).
  • Isolation - Transactions won't affect other transactions by changing data that another operation is counting on; and other users won't see partial results of a transaction in progress (depending on isolation mode).
  • Durability - Once data is committed, it is durably stored and safe against errors, crashes or any other (software) malfunctions within the database.
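
Atomicity and consistency can be tried out with Python's built-in `sqlite3` module. The `account` table, the `make_db` and `transfer` helpers, and the balances are made up for this sketch; the credit is deliberately applied before the debit so that a failed transfer fails halfway through.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a rule: balances may not go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst in one transaction; True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))
```

Transferring 150 from alice violates the CHECK constraint on the second statement, so the credit to bob is rolled back as well and both balances stay unchanged. A transfer of 30 commits both updates together.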

SQL/Relational DB

  • ACID is commonly provided by most classic relational databases like MySQL, Microsoft SQL Server, Oracle and others. These databases are known for storing data in spreadsheet-like tables that have their columns and data types strictly defined. The tables can have relationships between each other and the data is queried with SQL (Structured Query Language), which is a standardized language for working with databases.

FYI

  • HBase on CAP: With respect to CAP, HBase is decidedly CP. HBase makes strong consistency guarantees. If a client succeeds in writing a value, other clients will receive the updated value on the next request.

    In HBase, data is only served by one region server (even if it resides on multiple data nodes). If a region server dies, clients may have to wait a long time for the region reassignment and log replay.

    HBase isn't designed so that multiple region servers can serve the same region simultaneously, because that would make features such as single-row put atomicity, atomic check-and-set operations, and atomic increment operations difficult or impossible to achieve. These are only possible if you know for sure that exactly one machine is in control of the row.

    HBase does trade some availability to achieve a stronger level of consistency.

    Partition tolerance in CAP, in short, is the ability of a system to survive despite message loss (due to server failure, network problems, etc.). HBase does this, of course: a server failure or message loss does not damage the database. When that happens, we must give up either availability or consistency. In HBase's case we choose consistency, so we have to give up some availability.
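
Why a single owner makes these row-level operations simple can be seen in a toy model. This is not the HBase client API: `SingleOwnerRegion` is a hypothetical class, and a local lock stands in for "exactly one machine is in control of the row".

```python
import threading


class SingleOwnerRegion:
    """Toy stand-in for a region served by exactly one server."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()  # one owner, so one lock is enough

    def put(self, row, value):
        with self._lock:
            self._rows[row] = value

    def get(self, row):
        with self._lock:
            return self._rows.get(row)

    def check_and_put(self, row, expected, new_value):
        """Write new_value only if the row currently holds `expected`."""
        with self._lock:
            if self._rows.get(row) != expected:
                return False
            self._rows[row] = new_value
            return True

    def increment(self, row, delta=1):
        """Atomically add delta to a counter row and return the new value."""
        with self._lock:
            value = self._rows.get(row, 0) + delta
            self._rows[row] = value
            return value
```

If two servers could serve the same row, the check and the write in `check_and_put` would need coordination across machines instead of a local lock, which is the difficulty the note above refers to.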
