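I recently read Amazon's Dynamo paper. Several things in it struck me as interesting: the business goal of being "always writeable", the use of DHTs, vector clocks and Merkle trees, and the trade-off between consistency and high availability (following the CAP conjecture, partition tolerance is guaranteed by default, so one of the other two has to give). These notes are best read alongside the original paper.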

What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?

What is good about the paper? What is not good about the paper?

To what extent is the design of Dynamo inspired by Distributed Hash Tables (DHTs)? What are the advantages and disadvantages of such a design?

(part 3.3)

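Dynamo can be described as a zero-hop DHT: each node keeps enough routing information locally to route a request directly to the right node

P2P DHTs: global multi-hop routing through the overlay

Dynamo: local routing state on every node, which avoids the latency variability of multi-hop routing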

How does the design of Dynamo compare to that of BigTable?

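Dynamo: key/value access only, focused on high availability; it gives up ACID guarantees (weaker consistency, no isolation, single-key updates only)

BigTable: for structured data (a sparse, multi-dimensional sorted map that can be accessed by multiple attributes)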


key point:

target: always writeable

consistency vs. availability (Dynamo favours availability): both cannot be guaranteed once network failures are possible

dynamo: weak consistency: eventual consistency

vector clocks

Dynamo

Requirements

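  • simple query model: read/write operations on a value identified by a unique key; no operation spans multiple data items, and no relational schema is needed

  • consistency vs. availability: ACID guarantees tend to cost availability, so Dynamo accepts weaker consistency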

    Experience at Amazon has shown that data stores that provide ACID guarantees tend to have poor availability.

  • efficiency: runs on commodity hardware infrastructure and must still meet its SLAs

  • other: internal service without security related requirements such as authentication and authorization.

Target: meet SLA

Figure 1: Typically, the aggregator services are stateless, although they use extensive caching.

industry norm: SLAs described using the average, median and expected variance

Amazon: SLAs are expressed and measured at the 99.9th percentile of the distribution
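To see why the 99.9th percentile is a stricter target than the average or median, here is a minimal sketch. The latency numbers are made up for illustration, and `percentile` is a hypothetical helper using the nearest-rank method, not anything from the paper.

```python
def percentile(samples, per_mille):
    """Nearest-rank percentile; per_mille=999 means the 99.9th percentile.

    Integer arithmetic avoids floating-point rounding in the rank.
    """
    ordered = sorted(samples)
    rank = max(1, (per_mille * len(ordered) + 999) // 1000)  # ceiling division
    return ordered[rank - 1]


# Hypothetical latencies in ms: 995 fast requests and 5 slow ones.
latencies = [10] * 995 + [500] * 5

mean = sum(latencies) / len(latencies)
median = percentile(latencies, 500)
p999 = percentile(latencies, 999)

print(f"mean={mean:.2f} ms, median={median} ms, p99.9={p999} ms")
```

With this data the mean and median both look healthy, while the 99.9th percentile exposes the slow tail that a small fraction of customers actually experience.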


design

it is well known that when dealing with the possibility of network failures, strong consistency and high data availability cannot be achieved simultaneously

conflict resolution: eventually consistent data store

An important design consideration is to decide when to perform the process of resolving update conflicts

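e.g. whether conflicts should be resolved during writes (traditional data stores, which may reject a write that cannot reach enough replicas) or during reads (Dynamo, so that the store stays "always writeable")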

who performs the process of conflict resolution

  • data store: simple policies only, e.g. "last write wins"
  • application: knows the data schema, so it can choose the most suitable resolution method (more flexible)
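The difference between the two options can be sketched as follows. This is an illustration only: the version records, their `timestamp` field and both function names are assumptions, and the shopping-cart merge mirrors the example the paper uses to motivate application-level resolution.

```python
def last_write_wins(versions):
    """Store-level policy: keep only the version with the newest timestamp."""
    return max(versions, key=lambda v: v["timestamp"])["items"]


def merge_carts(versions):
    """Application-level policy: union divergent carts so no added item is lost."""
    merged = set()
    for version in versions:
        merged |= version["items"]
    return merged


# Two divergent versions of the same (hypothetical) shopping cart.
versions = [
    {"timestamp": 1, "items": {"book", "pen"}},
    {"timestamp": 2, "items": {"book", "lamp"}},
]

print(sorted(last_write_wins(versions)))  # the pen is silently dropped
print(sorted(merge_carts(versions)))      # all three items survive
```

Last-write-wins is trivial to implement in the store but discards one side of the conflict; the merge needs knowledge of what a cart is, which only the application has.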

Other key principles: incremental scalability, symmetry, decentralization, heterogeneity


related work(omit here)

P2P system


Architecture

partitioning, replication, versioning, membership, failure handling and scaling.

interface

get() put()

partitioning

basic consistent hashing algorithm (hash ring) has two problems:

  • random node positions lead to non-uniform data and load distribution
  • it is oblivious to the heterogeneity in node performance

improvement:

virtual node: A virtual node looks like a single node in the system, but each node can be responsible for more than one virtual node.

when a new node is added to the system, it is assigned multiple positions (henceforth, “tokens”) in the ring.
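A minimal sketch of a hash ring with virtual nodes is given below. The `Ring` class, `ring_position`, and the way tokens are derived (hashing `"node#i"`) are assumptions made for illustration; the paper hashes keys with MD5 but lets nodes pick their tokens randomly.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a string to a position on the ring (MD5 digest as a 128-bit integer)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, tokens_per_node: int = 8):
        self.tokens_per_node = tokens_per_node
        self.positions = []  # sorted token positions on the ring
        self.owner = {}      # token position -> physical node

    def add_node(self, node: str) -> None:
        # Assumption: tokens are derived by hashing "node#i" instead of chosen randomly.
        for i in range(self.tokens_per_node):
            pos = ring_position(f"{node}#{i}")
            bisect.insort(self.positions, pos)
            self.owner[pos] = node

    def lookup(self, key: str) -> str:
        """Walk clockwise to the first token past the key's position (binary search)."""
        idx = bisect.bisect_right(self.positions, ring_position(key))
        return self.owner[self.positions[idx % len(self.positions)]]


ring = Ring()
for name in ("A", "B", "C"):
    ring.add_node(name)
print(ring.lookup("cart:12345"))
```

Because each physical node owns several scattered tokens, adding a node takes a little load from many neighbours instead of splitting one neighbour's range, and a more capable machine can simply be given more tokens.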

Replication

In addition to locally storing each key within its range, the coordinator replicates these keys at the N-1 clockwise successor nodes in the ring.

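e.g. in Figure 2 (N = 3): B stores key K locally and replicates it at C and D

with virtual nodes, the first N successor positions may belong to fewer than N physical nodes -> the preference list is built by skipping positions in the ring so that it contains only distinct physical nodes (it holds more than N nodes to allow for node failures)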
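The skipping rule can be sketched like this. The token layout and the function name `preference_list` are hypothetical; the only idea taken from the paper is that positions on an already-chosen physical node are skipped.

```python
def preference_list(tokens, start_index, n):
    """Collect n distinct physical nodes walking clockwise from start_index.

    tokens: list of (position, physical_node) pairs sorted by position.
    Tokens whose physical node is already chosen are skipped.
    """
    chosen = []
    for step in range(len(tokens)):
        _, node = tokens[(start_index + step) % len(tokens)]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == n:
            break
    return chosen


# Hypothetical ring: physical nodes A and B each own two virtual nodes.
tokens = [(10, "A"), (20, "B"), (30, "B"), (40, "A"), (50, "C"), (60, "D")]

# Starting at the token at position 20: B, (B skipped), A, C.
print(preference_list(tokens, 1, 3))
```

Without the skip, the three replicas of a key starting at position 20 would land on B, B and A, so losing one machine could take out two copies.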

Data versioning(important for consistency)

Dynamo provides eventual consistency, which allows for updates to be propagated to all replicas asynchronously.

(temporary inconsistencies)

thus, several versions of the same object may exist in the system at the same time

vector clocks: capture causality between different versions of the same object

format: a list of (node, counter) pairs

data conflict: all conflicting versions are returned to the client, whose application logic reconciles them
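A small sketch of how vector clocks separate ancestors from true conflicts, with clocks modelled as `{node: counter}` dictionaries. The function names are my own; the example values follow the D1..D5 scenario in Figure 3 of the paper.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def prune_ancestors(clocks):
    """Drop clocks that are strict ancestors of another; survivors are concurrent."""
    return [c for c in clocks if not any(d != c and descends(d, c) for d in clocks)]


def merge(a, b):
    """Element-wise maximum, used when a client reconciles two versions."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


d1 = increment({}, "Sx")      # write handled by Sx
d2 = increment(d1, "Sx")      # overwritten at Sx
d3 = increment(d2, "Sy")      # one client updates via Sy
d4 = increment(d2, "Sz")      # another client updates via Sz concurrently

siblings = prune_ancestors([d2, d3, d4])       # d2 is an ancestor of both
d5 = increment(merge(d3, d4), "Sx")            # client reconciles, Sx coordinates
print(siblings, d5)
```

Neither `d3` nor `d4` descends from the other, so both are handed back to the client; the reconciled write `d5` descends from both and supersedes them.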

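size restriction: clocks can grow when node failures push writes to nodes outside the top N of the preference list, so once a clock exceeds a threshold (say 10 pairs) the oldest (node, counter) pair is removed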
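The paper's truncation scheme stores a timestamp next to each (node, counter) pair and removes the oldest pair once the clock reaches a threshold. A rough sketch, assuming the clock is a dictionary `{node: (counter, last_update_time)}`; the representation and the name `truncate` are mine, not the paper's.

```python
def truncate(clock, threshold=10):
    """Keep only the `threshold` most recently updated (node, counter) pairs.

    clock: dict mapping node -> (counter, last_update_time).
    """
    if len(clock) <= threshold:
        return dict(clock)
    newest_first = sorted(clock.items(), key=lambda item: item[1][1], reverse=True)
    return dict(newest_first[:threshold])


# Hypothetical clock with three entries and a threshold of two.
clock = {"Sx": (3, 100), "Sy": (1, 250), "Sz": (1, 300)}
print(truncate(clock, threshold=2))
```

Dropping pairs loses causality information, so some descendant relationships can no longer be derived; the paper notes that this had not surfaced as a problem in production.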

Execution of operation: get() & put()

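how does a client pick a node?

  1. route the request through a generic load balancer, which selects a node based on load information
  2. use a partition-aware client library, which routes the request directly to the appropriate coordinator node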

configurable values: R and W.

R is the minimum number of nodes that must participate in a successful read operation.

W is the minimum number of nodes that must participate in a successful write operation.

Setting R and W such that R + W > N yields a quorum-like system.

In this model, the latency of a get (or put) operation is dictated by the slowest of the R (or W) replicas.

For this reason, R and W are usually configured to be less than N, to provide better latency.
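Both claims can be checked with a few lines of code: a brute-force test that every read set overlaps every write set when R + W > N, and a latency helper showing that the coordinator waits for the slowest of the fastest R (or W) replies. Function names and latency values are illustrative assumptions, and the overlap check models a strict quorum over a fixed replica set.

```python
from itertools import combinations


def read_write_sets_always_overlap(n, r, w):
    """Brute-force check that every R-subset shares a replica with every W-subset."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


def operation_latency(replica_latencies_ms, required):
    """The coordinator returns once `required` replicas have answered,
    so the latency is the slowest of the `required` fastest responses."""
    return sorted(replica_latencies_ms)[required - 1]


print(read_write_sets_always_overlap(3, 2, 2))  # R + W > N
print(read_write_sets_always_overlap(3, 1, 1))  # R + W <= N

latencies = [120, 5, 8]  # hypothetical replica response times in ms
print(operation_latency(latencies, 2), operation_latency(latencies, 3))
```

With (N, R, W) = (3, 2, 2) a read always touches at least one replica that saw the latest write, and the one slow replica at 120 ms does not affect the response time; requiring all three replicas would.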

Handling Failures(temporary node failure): Hinted Handoff

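sloppy quorum: reads and writes are performed on the first N healthy nodes from the preference list, which may not be the first N nodes encountered on the ring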
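A toy sketch of hinted handoff under a sloppy quorum: when one of the N intended nodes is down, the write goes to the next healthy node together with a hint naming the intended recipient, and is delivered once that node recovers. The data structures and function names here are assumptions for illustration.

```python
from collections import defaultdict


def sloppy_write(key, value, preference, alive, n, stores, hints):
    """Write to the first n nodes of the preference list; for each one that is
    down, park the write on the next healthy node with a hint."""
    stand_ins = iter(node for node in preference[n:] if node in alive)
    for node in preference[:n]:
        if node in alive:
            stores[node][key] = value
        else:
            stand_in = next(stand_ins, None)
            if stand_in is not None:
                # Hinted replicas are kept apart from the stand-in's own data.
                hints[stand_in].append((node, key, value))


def hand_back(stand_in, alive, stores, hints):
    """Deliver hinted writes whose intended node is reachable again."""
    pending = []
    for intended, key, value in hints[stand_in]:
        if intended in alive:
            stores[intended][key] = value
        else:
            pending.append((intended, key, value))
    hints[stand_in] = pending


stores, hints = defaultdict(dict), defaultdict(list)
preference = ["A", "B", "C", "D"]   # hypothetical preference list, N = 3
alive = {"B", "C", "D"}             # A is temporarily down

sloppy_write("k", "v", preference, alive, 3, stores, hints)
alive.add("A")                      # A recovers
hand_back("D", alive, stores, hints)
print(dict(stores), dict(hints))
```

The write succeeds while A is down because D stands in for it, and A still ends up with its replica after recovery.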

handling the failure of an entire data center: each object is replicated across multiple data centers

Handling Failures(permanent node failure): Replica synchronization

Merkle tree: To detect the inconsistencies between replicas faster and to minimize the amount of transferred data

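each leaf is the hash of an individual key's value and each parent is the hash of its children, so the tree is built bottom-up; replicas compare trees during anti-entropy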
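A minimal Merkle tree sketch showing how two replicas can find the out-of-sync keys by descending only into subtrees whose hashes differ. SHA-256, the level-list layout and the function names are my choices for illustration, and both replicas are assumed to hold the same number of leaves.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(values):
    """Return the tree as a list of levels: levels[0] = leaf hashes, levels[-1] = [root]."""
    level = [h(v) for v in values]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last hash on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_leaves(a, b):
    """Descend from the root, following only subtrees whose hashes differ."""
    if a[-1] == b[-1]:
        return []  # equal roots: replicas are in sync, nothing to transfer
    suspects = [0]
    for depth in range(len(a) - 1, 0, -1):
        children = []
        for i in suspects:
            for c in (2 * i, 2 * i + 1):
                if c < len(a[depth - 1]) and a[depth - 1][c] != b[depth - 1][c]:
                    children.append(c)
        suspects = children
    return suspects


replica_a = build_tree([b"v1", b"v2", b"v3", b"v4"])
replica_b = build_tree([b"v1", b"v2", b"stale", b"v4"])
print(differing_leaves(replica_a, replica_b))
```

If the roots match, a single hash comparison proves the whole key range is identical; otherwise only the hashes along the differing paths are exchanged, not the data itself.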

Ring Membership

how are virtual nodes mapped to physical nodes?

When a node starts for the first time, it chooses its set of tokens (virtual nodes in the consistent hash space) and maps nodes to their respective token sets.

Adding/Removing Storage Nodes

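when a node joins, the nodes that used to hold the key ranges now assigned to it transfer those keys to the new node and then drop their own redundant copies; when a node leaves, the reallocation happens in reverse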


Implementation: all Java

  • request coordination
  • membership and failure detection
  • local persistence engine

EXPERIENCES & LESSONS

Class discussion

Dynamo is an internal service, so security concerns such as authentication and authorization are out of scope

virtual nodes -> load balancing and flexibility: each node picks random tokens, and the logical ring is defined by the token sets

large-scale distributed system:

blockchain: for security & anonymity

web3

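how consistent hashing works: the ring is partitioned among the nodes

DHT (distributed hash table) ring: each node is responsible for the range between its predecessor and itself

how data is located: walk clockwise along the ring to the first node past the key's position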

gossip-based protocol: propagates membership changes and maintains an eventually consistent view of membership
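A rough sketch of gossip-style membership reconciliation. The view format (`node -> (version, status)`), the "higher version wins" rule and both function names are assumptions made for illustration; the paper only states that each node contacts a random peer every second and the two reconcile their membership change histories.

```python
import random


def reconcile(view_a, view_b):
    """Merge two membership views, keeping the entry with the higher version."""
    merged = dict(view_a)
    for node, (version, status) in view_b.items():
        if node not in merged or version > merged[node][0]:
            merged[node] = (version, status)
    return merged


def gossip_round(views, rng):
    """Every node contacts one random peer; both adopt the merged view."""
    for node in list(views):
        peer = rng.choice([p for p in views if p != node])
        merged = reconcile(views[node], views[peer])
        views[node] = dict(merged)
        views[peer] = dict(merged)


# Hypothetical cluster where each node initially knows only part of the membership.
views = {
    "A": {"A": (1, "up")},
    "B": {"B": (1, "up"), "A": (0, "joining")},
    "C": {"C": (1, "up")},
}
rng = random.Random(0)
for _ in range(5):
    gossip_round(views, rng)
print(views["A"])
```

No node ever needs a central registry: information only spreads pairwise, and repeated rounds drive all views toward the same membership, which is what "eventually consistent view" means here.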

use binary search over the sorted token positions to find the destination

distinct physical nodes: the preference list skips particular positions in the ring

the N successor positions are virtual nodes, and several of them may sit on the same physical node, so positions on an already-chosen physical node are skipped

Brewer's conjecture: CAP Theorem

consistency, availability, and partition-tolerance: pick 2 out of 3!

Dynamo's design: partitions are taken as given, so it sacrifices strong consistency to gain high availability
