ref: http://redis.io/topics/cluster-spec

1. 设计目标: 高性能;线性扩展;不支持合并操作;写操作安全:小概率丢弃;(对于每个key)只要有一个slave工作,就可用;

Redis Cluster is a distributed implementation of Redis with the following goals, in order of importance in the design:

  • High performance and linear scalability up to 1000 nodes.
  • No merge operations in order to play well with values size and semantics typical of the Redis data model.
  • Write safety: the system tries to retain all the writes originating from clients connected with the majority of the nodes. However there are small windows where acknowledged writes can be lost.
  • Availability: Redis Cluster is able to survive to partitions where the majority of the master nodes are reachable and there is at least a reachable slave for every master node that is no longer reachable.

What is described in this document is implemented in the unstable branch of the Github Redis repository. Redis Cluster has now entered the beta stage, so new betas are released every month and can be found in the download page of the Redis web site.

2. 使用hash tag实现将一个key总是路由到固定的节点;

3. 通信协议:

  • 通过一个“cluster bus”的二进制协议通信;
  • 每个几点都和其他所有节点建立tcp链接(在这点就不是线性扩展的);
  • client发起链接给任何请求都是允许的,但是节点不做proxy功能,而是像http那样,返回一个重定向的错误信息;

4. 安全写: 存在两个丢数据的可能:

Redis Cluster tries hard to retain all the writes that are performed by clients connected to the majority of masters, with two exceptions:

1) A write may reach a master, but while the master may be able to reply to the client, the write may not be propagated to slaves via the asynchronous replication used between master and slave nodes. If the master dies without the write reaching the slaves, the write is lost forever in case the master is unreachable for a long enough period that one of its slaves is promoted.

2) Another theoretically possible failure mode where writes are lost is the following:

  • A master is unreachable because of a partition.
  • It gets failed over by one of its slaves.
  • After some time it may be reachable again.
  • A client with a not updated routing table may write to it before the master is converted to a slave (of the new master) by the cluster.

5 可用性: 当网络分裂时,包含多数server的一侧可以正常使用,另一侧不可以使用; 不适用于大规模网络故障的场景; 对任何一个key,只要有一个master或slave存在,就能正常访问;

6. 性能: (每个节点)与单个redis基本相同(这就是所谓的性能线性增长);

7. 为什么不支持merge操作: 性能考虑;

8. key的分布:先CRC然后取模分配从16k的分片(slot),然后在分配到各个node上;

HASH_SLOT = CRC16(key) mod 16384

The CRC16 is specified as follows:

  • Name: XMODEM (also known as ZMODEM or CRC-16/ACORN)
  • Width: 16 bit
  • Poly: 1021 (That is actually x16 + x12 + x5 + 1)
  • Initialization: 0000
  • Reflect Input byte: False
  • Reflect Output CRC: False
  • Xor constant to output CRC: 0000
  • Output for "123456789": 31C3

9 。 keys hash tag: 在key “{tag}otherString”中,tag就是hash tags,用于计算这个key的slot位置,为了实现先同tag的key映射到相同的slot中。10。 node属性:node的标识是一个随机数,第一次运行时写入到配置文件,并保持不变;

Every node has other associated information that all the other nodes know:

  • The IP address and TCP port where the node is located.
  • A set of flags.
  • A set of hash slots served by the node.
  • Last time we sent a ping packet using the cluster bus.
  • Last time we received a pong packet in reply.
  • The time at which we flagged the node as failing.
  • The number of slaves of this node.
  • The master node ID, if this node is a slave (or 0000000... if it is a master).

11. Cluster topology 拓扑: 全连接且长tcp链接。

Redis cluster is a full mesh where every node is connected with every other node using a TCP connection.

In a cluster of N nodes, every node has N-1 outgoing TCP connections, and N-1 incoming connections.

These TCP connections are kept alive all the time and are not created on demand.

12. 节点间通信: 新节点加入时,只有管理员才能发起MEET消息;MEET消息会在cluster中传播。

13. 重定向策略

A Redis client is free to send queries to every node in the cluster, including slave nodes. The node will analyze the query, and if it is acceptable (that is, only a single key is mentioned in the query) it will see what node is responsible for the hash slot where the key belongs.

If the hash slot is served by the node, the query is simply processed, otherwise the node will check its internal hash slot -> node ID map and will reply to the client with a MOVED error.

A MOVED error is like the following:

GET x
-MOVED 3999 127.0.0.1:6381

14. key迁移:

The following subcommands are available:

  • CLUSTER ADDSLOTS slot1 [slot2] ... [slotN]
  • CLUSTER DELSLOTS slot1 [slot2] ... [slotN]
  • CLUSTER SETSLOT slot NODE node
  • CLUSTER SETSLOT slot MIGRATING node
  • CLUSTER SETSLOT slot IMPORTING node
  • CLUSTER GETKEYSINSLOT slot count
  • MIGRATE target_host target_port key target_database id timeout

15. Ask redirection: 查询一个key的位置

16. Client处理重定向: client应该适当记录key与slot的关系(减少redirect的次数),并且处理redirect的错误信息;  

17 . 多key。Multiple keys operations

Using hash tags clients are free to use multiple-keys operations. For example the following operation is valid:

MSET {user:1000}.name Angela {user:1000}.surname White

18 容错:

  1. 节点间心跳检测:随机发给随机数量的节点,使得整个cluster的总心跳数在N的规模;
  2. 心跳报文内容:

The common header has the following information:

  • Node ID, that is a 160 bit pseudorandom string that is assigned the first time a node is created and remains the same for all the life of a Redis Cluster node.
  • The currentEpoch and configEpoch field, that are used in order to mount the distributed algorithms used by Redis Cluster (this is explained in details in the next sections). If the node is a slave the configEpoch is the last known configEpoch of the master.
  • The node flags, indicating if the node is a slave, a master, and other single-bit node information.
  • A bitmap of the hash slots served by a given node, or if the node is a slave, a bitmap of the slots served by its master.
  • Port: the sender TCP base port (that is, the port used by Redis to accept client commands, add 10000 to this to obtain the cluster port).
  • State: the state of the cluster from the point of view of the sender (down or ok).
  • The master node ID, if this is a slave.

19 失效节点检测: PFAIL/FAIL标志。当A心跳检测B失败,A标志为B为PFAIL;然后A询问其他的node,如果多数节点返回B为PFAIL,则A标志B为FAIL,并通知其他所有的node B为FAIL。

This mechanism is used in order to escalate a PFAIL condition to a FAIL condition, when the following set of conditions are met:

  • Some node, that we'll call A, has another node B flagged as PFAIL.
  • Node A collected, via gossip sections, information about the state of B from the point of view of the majority of masters in the cluster.
  • The majority of masters signaled the PFAIL or PFAIL condition within NODE_TIMEOUT * FAIL_REPORT_VALIDITY_MULT time.

If all the above conditions are true, Node A will:

  • Mark the node as FAIL.
  • Send a FAIL message to all the reachable nodes.

The FAIL message will force every receiving node to mark the node in FAIL state.

20. 逻辑时钟:Cluster epoch

21. Slave提升为Master: 检测到master失效-》slave发起选举-》赢得选举的slave将自己变为master. 选举过程:

  • slave A发送FAILOVER_AUTH_REQUEST给其他所有的master,并等待回应(至少NODE_TIMEOUT*2时长);
  • 其他master收到FAILOVER_AUTH_REQUEST请求后,决定如果同意就回应FAILOVER_AUTH_ACK消息,并且在2*NODE_TIMEOUT时间内不在同意其他请求(和zk的类似)
  • slave A 如果收到大多少(超过半数)master的ack回应,则赢得选举,广播自己赢得选举的消息。 然后就可以升为master了。

22. key slot分配与信息传播。

Rule 1: If an hash slot is unassigned, and a known node claims it, I'll modify my hash slot table to associate the hash slot to this node.

Rule 2: If an hash slot is already assigned, and a known node is advertising it using a configEpoch that is greater than theconfigEpoch advertised by the current owner of the slot, I'll rebind the hash slot to the new node.

22. publish and subscribe: 可以向任何一个节点publish或subscribe,cluter内部会通知到正确的节点;

地方

Redis cluster Specification 笔记的更多相关文章

  1. [redis] 与redis cluster有关的学习笔记

    主要是以下三个官方文档,只略读了前两个,第三个还没有读. <redis cluster tutorial> <redis sentinel> <redis cluster ...

  2. 近千节点的Redis Cluster高可用集群案例:优酷蓝鲸优化实战(摘自高可用架构)

    (原创)2016-07-26 吴建超 高可用架构导读:Redis Cluster 作者建议的最大集群规模 1,000 节点,目前优酷在蓝鲸项目中管理了超过 700 台节点,积累了 Redis Clus ...

  3. Redis Cluster(Redis 3.X)设计要点

    Redis 3.0.0 RC1版本号10.9号公布,Release Note这个版本号支持Redis Cluster.相信非常多同学期待已久,只是这个版本号仅仅是RC版本号,要应用到生产环境,还得等等 ...

  4. 在 Windows 上测试 Redis Cluster的集群填坑笔记

    redis 集群实现的原理请参考http://www.tuicool.com/articles/VvIZje       集群环境至少需要3个节点.推荐使用6个节点配置,即3个主节点,3个从节点. 新 ...

  5. Redis Cluster笔记

    Redis Cluster .0搭建与使用 介绍: 特性:使用数据分片(sharding)而非一致性哈希(consistency hashing)来实现: 一个 Redis 集群包含 个哈希槽(has ...

  6. K8S部署Redis Cluster集群(三主三从模式) - 部署笔记

    一.Redis 介绍 Redis代表REmote DIctionary Server是一种开源的内存中数据存储,通常用作数据库,缓存或消息代理.它可以存储和操作高级数据类型,例如列表,地图,集合和排序 ...

  7. Redis Cluster集群搭建与应用

    1.redis-cluster设计 Redis集群搭建的方式有多种,例如使用zookeeper,但从redis 3.0之后版本支持redis-cluster集群,redis-cluster采用无中心结 ...

  8. 43 【redis cluster】

    有两篇文章不错,可以看下: 1,初步理解redis cluster:https://blog.csdn.net/dc_726/article/details/48552531 2,仔细理解redis ...

  9. 第十一课——codis-server的高可用,对比codis和redis cluster的优缺点

    [作业描述] 1.配置codis-ha 2.总结对比codis的集群方式和redis的cluster集群的优缺点 =========================================== ...

随机推荐

  1. vue组件之间通信的8种方式

    对于vue来说,组件之间的消息传递是非常重要的,下面是我对组件之间消息传递的常用方式的总结. props和$emit(常用) $attrs和$listeners 中央事件总线(非父子组件间通信) v- ...

  2. vue项目进行时,script标签中,methods事件中函数使用的async/await

    用 async/await 来处理异步 await关键字只能放到async函数里面,通过await得到就是Promise返回的内容:当然也能通过then()去获取,若通过then()获取了则就无Pro ...

  3. 学习-Pytest(五)yield操作

    1.fixture的teardown操作并不是独立的函数,用yield关键字呼唤teardown操作 2.scope="module" 1.fixture参数scope=”modu ...

  4. 如何修改Git已提交的日志

    情况一:最后一次提交且未push 执行以下命令: git commit --amend git会打开$EDITOR编辑器,它会加载这次提交的日志,这样我们就可以在上面编辑,编辑后保存即完成此次的修改. ...

  5. 新手的Linux zcat命令示例

    Zcat是一个命令行实用程序,用于查看压缩文件的内容.它将压缩文件扩展为标准输出,允许您查看内容. 分类:Linux命令操作系统 2018-08-13 00:00:00 通常,使用gzip压缩的文件可 ...

  6. 树上独立集数量 树型DP

    题目描述: 对于一棵树,独立集是指两两互不相邻的节点构成的集合.例如,图1有5个不同的独立集(1个双点集合.3个单点集合.1个空集),图2有14个不同的独立集,图3有5536个不同的独立集.  输入: ...

  7. OOP三大核心封装继承多态

    OOP支柱 3 个核心:封装 继承 多态 封装就是将实现细节隐藏起来,也起到了数据保护的作用. 继承就是基于已有类来创建新类可以继承基类的核心功能. 在继承中 另外一种代码重用是:包含/委托,这种重用 ...

  8. Bootstarp-table入门(1)

    https://blog.csdn.net/dlf123321/article/details/52231926?locationNum=11&fps=1

  9. while循环练习:

    输入姑娘的年龄后,进行以下判断: 如果姑娘小于18岁,打印"不接受未成年" 如果姑娘大于18岁小于25岁,打印"心动表白" 如果姑娘大于25岁小于45岁,打印& ...

  10. SQL语句 运算符

    6.2 运算符   6.2.1 算术运算符 加 / 减 / 乘 / 除 6.2.2 连接运算符 是用来连接字符串的.跟java中的 + 是一致的. select 'abc' || ' bcd ' as ...