What happens when a Redis replica starts only after its master has gone down?
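Conclusion:
In this situation the replica (that is, the slave node) cannot be promoted to master. It keeps trying to connect to the master until it succeeds. After the master recovers, the replica stays a replica and does not become a master.
The replica cannot be promoted because it never starts an election to become master.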
Replica log:
14304:S 26 Mar 2019 15:42:01.158 * Connecting to MASTER 10.49.126.98:4076
14304:S 26 Mar 2019 15:42:01.158 * MASTER <-> REPLICA sync started
14304:S 26 Mar 2019 15:42:01.158 # Error condition on socket for SYNC: Connection refused
14304:S 26 Mar 2019 15:42:02.161 * Connecting to MASTER 10.49.126.98:4076
14304:S 26 Mar 2019 15:42:02.161 * MASTER <-> REPLICA sync started
14304:S 26 Mar 2019 15:42:02.161 # Error condition on socket for SYNC: Connection refused
14304:S 26 Mar 2019 15:42:03.167 * Connecting to MASTER 10.49.126.98:4076
14304:S 26 Mar 2019 15:42:03.167 * MASTER <-> REPLICA sync started
14304:S 26 Mar 2019 15:42:03.167 * Non blocking connect for SYNC fired the event.
The master is still loading its dataset from disk into memory (-LOADING):
14304:S 26 Mar 2019 15:42:03.173 # Error reply to PING from master: '-LOADING Redis is loading the dataset in memory'
14304:S 26 Mar 2019 15:42:03.770 * Clear FAIL state for node c67dc9e02e25f2e6321df8ac2eb4d99789917783: is reachable again and nobody is serving its slots after some time.
The cluster state returns to normal (it had switched to fail because one of the masters was down):
14304:S 26 Mar 2019 15:42:03.770 # Cluster state changed: ok
14304:S 26 Mar 2019 15:42:04.169 * Connecting to MASTER 10.49.126.98:4076
14304:S 26 Mar 2019 15:42:04.169 * MASTER <-> REPLICA sync started
14304:S 26 Mar 2019 15:42:04.169 * Non blocking connect for SYNC fired the event.
14304:S 26 Mar 2019 15:42:04.169 * Master replied to PING, replication can continue...
14304:S 26 Mar 2019 15:42:04.169 * Trying a partial resynchronization (request 725b1fbcfc073eec81837cb0f1fd786c995f4d46:1).
The replica performs a full resync of the master's data:
14304:S 26 Mar 2019 15:42:04.174 * Full resync from master: 68ef812d5b3dc70adca8c6ed0f306249725df91f:0
Because this is a full resync, the previously cached state is useless and is discarded:
14304:S 26 Mar 2019 15:42:04.174 * Discarding previously cached master state.
14304:S 26 Mar 2019 15:42:04.275 * MASTER <-> REPLICA sync: receiving 106404 bytes from master
14304:S 26 Mar 2019 15:42:04.275 * MASTER <-> REPLICA sync: Flushing old data
14304:S 26 Mar 2019 15:42:04.275 * MASTER <-> REPLICA sync: Loading DB in memory
14304:S 26 Mar 2019 15:42:04.292 * MASTER <-> REPLICA sync: Finished with success
The replica starts rewriting its AOF file:
14304:S 26 Mar 2019 15:42:04.293 * Background append only file rewriting started by pid 21172
14304:S 26 Mar 2019 15:42:04.325 * AOF rewrite child asks to stop sending diffs.
21172:C 26 Mar 2019 15:42:04.326 * Parent agreed to stop sending diffs. Finalizing AOF...
21172:C 26 Mar 2019 15:42:04.326 * Concatenating 0.00 MB of AOF diff received from parent.
21172:C 26 Mar 2019 15:42:04.326 * SYNC append only file rewrite performed
21172:C 26 Mar 2019 15:42:04.326 * AOF rewrite: 0 MB of memory used by copy-on-write
14304:S 26 Mar 2019 15:42:04.370 * Background AOF rewrite terminated with success
14304:S 26 Mar 2019 15:42:04.370 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
14304:S 26 Mar 2019 15:42:04.370 * Background AOF rewrite finished successfully
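When reading a log like the one above, it can help to split each entry into its fields and check mechanically whether a full resync took place. The following is a minimal sketch, not part of the original post. It assumes the line layout seen in the excerpt (pid, role character, timestamp, level marker, message) and an English locale for the month abbreviation; the names `parse_line`, `LogEntry` and `had_full_resync` are hypothetical helpers made up for illustration.

```python
import re
from datetime import datetime
from typing import Iterable, NamedTuple, Optional

# Role characters Redis prints right after the pid.
ROLES = {"M": "master", "S": "replica", "C": "child", "X": "sentinel"}

# Layout assumed from the excerpt: "<pid>:<role> <dd Mon yyyy hh:mm:ss.mmm> <level> <message>"
LOG_LINE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[MSCX]) "
    r"(?P<ts>\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[.\-*#]) (?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    pid: int
    role: str
    time: datetime
    level: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None for lines that do not match (e.g. annotations)."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%d %b %Y %H:%M:%S.%f")
    return LogEntry(int(m["pid"]), ROLES[m["role"]], ts, m["level"], m["msg"])


def had_full_resync(entries: Iterable[LogEntry]) -> bool:
    """True if any entry reports the 'Full resync from master' message seen in the excerpt."""
    return any(e.message.startswith("Full resync from master") for e in entries)


if __name__ == "__main__":
    sample = (
        "14304:S 26 Mar 2019 15:42:04.174 * Full resync from master: "
        "68ef812d5b3dc70adca8c6ed0f306249725df91f:0"
    )
    entry = parse_line(sample)
    print(entry.role, entry.time.time(), had_full_resync([entry]))
```

Lines that do not start with a pid, such as the explanatory notes interleaved with the log, are skipped because `parse_line` returns None for them.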
Until the master recovers, the replica cannot serve reads or writes, even after READONLY has been issued:
127.0.0.1:4071> get k4156
(error) CLUSTERDOWN The cluster is down
127.0.0.1:4071> readonly
OK
127.0.0.1:4071> get k4156
(error) CLUSTERDOWN The cluster is down
SCAN, however, still shows the data:
127.0.0.1:4071> scan 0
1) "10752"
2)  1) "k5948"
    2) "k4156"
    3) "k12819"
    4) "k24497"
    5) "k5926"
    6) "k10947"
    7) "k7653"
    8) "k21631"
    9) "k6672"
   10) "k2687"
   11) "k29036"
If the master can never be recovered, how can the cluster be restored?
127.0.0.1:4071> CLUSTER FAILOVER
(error) ERR Master is down or failed, please use CLUSTER FAILOVER FORCE
In other words, the only option here is a forced failover (at the risk of data loss and data inconsistency). The replica log then changes as follows:
1021:S 26 Mar 2019 16:02:04.994 * Connecting to MASTER 10.49.126.98:4076
1021:S 26 Mar 2019 16:02:04.994 * MASTER <-> REPLICA sync started
Before the forced failover:
1021:S 26 Mar 2019 16:02:04.995 # Error condition on socket for SYNC: Connection refused
The forced failover:
1021:S 26 Mar 2019 16:02:05.842 # Forced failover user request accepted.
Preparing to start an election (it starts after a random delay):
1021:S 26 Mar 2019 16:02:05.896 # Start of election delayed for 0 milliseconds (rank #0, offset 0).
1021:S 26 Mar 2019 16:02:05.996 * Connecting to MASTER 10.49.126.98:4076
1021:S 26 Mar 2019 16:02:05.996 * MASTER <-> REPLICA sync started
The election formally starts (epoch 34):
1021:S 26 Mar 2019 16:02:05.996 # Starting a failover election for epoch 34.
1021:S 26 Mar 2019 16:02:06.004 # Error condition on socket for SYNC: Connection refused
Unsurprisingly, it wins the election:
1021:S 26 Mar 2019 16:02:06.020 # Failover election won: I'm the new master.
1021:S 26 Mar 2019 16:02:06.020 # configEpoch set to 34 after successful failover
1021:M 26 Mar 2019 16:02:06.020 # Setting secondary replication ID to 7b83d297fa53f119c79021661fff533eafabc222, valid up to offset: 1. New replication ID is c5011813ad8fda9ef68da648f2fdfc27eae2afd3
Now that it is the master itself, the "cached master" is no longer needed:
1021:M 26 Mar 2019 16:02:06.020 * Discarding previously cached master state.
The cluster state is back to normal:
1021:M 26 Mar 2019 16:02:06.021 # Cluster state changed: ok
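The manual recovery shown above (try a plain CLUSTER FAILOVER first, escalate to FORCE only when the master is reported down) can be sketched as a small helper. This is an illustrative sketch rather than a recommended procedure: `send` is a hypothetical callable that delivers one command to the replica and returns the reply text, and the `allow_force` flag is an assumption added here so that the data-loss risk has to be accepted explicitly.

```python
from typing import Callable

# Fragment of the error reply shown in the session above.
MASTER_DOWN_HINT = "Master is down or failed"


def fail_over(send: Callable[[str], str], allow_force: bool = False) -> str:
    """Try a normal manual failover; use FORCE only if the caller allows it.

    `send` is a hypothetical transport: it sends one command to the replica
    and returns the reply as text.
    """
    reply = send("CLUSTER FAILOVER")
    if MASTER_DOWN_HINT not in reply:
        # Either the failover was accepted or it failed for another reason;
        # hand the reply back to the caller unchanged.
        return reply
    if not allow_force:
        raise RuntimeError(
            "master is unreachable; CLUSTER FAILOVER FORCE is required "
            "but was not permitted (risk of data loss)"
        )
    return send("CLUSTER FAILOVER FORCE")


if __name__ == "__main__":
    # Fake transport that imitates the replies seen in the session above.
    def fake_send(command: str) -> str:
        if command == "CLUSTER FAILOVER":
            return "ERR Master is down or failed, please use CLUSTER FAILOVER FORCE"
        return "OK"

    print(fail_over(fake_send, allow_force=True))
```

Keeping the escalation behind an explicit flag mirrors the point made in the text: the forced path is a last resort because the replica may be missing writes that the dead master had accepted.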
Log of another master in the cluster during the same period:
30651:M 26 Mar 2019 15:31:45.438 * Marking node c67dc9e02e25f2e6321df8ac2eb4d99789917783 as failing (quorum reached).
The cluster state is marked as fail:
30651:M 26 Mar 2019 15:31:45.438 # Cluster state changed: fail
30651:M 26 Mar 2019 15:34:03.022 * Clear FAIL state for node f805e652ff8abe151393430cb3bcbf514b8a7399: replica is reachable again.
30651:M 26 Mar 2019 15:35:45.005 * 10 changes in 300 seconds. Saving...
30651:M 26 Mar 2019 15:35:45.006 * Background saving started by pid 28683
28683:C 26 Mar 2019 15:35:45.016 * DB saved on disk
28683:C 26 Mar 2019 15:35:45.018 * RDB: 0 MB of memory used by copy-on-write
30651:M 26 Mar 2019 15:35:45.106 * Background saving terminated with success
30651:M 26 Mar 2019 15:42:03.769 * Clear FAIL state for node c67dc9e02e25f2e6321df8ac2eb4d99789917783: is reachable again and nobody is serving its slots after some time.
The cluster state returns to normal:
30651:M 26 Mar 2019 15:42:03.769 # Cluster state changed: ok
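The two "Cluster state changed" entries in this log bracket the period during which the cluster was down, so the length of the outage can be computed from their timestamps. The sketch below is an illustration under assumptions: the function name `outage_duration` is hypothetical, and it relies on the log layout seen in the excerpt and on an English locale for the month abbreviation.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

STATE_PREFIX = "Cluster state changed: "


def outage_duration(lines: Iterable[str]) -> Optional[timedelta]:
    """Return the time between the first 'fail' and the next 'ok' state change."""
    failed_at = None
    for line in lines:
        if STATE_PREFIX not in line:
            continue
        head, state = line.split(STATE_PREFIX, 1)
        # head looks like "30651:M 26 Mar 2019 15:31:45.438 # "
        fields = head.split()
        ts = datetime.strptime(" ".join(fields[1:5]), "%d %b %Y %H:%M:%S.%f")
        state = state.strip()
        if state == "fail" and failed_at is None:
            failed_at = ts
        elif state == "ok" and failed_at is not None:
            return ts - failed_at
    return None


if __name__ == "__main__":
    log = [
        "30651:M 26 Mar 2019 15:31:45.438 # Cluster state changed: fail",
        "30651:M 26 Mar 2019 15:35:45.106 * Background saving terminated with success",
        "30651:M 26 Mar 2019 15:42:03.769 # Cluster state changed: ok",
    ]
    print(outage_duration(log))  # 0:10:18.331000
```

For the entries in this log the result is a little over ten minutes, which matches the gap between the master failing at 15:31:45 and coming back at 15:42:03.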
Log of another replica in the cluster during the same period:
31463:S 26 Mar 2019 15:31:45.438 * FAIL message received from 29fcce29837d3e5266b6178a15aecfa938ff241a about c67dc9e02e25f2e6321df8ac2eb4d99789917783
The cluster state is marked as fail:
31463:S 26 Mar 2019 15:31:45.439 # Cluster state changed: fail
31463:S 26 Mar 2019 15:34:03.023 * Clear FAIL state for node f805e652ff8abe151393430cb3bcbf514b8a7399: replica is reachable again.
31463:S 26 Mar 2019 15:35:45.100 * 10 changes in 300 seconds. Saving...
31463:S 26 Mar 2019 15:35:45.101 * Background saving started by pid 28695
28695:C 26 Mar 2019 15:35:45.116 * DB saved on disk
28695:C 26 Mar 2019 15:35:45.118 * RDB: 0 MB of memory used by copy-on-write
31463:S 26 Mar 2019 15:35:45.201 * Background saving terminated with success
31463:S 26 Mar 2019 15:42:03.769 * Clear FAIL state for node c67dc9e02e25f2e6321df8ac2eb4d99789917783: is reachable again and nobody is serving its slots after some time.
The cluster state returns to normal:
31463:S 26 Mar 2019 15:42:03.769 # Cluster state changed: ok