RabbitMQ Split-Brain
Split-brain can occur in some RabbitMQ 3.4.x releases. This post verifies the phenomenon experimentally, in the hope of saving you some detours.
Preview
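Two threads on the rabbitmq-users Google Group (which may require a VPN to reach from mainland China) describe the split-brain phenomenon:
https://groups.google.com/forum/#!topic/rabbitmq-users/dt8VFhMb2zM
https://groups.google.com/forum/#!topic/rabbitmq-users/06OQkYtLJd8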
The thread describes the following:
Hey Folk,
i just set up a rabbitmq cluster:
Three Nodes:
Node A | Node B | Node C
All three nodes see each other (same erlang-cookie, mode: pause_minority).
 rabbitmqctl cluster_status => shows status of all nodes on every instance.
Every queue is mirrored to the other nodes.
If i shutdown Node B, the following is happening:
* Node A realizes Node B is offline.
* Node A asks Node C for Node B status.
* Node C answers: "I still have connection to Node B."
* Node A shuts down itself.
* Node C realizes some seconds later, that the connection to Node B is no more possible.
From three Nodes only one is left in case of an unexpected outage.
I would like to realize a setup where Node A and C keep the connection even if Node B goes offline.
Is there any way to do this?
Michael Klishin (the second-largest contributor to rabbitmq-server) replied:
A known issue which is partially resolve in 3.4.x releases. 26474 can be related. (Per the RabbitMQ 3.4.2 release notes, 26474 is: prevent false positive detection of partial partitions (since 3.4.0).)
Simon MacMullen (also a rabbitmq-server contributor) replied:
So this is caused by the new partial partition detection in 3.4.x. It
looks like it is too sensitive - C should only reply "yes" if it has
positive confirmation that it can still talk to B, not if the connection
just hasn't failed yet. 
This will be fixed in 3.4.2.
Hypothesis
From this we can hypothesize: RabbitMQ 3.4.0 has the split-brain bug, and RabbitMQ 3.4.2 fixes it.
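To make the hypothesized bug concrete, here is a toy Python model of the check described in the thread: A has lost B and asks C whether C can still reach B. This is not RabbitMQ's code; the PeerLink type and its confirmed_alive/failed fields are hypothetical, chosen only to contrast the too-sensitive answer Simon MacMullen describes with the fixed one.

```python
from dataclasses import dataclass


@dataclass
class PeerLink:
    """Hypothetical view node C has of its link to node B (not a RabbitMQ type)."""
    confirmed_alive: bool  # C has recently received positive proof that B is up
    failed: bool           # C has already detected that the link is down


def c_reply_too_sensitive(link: PeerLink) -> bool:
    """Behaviour described for 3.4.0/3.4.1: answer 'B is reachable'
    as long as the link simply has not failed yet."""
    return not link.failed


def c_reply_fixed(link: PeerLink) -> bool:
    """Behaviour promised for 3.4.2: answer 'B is reachable' only with
    positive confirmation that B is still there."""
    return link.confirmed_alive and not link.failed


def a_sees_partial_partition(a_lost_b: bool, c_says_b_reachable: bool) -> bool:
    """A treats the situation as a partial partition (and, in the thread,
    shuts itself down) only if it lost B while C claims to still see B."""
    return a_lost_b and c_says_b_reachable


# B was just shut down: C's link has not timed out, but nothing confirms B is alive.
stale = PeerLink(confirmed_alive=False, failed=False)
print(a_sees_partial_partition(True, c_reply_too_sensitive(stale)))  # True: false positive
print(a_sees_partial_partition(True, c_reply_fixed(stale)))          # False: A keeps running
```

The only difference between the two replies is whether "not failed yet" counts as "alive"; that single condition is what decides whether A needlessly takes itself out of the cluster.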
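Method: run the experiment on RabbitMQ 3.4.0, 3.4.1, 3.4.2 and 3.6.0. For each version, build one cluster from three nodes A, B and C, then stop C's network and check whether A and B split from each other.
Observing a split-brain is enough to prove that 3.4.0 and 3.4.1 have the bug; not observing one cannot prove that 3.4.2 and later versions are free of it.
It is like proving a man is a bad husband: a single affair is enough. To prove he is a good one, you would have to watch him his whole life to see whether he ever cheats...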
Experiments
Experiment 1
RabbitMQ version: 3.4.0
Node configuration
Three nodes, A, B and C:
A:rabbit@zhuzhonghua2-fqawb 
B:rabbit@hiddenzhu-8drdc
C:rabbit@hidden-local 
B join_cluster A; C join_cluster A (that is, B and C each ran rabbitmqctl join_cluster against node A)
Check the cluster status (rabbitmqctl cluster_status):
Cluster status of node 'rabbit@zhuzhonghua2-fqawb' ...
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                 'rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[]}]
On node C, run service network stop
Check cluster_status on node A:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[]}]
Check cluster_status on node A again:
Cluster status of node 'rabbit@zhuzhonghua2-fqawb' ...
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@zhuzhonghua2-fqawb',['rabbit@hiddenzhu-8drdc']}]}]
Check cluster_status on node B:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
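 {partitions,[{'rabbit@hiddenzhu-8drdc',['rabbit@zhuzhonghua2-fqawb']}]}]
Result: [split-brain reproduced]. A and B never lost connectivity to each other, yet each lists only itself in running_nodes and reports a partition from the other.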
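When repeating this experiment, it helps to check the partitions entry of each node's rabbitmqctl cluster_status output mechanically rather than by eye. The sketch below uses regular expressions tailored to the output layout shown in this post (partitions as the last element); it is not a general Erlang term parser, and parse_partitions/has_partition are hypothetical helper names. It needs Python 3.9+ for the dict[...] annotation.

```python
import re

# Output of node A on 3.4.0 after C's network was stopped (copied from above).
SAMPLE_A = """
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@zhuzhonghua2-fqawb',['rabbit@hiddenzhu-8drdc']}]}]
"""

# One {'node',['peer1','peer2',...]} entry; \s* tolerates the line breaks
# rabbitmqctl inserts when the entry is long.
ENTRY = re.compile(r"\{'([^']+)',\s*\[([^\]]*)\]\}")
NODE = re.compile(r"'([^']+)'")


def parse_partitions(status_text: str) -> dict[str, list[str]]:
    """Return {node: [nodes it is partitioned from]} from cluster_status text."""
    idx = status_text.find("{partitions,")
    if idx == -1:
        return {}
    # Only look at the text after the partitions key so other entries cannot match.
    tail = status_text[idx + len("{partitions,"):]
    return {node: NODE.findall(peers) for node, peers in ENTRY.findall(tail)}


def has_partition(status_text: str) -> bool:
    """True if the node reports being partitioned from at least one peer."""
    return any(peers for peers in parse_partitions(status_text).values())


print(parse_partitions(SAMPLE_A))
# {'rabbit@zhuzhonghua2-fqawb': ['rabbit@hiddenzhu-8drdc']}
```

An empty {partitions,[]} entry yields an empty dict, so has_partition returns False for the healthy outputs at the start of each experiment.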
On node C, run service network start
Check cluster_status on node A:
[{nodes,
     [{disc,
          ['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
           'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,
     [{'rabbit@zhuzhonghua2-fqawb',
           ['rabbit@hidden-local','rabbit@hiddenzhu-8drdc']}]}]
Check cluster_status on node B:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hiddenzhu-8drdc',['rabbit@zhuzhonghua2-fqawb']}]}]
Check cluster_status on node C:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hidden-local']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hidden-local',['rabbit@zhuzhonghua2-fqawb']}]}]
Experiment 2
RabbitMQ version: 3.4.1
Node configuration as above (B join_cluster A, C join_cluster A)
Check the cluster status:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb',
                 'rabbit@hidden-local']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[]}]
On node C, run service network stop
Check cluster_status on node A:
Cluster status of node 'rabbit@zhuzhonghua2-fqawb' ...
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@zhuzhonghua2-fqawb',['rabbit@hiddenzhu-8drdc']}]}]
Check cluster_status on node B:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hiddenzhu-8drdc',['rabbit@zhuzhonghua2-fqawb']}]}]
Result: [split-brain reproduced]
On node C, run service network start
Check cluster_status on node A:
[{nodes,
     [{disc,
          ['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
           'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,
     [{'rabbit@zhuzhonghua2-fqawb',
           ['rabbit@hidden-local','rabbit@hiddenzhu-8drdc']}]}]
Check cluster_status on node B:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hiddenzhu-8drdc',['rabbit@zhuzhonghua2-fqawb']}]}]
Check cluster_status on node C:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hidden-local']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hidden-local',['rabbit@zhuzhonghua2-fqawb']}]}]
Experiment 3
RabbitMQ version: 3.4.2 (3.6.0 gave the same results)
Node configuration as above (B join_cluster A, C join_cluster A)
Check the cluster status:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb',
                 'rabbit@hidden-local']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[]}]
On node C, run service network stop
Check cluster_status on node A:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[]}]
Check cluster_status on node B:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb','rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[]}]
Result: [no split-brain observed]. A and B keep listing each other in running_nodes; only C drops out.
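The difference between the experiments boils down to one check: in Experiments 1 and 2, A and B (which never lost connectivity to each other) stop listing each other in running_nodes, while in Experiment 3 they still do. A minimal sketch of that check, assuming the output layout shown in this post; running_nodes and split_between are hypothetical helper names, and the inputs are the running_nodes lines copied from the outputs above.

```python
import re

RUNNING = re.compile(r"\{running_nodes,\s*\[([^\]]*)\]\}")
NODE = re.compile(r"'([^']+)'")

A = "rabbit@zhuzhonghua2-fqawb"
B = "rabbit@hiddenzhu-8drdc"


def running_nodes(status_text: str) -> set[str]:
    """Extract the running_nodes list from cluster_status text (regex sketch)."""
    m = RUNNING.search(status_text)
    return set(NODE.findall(m.group(1))) if m else set()


def split_between(status_a: str, status_b: str, a: str = A, b: str = B) -> bool:
    """True if two nodes that should see each other do not both list
    each other as running: the 3.4.0/3.4.1 symptom in this post."""
    return b not in running_nodes(status_a) or a not in running_nodes(status_b)


# running_nodes lines after stopping C's network.
exp1_a = "{running_nodes,['rabbit@zhuzhonghua2-fqawb']},"
exp1_b = "{running_nodes,['rabbit@hiddenzhu-8drdc']},"
exp3_a = "{running_nodes,['rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']},"
exp3_b = "{running_nodes,['rabbit@zhuzhonghua2-fqawb','rabbit@hiddenzhu-8drdc']},"

print(split_between(exp1_a, exp1_b))  # True  (3.4.0: split-brain)
print(split_between(exp3_a, exp3_b))  # False (3.4.2: A and B stay together)
```

Checking both directions matters: a split only visible from one side (A drops B but B still lists A) would also be flagged.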
On node C, run service network start
Check cluster_status on node A:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@zhuzhonghua2-fqawb',['rabbit@hidden-local']}]}]
Check cluster_status on node B:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb','rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@zhuzhonghua2-fqawb',['rabbit@hidden-local']}]}]
Check cluster_status on node C:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hidden-local']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hidden-local',['rabbit@zhuzhonghua2-fqawb']}]}]
Conclusion
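The version hypothesis is essentially confirmed. To avoid this split-brain, anyone running RabbitMQ should upgrade and stay away from 3.4.0 and 3.4.1.
But network partitions can still happen! Even on 3.4.2, after C's network came back, A and B still reported a partition involving C and C reported one from A, as the last outputs of Experiment 3 show.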
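Network Partitions
One article on network partitions ("RabbitMQ 网络分区问题", i.e. "RabbitMQ network partition issues") introduces them as follows:
RabbitMQ clusters are not very tolerant of network partitions, and frequent partitions cause problems, the most obvious being split-brain.
The official documentation puts it this way:
RabbitMQ clusters do not tolerate network partitions well. If you are thinking of clustering across a WAN, don't. You should use federation or the shovel instead.
In other words, clustering should not be used across a WAN; use federation or shovel instead.
Even on a LAN, however, partitions cannot be completely avoided: failures of network equipment (such as relays or network cards) can also cause them.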
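When a partition is detected, the management UI shows a warning like this:
Network partition detected
Mnesia reports that this RabbitMQ cluster has experienced a network partition. This is a dangerous situation. RabbitMQ clusters should not be installed on networks which can experience partitions.
During a network partition, the nodes in each partition consider every node outside their own partition to be down, and operations on queues, exchanges and bindings take effect only within the current partition. With the default configuration, RabbitMQ does not automatically resolve the partition and restore the cluster, even after the network recovers. RabbitMQ (3.1+) detects network partitions automatically and provides a setting, cluster_partition_handling, to deal with them: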
[
 {rabbit,
  [{tcp_listeners, [5672]},
   {cluster_partition_handling, ignore}]
 }
].
RabbitMQ offers three values for this setting:
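- ignore: the default. Nothing is done when a partition occurs; choose this when you consider the network reliable.
- autoheal: the partitions negotiate, and the nodes in the partition(s) with the fewest client connections are restarted to restore the cluster (in CAP terms this favours AP; some state may be lost).
- pause_minority: after a partition, each node checks whether its partition contains more than half of all cluster nodes; if not, those nodes pause themselves (this favours CP; use an odd number of nodes).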
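The two active strategies can be written down as simple decision rules. This is a simplified model, not RabbitMQ's implementation: the Partition type and the connection counts are hypothetical, and the autoheal tie-break on node count follows my reading of the official partitions documentation, so treat it as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical snapshot of one side of a network partition."""
    nodes: frozenset[str]
    client_connections: int


def pause_minority_pauses(partition: Partition, cluster_size: int) -> bool:
    """pause_minority: nodes pause unless their partition holds
    strictly more than half of all cluster nodes."""
    return len(partition.nodes) * 2 <= cluster_size


def autoheal_losers(partitions: list[Partition]) -> list[Partition]:
    """autoheal (simplified): the partition with the most client connections
    wins (assumed tie-break: more nodes); nodes in every other partition restart."""
    winner = max(partitions, key=lambda p: (p.client_connections, len(p.nodes)))
    return [p for p in partitions if p is not winner]


# Three-node cluster with C cut off, as in the experiments above.
ab = Partition(frozenset({"A", "B"}), client_connections=10)
c = Partition(frozenset({"C"}), client_connections=2)
print(pause_minority_pauses(ab, 3), pause_minority_pauses(c, 3))  # False True
print([sorted(p.nodes) for p in autoheal_losers([ab, c])])        # [['C']]

# With an even split (2 of 4 nodes on each side) both sides pause,
# which is why an odd number of nodes is recommended for pause_minority.
print(pause_minority_pauses(Partition(frozenset({"A", "B"}), 0), 4))  # True
```

The even-split case shows the trade-off: pause_minority would rather stop everything than let two halves diverge, while autoheal always keeps one side running and discards the other side's changes.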
References:
  ● RabbitMQ official documentation
  ● Network partitions
  ● Split-brain
