zookeeper选举代码分析
本文将以zookeeper的3.4.6版本作为源码分析版本。主要的代码类包括QuorumPeerMain、QuorumPeer、FastLeaderElection、QuorumMaj等。
假设有a,b,c三个zookeeper服务,serverid分别是1、2、3:
1.先启动集群中的a服务,先投票自己a为leader,并将投票信息发送给自己;
QuorumPeerMain对象调用QuorumPeer线程的startLeaderElection方法,最终调用FastLeaderElection的lookForLeader方法
2.再启动集群中的b服务,先投票自己b为leader,并将投票信息发送给自己和a服务;
3.a收到b的投票信息,服务a通过对比算法,选择b为leader,并通知自己和服务b;
protected boolean totalOrderPredicate(long newId, long newZxid, long newEpoch, long curId, long curZxid, long curEpoch) {
LOG.debug("id: " + newId + ", proposed id: " + curId + ", zxid: 0x" +
Long.toHexString(newZxid) + ", proposed zxid: 0x" + Long.toHexString(curZxid));
if(self.getQuorumVerifier().getWeight(newId) == 0){
return false;
}
/*
* We return true if one of the following three cases hold:
* 1- New epoch is higher
* 2- New epoch is the same as current epoch, but new zxid is higher
* 3- New epoch is the same as current epoch, new zxid is the same
* as current zxid, but server id is higher.
*/
return ((newEpoch > curEpoch) ||
((newEpoch == curEpoch) &&
((newZxid > curZxid) || ((newZxid == curZxid) && (newId > curId)))));
}
4.服务a通过验证算法,确认b为leader;
5.服务b收到a的通知信息,通过验证算法b确认自身为leader;
protected boolean termPredicate(
HashMap<Long, Vote> votes,
Vote vote) { HashSet<Long> set = new HashSet<Long>(); /*
* First make the views consistent. Sometimes peers will have
* different zxids for a server depending on timing.
*/
for (Map.Entry<Long,Vote> entry : votes.entrySet()) {
if (vote.equals(entry.getValue())){
set.add(entry.getKey());
}
} return self.getQuorumVerifier().containsQuorum(set);
} public boolean containsQuorum(HashSet<Long> set){
return (set.size() > half);
}
6.服务c启动后,发现集群已经选出leader为服务b;
7.下面的代码是选举过程中最重要的代码:
public Vote lookForLeader() throws InterruptedException {
try {
self.jmxLeaderElectionBean = new LeaderElectionBean();
MBeanRegistry.getInstance().register(
self.jmxLeaderElectionBean, self.jmxLocalPeerBean);
} catch (Exception e) {
LOG.warn("Failed to register with JMX", e);
self.jmxLeaderElectionBean = null;
}
if (self.start_fle == 0) {
self.start_fle = System.currentTimeMillis();
}
try {
HashMap<Long, Vote> recvset = new HashMap<Long, Vote>();
HashMap<Long, Vote> outofelection = new HashMap<Long, Vote>();
int notTimeout = finalizeWait;
synchronized(this){
logicalclock++;
updateProposal(getInitId(), getInitLastLoggedZxid(), getPeerEpoch());
}
LOG.info("New election. My id = " + self.getId() +
", proposed zxid=0x" + Long.toHexString(proposedZxid));
sendNotifications();
/*
* Loop in which we exchange notifications until we find a leader
*/
while ((self.getPeerState() == ServerState.LOOKING) &&
(!stop)){
/*
* Remove next notification from queue, times out after 2 times
* the termination time
*/
Notification n = recvqueue.poll(notTimeout,
TimeUnit.MILLISECONDS);
/*
* Sends more notifications if haven't received enough.
* Otherwise processes new notification.
*/
if(n == null){
if(manager.haveDelivered()){
sendNotifications();
} else {
manager.connectAll();
}
/*
* Exponential backoff
*/
int tmpTimeOut = notTimeout*2;
notTimeout = (tmpTimeOut < maxNotificationInterval?
tmpTimeOut : maxNotificationInterval);
LOG.info("Notification time out: " + notTimeout);
}
else if(self.getVotingView().containsKey(n.sid)) {
/*
* Only proceed if the vote comes from a replica in the
* voting view.
*/
switch (n.state) {
case LOOKING:
System.out.println("receive message : electionEpoch,logicalclock"+n.electionEpoch+":"+logicalclock);
// If notification > current, replace and send messages out
if (n.electionEpoch > logicalclock) {
logicalclock = n.electionEpoch;
recvset.clear();
if(totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
getInitId(), getInitLastLoggedZxid(), getPeerEpoch())) {
updateProposal(n.leader, n.zxid, n.peerEpoch);
} else {
updateProposal(getInitId(),
getInitLastLoggedZxid(),
getPeerEpoch());
}
sendNotifications();
} else if (n.electionEpoch < logicalclock) {
if(LOG.isDebugEnabled()){
LOG.debug("Notification election epoch is smaller than logicalclock. n.electionEpoch = 0x"
+ Long.toHexString(n.electionEpoch)
+ ", logicalclock=0x" + Long.toHexString(logicalclock));
}
break;
} else if (totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
proposedLeader, proposedZxid, proposedEpoch)) {
updateProposal(n.leader, n.zxid, n.peerEpoch);
sendNotifications();
}
if(LOG.isDebugEnabled()){
LOG.debug("Adding vote: from=" + n.sid +
", proposed leader=" + n.leader +
", proposed zxid=0x" + Long.toHexString(n.zxid) +
", proposed election epoch=0x" + Long.toHexString(n.electionEpoch));
}
recvset.put(n.sid, new Vote(n.leader, n.zxid, n.electionEpoch, n.peerEpoch));
if (termPredicate(recvset,
new Vote(proposedLeader, proposedZxid,
logicalclock, proposedEpoch))) {
// Verify if there is any change in the proposed leader
while((n = recvqueue.poll(finalizeWait,
TimeUnit.MILLISECONDS)) != null){
if(totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
proposedLeader, proposedZxid, proposedEpoch)){
recvqueue.put(n);
break;
}
}
/*
* This predicate is true once we don't read any new
* relevant message from the reception queue
*/
if (n == null) {
self.setPeerState((proposedLeader == self.getId()) ?
ServerState.LEADING: learningState());
Vote endVote = new Vote(proposedLeader,
proposedZxid,
logicalclock,
proposedEpoch);
leaveInstance(endVote);
return endVote;
}
}
break;
case OBSERVING:
LOG.debug("Notification from observer: " + n.sid);
break;
case FOLLOWING:
case LEADING:
/*
* Consider all notifications from the same epoch
* together.
*/
if(n.electionEpoch == logicalclock){
recvset.put(n.sid, new Vote(n.leader,
n.zxid,
n.electionEpoch,
n.peerEpoch));
if(ooePredicate(recvset, outofelection, n)) {
self.setPeerState((n.leader == self.getId()) ?
ServerState.LEADING: learningState());
Vote endVote = new Vote(n.leader,
n.zxid,
n.electionEpoch,
n.peerEpoch);
leaveInstance(endVote);
return endVote;
}
}
/*
* Before joining an established ensemble, verify
* a majority is following the same leader.
*/
outofelection.put(n.sid, new Vote(n.version,
n.leader,
n.zxid,
n.electionEpoch,
n.peerEpoch,
n.state));
if(ooePredicate(outofelection, outofelection, n)) {
synchronized(this){
logicalclock = n.electionEpoch;
self.setPeerState((n.leader == self.getId()) ?
ServerState.LEADING: learningState());
}
Vote endVote = new Vote(n.leader,
n.zxid,
n.electionEpoch,
n.peerEpoch);
leaveInstance(endVote);
return endVote;
}
break;
default:
LOG.warn("Notification state unrecognized: {} (n.state), {} (n.sid)",
n.state, n.sid);
break;
}
} else {
LOG.warn("Ignoring notification from non-cluster member " + n.sid);
}
}
return null;
} finally {
try {
if(self.jmxLeaderElectionBean != null){
MBeanRegistry.getInstance().unregister(
self.jmxLeaderElectionBean);
}
} catch (Exception e) {
LOG.warn("Failed to unregister with JMX", e);
}
self.jmxLeaderElectionBean = null;
}
}
zookeeper选举代码分析的更多相关文章
- Zookeeper ZAB 协议分析[转]
写在开始:这是我找到一篇比较好的博客,转载到这来进行备份原文参考: Zookeeper ZAB 协议分析 前言 ZAB 协议是为分布式协调服务 ZooKeeper 专门设计的一种支持崩溃恢复的原子广播 ...
- zookeeper源码分析之五服务端(集群leader)处理请求流程
leader的实现类为LeaderZooKeeperServer,它间接继承自标准ZookeeperServer.它规定了请求到达leader时需要经历的路径: PrepRequestProcesso ...
- zookeeper源码分析之四服务端(单机)处理请求流程
上文: zookeeper源码分析之一服务端启动过程 中,我们介绍了zookeeper服务器的启动过程,其中单机是ZookeeperServer启动,集群使用QuorumPeer启动,那么这次我们分析 ...
- 完整全面的Java资源库(包括构建、操作、代码分析、编译器、数据库、社区等等)
构建 这里搜集了用来构建应用程序的工具. Apache Maven:Maven使用声明进行构建并进行依赖管理,偏向于使用约定而不是配置进行构建.Maven优于Apache Ant.后者采用了一种过程化 ...
- Zookeeper 源码分析-启动
Zookeeper 源码分析-启动 博客分类: Zookeeper 本文主要介绍了zookeeper启动的过程 运行zkServer.sh start命令可以启动zookeeper.入口的main ...
- 学习笔记:Zookeeper选举机制
1.Zookeeper选举机制 Zookeeper虽然在配置文件中并没有指定master和slave 但是,zookeeper工作时,是有一个节点为leader,其他则为follower Leader ...
- 一个简单的"RPC框架"代码分析
0,服务接口定义---Echo.java /* * 定义了服务器提供的服务类型 */ public interface Echo { public String echo(String string) ...
- Zookeeper选举算法原理
Zookeeper选举算法原理 Leader选举 Leader选举是保证分布式数据一致性的关键所在.当Zookeeper集群中的一台服务器出现以下两种情况之一时,需要进入Leader选举. (1) 服 ...
- 分布式系统的一致性级别划分及Zookeeper一致性级别分析
最近在研究分布式系统的一些理论概念,例如关于分布式系统一致性的讨论,看了一些文章我有一些不解.大多数对分布式系统一致性的划分是将其分为三类:强一致性,顺序一致性以及弱一致性.强一致性(Strict C ...
随机推荐
- win7下.NET 2.0未在web服务器上注册的问题(转)
转自:http://blog.sina.com.cn/s/blog_6d15b547010192hx.html 电脑装了win7操作系统,装上vs2008后运行dotnetnuke项目后出现" ...
- 矩形嵌套问题-ACM集训
参考 http://blog.csdn.net/xujinsmile/article/details/7861412 有n个矩形,每个矩形可以用a,b来描述,表示长和宽.矩形X(a,b)可以嵌套在矩形 ...
- 滚动条响应鼠标滑轮事件实现上下滚动的js代码
<script type="text/javascript"> var scrollFunc=function(e){ e=e || window.event; if( ...
- phpcms 如何获取文章
请求地址http://127.0.0.1/phpcms/index.php?m=content&c=index&a=show&catid=6&id=8 先来判断地址对应 ...
- php生成缩略图
<?php /** * 生成缩略图函数(支持图片格式:gif.jpeg.png和bmp) * @author ruxing.li * @param string $src 源图片路径 * @pa ...
- NodeJs开发学习目录
1.Nodejs基本概念及Nodejs.npm安装测试[2014-06-06] 2.开发工具简介(主要介绍Sublime Text使用) [2014-06-06] 3.Sublime text插件安装 ...
- ps的使用方法
1.打开原图素材,Ctrl + J把背景图层复制一层,按Ctrl + Shift + U去色,执行:滤镜 > 模糊 > 高斯模糊,数值4,图层混合模式为滤色,图层不透明度改为27%. 2. ...
- BZOJ 1257 余数之和
Description 给出正整数\(n\)和\(k\),计算\(j(n, k)=k\;mod\;1\;+\;k\;mod\;2\;+\;k\;mod\;3\;+\;-\;+\;k\;mod\;n\) ...
- XML CDATA
/* <![CDATA[ */var mv_dynamic_to_top = {"text":"To Top","version":& ...
- cf C Bear and Prime Numbers
题意:给你一个n,输入n个数,然后输入m,接下来有m个询问,每一个询问为[l,r],然后输出在区间内[l,r]内f(p)的和,p为[l,r]的素数,f(p)的含义为在n个数中是p的倍数的个数. 思路: ...