zookeeper选举代码分析
本文将以zookeeper的3.4.6版本作为源码分析版本。主要的代码类包括QuorumPeerMain、QuorumPeer、FastLeaderElection、QuorumMaj等。
假设有a,b,c三个zookeeper服务,serverid分别是1、2、3:
1.先启动集群中的a服务,先投票自己a为leader,并将投票信息发送给自己;
QuorumPeerMain对象调用QuorumPeer线程的startLeaderElection方法,最终调用FastLeaderElection的lookForLeader方法
2.再启动集群中的b服务,先投票自己b为leader,并将投票信息发送给自己和a服务;
3.a收到b的投票信息,服务a通过对比算法,选择b为leader,并通知自己和服务b;
protected boolean totalOrderPredicate(long newId, long newZxid, long newEpoch, long curId, long curZxid, long curEpoch) {
LOG.debug("id: " + newId + ", proposed id: " + curId + ", zxid: 0x" +
Long.toHexString(newZxid) + ", proposed zxid: 0x" + Long.toHexString(curZxid));
if(self.getQuorumVerifier().getWeight(newId) == 0){
return false;
}
/*
* We return true if one of the following three cases hold:
* 1- New epoch is higher
* 2- New epoch is the same as current epoch, but new zxid is higher
* 3- New epoch is the same as current epoch, new zxid is the same
* as current zxid, but server id is higher.
*/
return ((newEpoch > curEpoch) ||
((newEpoch == curEpoch) &&
((newZxid > curZxid) || ((newZxid == curZxid) && (newId > curId)))));
}
4.服务a通过验证算法,确认b为leader;
5.服务b收到a的通知信息,通过验证算法b确认自身为leader;
protected boolean termPredicate(
HashMap<Long, Vote> votes,
Vote vote) { HashSet<Long> set = new HashSet<Long>(); /*
* First make the views consistent. Sometimes peers will have
* different zxids for a server depending on timing.
*/
for (Map.Entry<Long,Vote> entry : votes.entrySet()) {
if (vote.equals(entry.getValue())){
set.add(entry.getKey());
}
} return self.getQuorumVerifier().containsQuorum(set);
} public boolean containsQuorum(HashSet<Long> set){
return (set.size() > half);
}
6.服务c启动后,发现集群已经选出leader为服务b;
7.下面的代码是选举过程中最重要的代码:
public Vote lookForLeader() throws InterruptedException {
try {
self.jmxLeaderElectionBean = new LeaderElectionBean();
MBeanRegistry.getInstance().register(
self.jmxLeaderElectionBean, self.jmxLocalPeerBean);
} catch (Exception e) {
LOG.warn("Failed to register with JMX", e);
self.jmxLeaderElectionBean = null;
}
if (self.start_fle == 0) {
self.start_fle = System.currentTimeMillis();
}
try {
HashMap<Long, Vote> recvset = new HashMap<Long, Vote>();
HashMap<Long, Vote> outofelection = new HashMap<Long, Vote>();
int notTimeout = finalizeWait;
synchronized(this){
logicalclock++;
updateProposal(getInitId(), getInitLastLoggedZxid(), getPeerEpoch());
}
LOG.info("New election. My id = " + self.getId() +
", proposed zxid=0x" + Long.toHexString(proposedZxid));
sendNotifications();
/*
* Loop in which we exchange notifications until we find a leader
*/
while ((self.getPeerState() == ServerState.LOOKING) &&
(!stop)){
/*
* Remove next notification from queue, times out after 2 times
* the termination time
*/
Notification n = recvqueue.poll(notTimeout,
TimeUnit.MILLISECONDS);
/*
* Sends more notifications if haven't received enough.
* Otherwise processes new notification.
*/
if(n == null){
if(manager.haveDelivered()){
sendNotifications();
} else {
manager.connectAll();
}
/*
* Exponential backoff
*/
int tmpTimeOut = notTimeout*2;
notTimeout = (tmpTimeOut < maxNotificationInterval?
tmpTimeOut : maxNotificationInterval);
LOG.info("Notification time out: " + notTimeout);
}
else if(self.getVotingView().containsKey(n.sid)) {
/*
* Only proceed if the vote comes from a replica in the
* voting view.
*/
switch (n.state) {
case LOOKING:
System.out.println("receive message : electionEpoch,logicalclock"+n.electionEpoch+":"+logicalclock);
// If notification > current, replace and send messages out
if (n.electionEpoch > logicalclock) {
logicalclock = n.electionEpoch;
recvset.clear();
if(totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
getInitId(), getInitLastLoggedZxid(), getPeerEpoch())) {
updateProposal(n.leader, n.zxid, n.peerEpoch);
} else {
updateProposal(getInitId(),
getInitLastLoggedZxid(),
getPeerEpoch());
}
sendNotifications();
} else if (n.electionEpoch < logicalclock) {
if(LOG.isDebugEnabled()){
LOG.debug("Notification election epoch is smaller than logicalclock. n.electionEpoch = 0x"
+ Long.toHexString(n.electionEpoch)
+ ", logicalclock=0x" + Long.toHexString(logicalclock));
}
break;
} else if (totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
proposedLeader, proposedZxid, proposedEpoch)) {
updateProposal(n.leader, n.zxid, n.peerEpoch);
sendNotifications();
}
if(LOG.isDebugEnabled()){
LOG.debug("Adding vote: from=" + n.sid +
", proposed leader=" + n.leader +
", proposed zxid=0x" + Long.toHexString(n.zxid) +
", proposed election epoch=0x" + Long.toHexString(n.electionEpoch));
}
recvset.put(n.sid, new Vote(n.leader, n.zxid, n.electionEpoch, n.peerEpoch));
if (termPredicate(recvset,
new Vote(proposedLeader, proposedZxid,
logicalclock, proposedEpoch))) {
// Verify if there is any change in the proposed leader
while((n = recvqueue.poll(finalizeWait,
TimeUnit.MILLISECONDS)) != null){
if(totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
proposedLeader, proposedZxid, proposedEpoch)){
recvqueue.put(n);
break;
}
}
/*
* This predicate is true once we don't read any new
* relevant message from the reception queue
*/
if (n == null) {
self.setPeerState((proposedLeader == self.getId()) ?
ServerState.LEADING: learningState());
Vote endVote = new Vote(proposedLeader,
proposedZxid,
logicalclock,
proposedEpoch);
leaveInstance(endVote);
return endVote;
}
}
break;
case OBSERVING:
LOG.debug("Notification from observer: " + n.sid);
break;
case FOLLOWING:
case LEADING:
/*
* Consider all notifications from the same epoch
* together.
*/
if(n.electionEpoch == logicalclock){
recvset.put(n.sid, new Vote(n.leader,
n.zxid,
n.electionEpoch,
n.peerEpoch));
if(ooePredicate(recvset, outofelection, n)) {
self.setPeerState((n.leader == self.getId()) ?
ServerState.LEADING: learningState());
Vote endVote = new Vote(n.leader,
n.zxid,
n.electionEpoch,
n.peerEpoch);
leaveInstance(endVote);
return endVote;
}
}
/*
* Before joining an established ensemble, verify
* a majority is following the same leader.
*/
outofelection.put(n.sid, new Vote(n.version,
n.leader,
n.zxid,
n.electionEpoch,
n.peerEpoch,
n.state));
if(ooePredicate(outofelection, outofelection, n)) {
synchronized(this){
logicalclock = n.electionEpoch;
self.setPeerState((n.leader == self.getId()) ?
ServerState.LEADING: learningState());
}
Vote endVote = new Vote(n.leader,
n.zxid,
n.electionEpoch,
n.peerEpoch);
leaveInstance(endVote);
return endVote;
}
break;
default:
LOG.warn("Notification state unrecognized: {} (n.state), {} (n.sid)",
n.state, n.sid);
break;
}
} else {
LOG.warn("Ignoring notification from non-cluster member " + n.sid);
}
}
return null;
} finally {
try {
if(self.jmxLeaderElectionBean != null){
MBeanRegistry.getInstance().unregister(
self.jmxLeaderElectionBean);
}
} catch (Exception e) {
LOG.warn("Failed to unregister with JMX", e);
}
self.jmxLeaderElectionBean = null;
}
}
zookeeper选举代码分析的更多相关文章
- Zookeeper ZAB 协议分析[转]
写在开始:这是我找到一篇比较好的博客,转载到这来进行备份原文参考: Zookeeper ZAB 协议分析 前言 ZAB 协议是为分布式协调服务 ZooKeeper 专门设计的一种支持崩溃恢复的原子广播 ...
- zookeeper源码分析之五服务端(集群leader)处理请求流程
leader的实现类为LeaderZooKeeperServer,它间接继承自标准ZookeeperServer.它规定了请求到达leader时需要经历的路径: PrepRequestProcesso ...
- zookeeper源码分析之四服务端(单机)处理请求流程
上文: zookeeper源码分析之一服务端启动过程 中,我们介绍了zookeeper服务器的启动过程,其中单机是ZookeeperServer启动,集群使用QuorumPeer启动,那么这次我们分析 ...
- 完整全面的Java资源库(包括构建、操作、代码分析、编译器、数据库、社区等等)
构建 这里搜集了用来构建应用程序的工具. Apache Maven:Maven使用声明进行构建并进行依赖管理,偏向于使用约定而不是配置进行构建.Maven优于Apache Ant.后者采用了一种过程化 ...
- Zookeeper 源码分析-启动
Zookeeper 源码分析-启动 博客分类: Zookeeper 本文主要介绍了zookeeper启动的过程 运行zkServer.sh start命令可以启动zookeeper.入口的main ...
- 学习笔记:Zookeeper选举机制
1.Zookeeper选举机制 Zookeeper虽然在配置文件中并没有指定master和slave 但是,zookeeper工作时,是有一个节点为leader,其他则为follower Leader ...
- 一个简单的"RPC框架"代码分析
0,服务接口定义---Echo.java /* * 定义了服务器提供的服务类型 */ public interface Echo { public String echo(String string) ...
- Zookeeper选举算法原理
Zookeeper选举算法原理 Leader选举 Leader选举是保证分布式数据一致性的关键所在.当Zookeeper集群中的一台服务器出现以下两种情况之一时,需要进入Leader选举. (1) 服 ...
- 分布式系统的一致性级别划分及Zookeeper一致性级别分析
最近在研究分布式系统的一些理论概念,例如关于分布式系统一致性的讨论,看了一些文章我有一些不解.大多数对分布式系统一致性的划分是将其分为三类:强一致性,顺序一致性以及弱一致性.强一致性(Strict C ...
随机推荐
- X3850 Linux 下DSA日志收集办法
收集工具下载 RHEL 6: 32bit-- [IBM 下载]http://delivery04.dhe.ibm.com/sar/CMA/XSA/03tza/1/ibm_utl_dsa_dsytb7x ...
- 总结Widows 7 Start->Run 命令
Widows + R基本上成为很常用的方式,那么通过Windows + R我们可以在运行中做什么手脚呢. 下面从最基本的系统命令说起 notepad--------打开记事本 services. ...
- ROW_NUMBER分页的注意事项
之前在使用ROW_NUMBER分页获取数据的时候,直接用ROW_NUMBER里的SELECT语句查出了所有的数据. like this: select * from ( select row_numb ...
- width:100%缩小窗口时背景图片出现空白bug
页面容器(#wrap)与页面头部(#header )为100%宽度.而内容的容器(#page)为固定宽度960px.浏览窗口缩小而小于内容层宽度时会产生宽度理解上的差异.如下图所示窗口宽度大于内容层宽 ...
- python内置字符串操作方法
1.capitalize() S.capitalize()->string 首字母大写,其余字母小写. str='A222aaA' str.capitalize()#首字母大写,其余字母小写. ...
- 如何在Android Studio中使用Gradle发布项目至Jcenter仓库
简述 目前非常流行将开源库上传至Jcenter仓库中,使用起来非常方便且易于维护,特别是在Android Studio环境中,只需几步配置就可以轻松实现上传和发布. Library的转换和引用 博主的 ...
- 【转】Spring注解详解
http://blog.csdn.net/xyh820/article/details/7303330/ 概述 注释配置相对于 XML 配置具有很多的优势: 它可以充分利用 Java 的反射机制获取类 ...
- gulp配置browserify多入口
需要 var es = require('event-stream'); gulp.task('browserify', function(){ var files = [ { fpath: './j ...
- Maven实战六
转载:http://www.iteye.com/topic/1132509 一.简介 settings.xml对于maven来说相当于全局性的配置,用于所有的项目,当Maven运行过程中的各种配置,例 ...
- Linux下安装软件心得
1 软件安装方法: 源代码编译安装:tar.gz等压缩格式,需要经过手动编译,./configure,make ,make install ,然后进行配置操作 二进制安装:tar.gz等压缩格式,解压 ...