本文将以zookeeper的3.4.6版本作为源码分析版本。主要的代码类包括QuorumPeerMain、QuorumPeer、FastLeaderElection、QuorumMaj等。

假设有a,b,c三个zookeeper服务,serverid分别是1、2、3:

1.先启动集群中的a服务,先投票自己a为leader,并将投票信息发送给自己;

QuorumPeerMain对象调用QuorumPeer线程的startLeaderElection方法,最终调用FastLeaderElection的lookForLeader方法

2.再启动集群中的b服务,先投票自己b为leader,并将投票信息发送给自己和a服务;

3.a收到b的投票信息,服务a通过对比算法,选择b为leader,并通知自己和服务b;

    protected boolean totalOrderPredicate(long newId, long newZxid, long newEpoch, long curId, long curZxid, long curEpoch) {
LOG.debug("id: " + newId + ", proposed id: " + curId + ", zxid: 0x" +
Long.toHexString(newZxid) + ", proposed zxid: 0x" + Long.toHexString(curZxid));
if(self.getQuorumVerifier().getWeight(newId) == 0){
return false;
} /*
* We return true if one of the following three cases hold:
* 1- New epoch is higher
* 2- New epoch is the same as current epoch, but new zxid is higher
* 3- New epoch is the same as current epoch, new zxid is the same
* as current zxid, but server id is higher.
*/ return ((newEpoch > curEpoch) ||
((newEpoch == curEpoch) &&
((newZxid > curZxid) || ((newZxid == curZxid) && (newId > curId)))));
}

4.服务a通过验证算法,确认b为leader;

5.服务b收到a的通知信息,通过验证算法b确认自身为leader;

    protected boolean termPredicate(
HashMap<Long, Vote> votes,
Vote vote) { HashSet<Long> set = new HashSet<Long>(); /*
* First make the views consistent. Sometimes peers will have
* different zxids for a server depending on timing.
*/
for (Map.Entry<Long,Vote> entry : votes.entrySet()) {
if (vote.equals(entry.getValue())){
set.add(entry.getKey());
}
} return self.getQuorumVerifier().containsQuorum(set);
} public boolean containsQuorum(HashSet<Long> set){
return (set.size() > half);
}

6.服务c启动后,发现集群已经选出leader为服务b;

7.下面的代码是选举过程中最重要的代码:

    public Vote lookForLeader() throws InterruptedException {
try {
self.jmxLeaderElectionBean = new LeaderElectionBean();
MBeanRegistry.getInstance().register(
self.jmxLeaderElectionBean, self.jmxLocalPeerBean);
} catch (Exception e) {
LOG.warn("Failed to register with JMX", e);
self.jmxLeaderElectionBean = null;
}
if (self.start_fle == 0) {
self.start_fle = System.currentTimeMillis();
}
try {
HashMap<Long, Vote> recvset = new HashMap<Long, Vote>(); HashMap<Long, Vote> outofelection = new HashMap<Long, Vote>(); int notTimeout = finalizeWait; synchronized(this){
logicalclock++;
updateProposal(getInitId(), getInitLastLoggedZxid(), getPeerEpoch());
} LOG.info("New election. My id = " + self.getId() +
", proposed zxid=0x" + Long.toHexString(proposedZxid));
sendNotifications(); /*
* Loop in which we exchange notifications until we find a leader
*/ while ((self.getPeerState() == ServerState.LOOKING) &&
(!stop)){
/*
* Remove next notification from queue, times out after 2 times
* the termination time
*/
Notification n = recvqueue.poll(notTimeout,
TimeUnit.MILLISECONDS); /*
* Sends more notifications if haven't received enough.
* Otherwise processes new notification.
*/
if(n == null){
if(manager.haveDelivered()){
sendNotifications();
} else {
manager.connectAll();
} /*
* Exponential backoff
*/
int tmpTimeOut = notTimeout*2;
notTimeout = (tmpTimeOut < maxNotificationInterval?
tmpTimeOut : maxNotificationInterval);
LOG.info("Notification time out: " + notTimeout);
}
else if(self.getVotingView().containsKey(n.sid)) {
/*
* Only proceed if the vote comes from a replica in the
* voting view.
*/
switch (n.state) {
case LOOKING:
System.out.println("receive message : electionEpoch,logicalclock"+n.electionEpoch+":"+logicalclock);
// If notification > current, replace and send messages out
if (n.electionEpoch > logicalclock) {
logicalclock = n.electionEpoch;
recvset.clear();
if(totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
getInitId(), getInitLastLoggedZxid(), getPeerEpoch())) {
updateProposal(n.leader, n.zxid, n.peerEpoch);
} else {
updateProposal(getInitId(),
getInitLastLoggedZxid(),
getPeerEpoch());
}
sendNotifications();
} else if (n.electionEpoch < logicalclock) {
if(LOG.isDebugEnabled()){
LOG.debug("Notification election epoch is smaller than logicalclock. n.electionEpoch = 0x"
+ Long.toHexString(n.electionEpoch)
+ ", logicalclock=0x" + Long.toHexString(logicalclock));
}
break;
} else if (totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
proposedLeader, proposedZxid, proposedEpoch)) {
updateProposal(n.leader, n.zxid, n.peerEpoch);
sendNotifications();
} if(LOG.isDebugEnabled()){
LOG.debug("Adding vote: from=" + n.sid +
", proposed leader=" + n.leader +
", proposed zxid=0x" + Long.toHexString(n.zxid) +
", proposed election epoch=0x" + Long.toHexString(n.electionEpoch));
} recvset.put(n.sid, new Vote(n.leader, n.zxid, n.electionEpoch, n.peerEpoch)); if (termPredicate(recvset,
new Vote(proposedLeader, proposedZxid,
logicalclock, proposedEpoch))) { // Verify if there is any change in the proposed leader
while((n = recvqueue.poll(finalizeWait,
TimeUnit.MILLISECONDS)) != null){
if(totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
proposedLeader, proposedZxid, proposedEpoch)){
recvqueue.put(n);
break;
}
} /*
* This predicate is true once we don't read any new
* relevant message from the reception queue
*/
if (n == null) {
self.setPeerState((proposedLeader == self.getId()) ?
ServerState.LEADING: learningState()); Vote endVote = new Vote(proposedLeader,
proposedZxid,
logicalclock,
proposedEpoch);
leaveInstance(endVote);
return endVote;
}
}
break;
case OBSERVING:
LOG.debug("Notification from observer: " + n.sid);
break;
case FOLLOWING:
case LEADING:
/*
* Consider all notifications from the same epoch
* together.
*/
if(n.electionEpoch == logicalclock){
recvset.put(n.sid, new Vote(n.leader,
n.zxid,
n.electionEpoch,
n.peerEpoch)); if(ooePredicate(recvset, outofelection, n)) {
self.setPeerState((n.leader == self.getId()) ?
ServerState.LEADING: learningState()); Vote endVote = new Vote(n.leader,
n.zxid,
n.electionEpoch,
n.peerEpoch);
leaveInstance(endVote);
return endVote;
}
} /*
* Before joining an established ensemble, verify
* a majority is following the same leader.
*/
outofelection.put(n.sid, new Vote(n.version,
n.leader,
n.zxid,
n.electionEpoch,
n.peerEpoch,
n.state)); if(ooePredicate(outofelection, outofelection, n)) {
synchronized(this){
logicalclock = n.electionEpoch;
self.setPeerState((n.leader == self.getId()) ?
ServerState.LEADING: learningState());
}
Vote endVote = new Vote(n.leader,
n.zxid,
n.electionEpoch,
n.peerEpoch);
leaveInstance(endVote);
return endVote;
}
break;
default:
LOG.warn("Notification state unrecognized: {} (n.state), {} (n.sid)",
n.state, n.sid);
break;
}
} else {
LOG.warn("Ignoring notification from non-cluster member " + n.sid);
}
}
return null;
} finally {
try {
if(self.jmxLeaderElectionBean != null){
MBeanRegistry.getInstance().unregister(
self.jmxLeaderElectionBean);
}
} catch (Exception e) {
LOG.warn("Failed to unregister with JMX", e);
}
self.jmxLeaderElectionBean = null;
}
}

  

zookeeper选举代码分析的更多相关文章

  1. Zookeeper ZAB 协议分析[转]

    写在开始:这是我找到一篇比较好的博客,转载到这来进行备份原文参考: Zookeeper ZAB 协议分析 前言 ZAB 协议是为分布式协调服务 ZooKeeper 专门设计的一种支持崩溃恢复的原子广播 ...

  2. zookeeper源码分析之五服务端(集群leader)处理请求流程

    leader的实现类为LeaderZooKeeperServer,它间接继承自标准ZookeeperServer.它规定了请求到达leader时需要经历的路径: PrepRequestProcesso ...

  3. zookeeper源码分析之四服务端(单机)处理请求流程

    上文: zookeeper源码分析之一服务端启动过程 中,我们介绍了zookeeper服务器的启动过程,其中单机是ZookeeperServer启动,集群使用QuorumPeer启动,那么这次我们分析 ...

  4. 完整全面的Java资源库(包括构建、操作、代码分析、编译器、数据库、社区等等)

    构建 这里搜集了用来构建应用程序的工具. Apache Maven:Maven使用声明进行构建并进行依赖管理,偏向于使用约定而不是配置进行构建.Maven优于Apache Ant.后者采用了一种过程化 ...

  5. Zookeeper 源码分析-启动

    Zookeeper 源码分析-启动 博客分类: Zookeeper   本文主要介绍了zookeeper启动的过程 运行zkServer.sh start命令可以启动zookeeper.入口的main ...

  6. 学习笔记:Zookeeper选举机制

    1.Zookeeper选举机制 Zookeeper虽然在配置文件中并没有指定master和slave 但是,zookeeper工作时,是有一个节点为leader,其他则为follower Leader ...

  7. 一个简单的"RPC框架"代码分析

    0,服务接口定义---Echo.java /* * 定义了服务器提供的服务类型 */ public interface Echo { public String echo(String string) ...

  8. Zookeeper选举算法原理

    Zookeeper选举算法原理 Leader选举 Leader选举是保证分布式数据一致性的关键所在.当Zookeeper集群中的一台服务器出现以下两种情况之一时,需要进入Leader选举. (1) 服 ...

  9. 分布式系统的一致性级别划分及Zookeeper一致性级别分析

    最近在研究分布式系统的一些理论概念,例如关于分布式系统一致性的讨论,看了一些文章我有一些不解.大多数对分布式系统一致性的划分是将其分为三类:强一致性,顺序一致性以及弱一致性.强一致性(Strict C ...

随机推荐

  1. qrcode-php生成二维码

    调用PHP QR Code非常简单,如下代码即可生成一张内容为"http://www.baidu.com"的二维码. include 'phpqrcode.php'; QRcode ...

  2. Cocos Studio1.5.0.1开发学习笔记(一)

    听说Cocos Studio很久了,主要是因为骨骼动画.目前看来Cocos2d-x播放动画的方式只有2种: 第一种:是播放序列帧动画,即将动画的每一帧都加载进缓存里,需要播放时再使用Animation ...

  3. Extjs4.2.1学习笔记[更新]

    心血来潮准备学习一下Extjs,就从官方网站http://extjs.org.cn/下载了最新版本4.2.1,开始从头学习,记一下笔记,让自己能够持之以恒. 先说一下基本文件类库引用吧, 每个项目一开 ...

  4. 防止mysql注入

    function check($sql_str) { $checks=eregi('select|insert|update|delete|\'|\/|\\\|\*|\.|union|into|loa ...

  5. Day14 HTML补充

    一.认识前端 前端开发的核心语言: html - 超文本标记语言 结构 css - 层叠样式表 样式 javascript - 脚本语言 行为 <html></html> 双标 ...

  6. C# 正则表达式、Json

    正则表达式: 正则表达式主要的参考文章:http://www.cnblogs.com/stg609/archive/2009/06/03/1492709.html#anchorD. 需求:将cocos ...

  7. Linux下软件的安装

    想必linux新手刚开始对于linux软件安装很茫然吧,不知到怎么安装,软件到底安装在哪里,如果我需要删除软件怎么删除,配置文件到哪里去找. 想学习linux的话,最快上手的应该是Ubuntu,它特有 ...

  8. Oracle问题解决(sqlplus无法登陆)

    命令行 sqlplus 无法登陆,常常是用户名/密码错误.监听配置错误或未启动.数据库服务名丢失等等原因. 用户名/密码错误 找到自己设的密码 这全靠自己创建数据库实例时,备份或记住相关信息 若最后没 ...

  9. Linux SendMail 使用外部SMTP服务发送邮件

    这个今天刚好用到,就测试了一下.OK了..因为,PYTHON模块是可以,但SHELL脚本用SHELL发,还是合拍点.. http://my.oschina.net/duangr/blog/183162 ...

  10. Java super与this用法解析

    1.     子类的构造函数如果要引用super的话,必须把super放在函数的首位. class Base { Base() { System.out.println("Base" ...