Apache Kafka源码分析 – Replica and Partition
Replica
对于local replica, 需要记录highWatermarkValue,表示当前已经committed的数据
对于remote replica,需要记录logEndOffsetValue以及更新的时间
package kafka.cluster class Replica(val brokerId: Int,
val partition: Partition,
time: Time = SystemTime,
initialHighWatermarkValue: Long = 0L,
val log: Option[Log] = None) extends Logging {
//only defined in local replica
private[this] var highWatermarkValue: AtomicLong = new AtomicLong(initialHighWatermarkValue)
// only used for remote replica; logEndOffsetValue for local replica is kept in log
private[this] var logEndOffsetValue = new AtomicLong(ReplicaManager.UnknownLogEndOffset)
private[this] var logEndOffsetUpdateTimeMsValue: AtomicLong = new AtomicLong(time.milliseconds)
val topic = partition.topic
val partitionId = partition.partitionId
}
Partition
主要用于管理leader,ISR,AR
package kafka.cluster /**
* Data structure that represents a topic partition. The leader maintains the AR, ISR, CUR, RAR
*/
class Partition(val topic: String,
val partitionId: Int,
var replicationFactor: Int,
time: Time,
val replicaManager: ReplicaManager) extends Logging with KafkaMetricsGroup {
private val localBrokerId = replicaManager.config.brokerId //当前的brokerId
private val logManager = replicaManager.logManager
private val zkClient = replicaManager.zkClient
var leaderReplicaIdOpt: Option[Int] = None //leaderReplica的id
var inSyncReplicas: Set[Replica] = Set.empty[Replica] //ISR
private val assignedReplicaMap = new Pool[Int,Replica] //AR,往往是大于等于ISR的
private val leaderIsrUpdateLock = new Object
private var zkVersion: Int = LeaderAndIsr.initialZKVersion
private var leaderEpoch: Int = LeaderAndIsr.initialLeaderEpoch - 1
/* Epoch of the controller that last changed the leader. This needs to be initialized correctly upon broker startup.
* One way of doing that is through the controller's start replica state change command. When a new broker starts up
* the controller sends it a start replica command containing the leader for each partition that the broker hosts.
* In addition to the leader, the controller can also send the epoch of the controller that elected the leader for
* each partition. */
private var controllerEpoch: Int = KafkaController.InitialControllerEpoch - 1
}
getOrCreateReplica
def getOrCreateReplica(replicaId: Int = localBrokerId): Replica = {
val replicaOpt = getReplica(replicaId)
replicaOpt match {
case Some(replica) => replica
case None => //需要create
if (isReplicaLocal(replicaId)) {
val config = LogConfig.fromProps(logManager.defaultConfig.toProps, AdminUtils.fetchTopicConfig(zkClient, topic))
val log = logManager.createLog(TopicAndPartition(topic, partitionId), config) //创建logfile
val checkpoint = replicaManager.highWatermarkCheckpoints(log.dir.getParent) //试图读取HW的checkpoint
val offsetMap = checkpoint.read
if (!offsetMap.contains(TopicAndPartition(topic, partitionId)))
warn("No checkpointed highwatermark is found for partition [%s,%d]".format(topic, partitionId))
val offset = offsetMap.getOrElse(TopicAndPartition(topic, partitionId), 0L).min(log.logEndOffset) //offset为cp读出的HW和logEndOffset中小的那个
val localReplica = new Replica(replicaId, this, time, offset, Some(log)) //创建Replica对象
addReplicaIfNotExists(localReplica) //加到AR中去
} else { //Remote Replica,直接创建对象
val remoteReplica = new Replica(replicaId, this, time)
addReplicaIfNotExists(remoteReplica)
}
getReplica(replicaId).get
}
}
makeLeader
/**
* Make the local replica the leader by resetting LogEndOffset for remote replicas (there could be old LogEndOffset from the time when this broker was the leader last time)
* and setting the new leader and ISR
*/
def makeLeader(controllerId: Int,
partitionStateInfo: PartitionStateInfo, correlationId: Int): Boolean = {
leaderIsrUpdateLock synchronized {
val allReplicas = partitionStateInfo.allReplicas //取出所有replicaid
val leaderIsrAndControllerEpoch = partitionStateInfo.leaderIsrAndControllerEpoch
val leaderAndIsr = leaderIsrAndControllerEpoch.leaderAndIsr
// record the epoch of the controller that made the leadership decision. This is useful while updating the isr
// to maintain the decision maker controller's epoch in the zookeeper path
controllerEpoch = leaderIsrAndControllerEpoch.controllerEpoch //每次做leadership decision的时候需要更新controllerEpoch
// add replicas that are new
allReplicas.foreach(replica => getOrCreateReplica(replica)) //生成所有replica对象,对于new的需要create
val newInSyncReplicas = leaderAndIsr.isr.map(r => getOrCreateReplica(r)).toSet //生成新的ISR
// remove assigned replicas that have been removed by the controller
(assignedReplicas().map(_.brokerId) -- allReplicas).foreach(removeReplica(_)) //对于AR中已经不再有效的replica,remove
// reset LogEndOffset for remote replicas
assignedReplicas.foreach(r => if (r.brokerId != localBrokerId) r.logEndOffset = ReplicaManager.UnknownLogEndOffset) //清空所有remote replica的LEO
inSyncReplicas = newInSyncReplicas
leaderEpoch = leaderAndIsr.leaderEpoch
zkVersion = leaderAndIsr.zkVersion
leaderReplicaIdOpt = Some(localBrokerId)
// we may need to increment high watermark
maybeIncrementLeaderHW(getReplica().get) //更新HW,用min(所有remote replica的LEO)
true
}
}
maybeIncrementLeaderHW
用所有remote replica的LEO的最小值来替换当前的HW(如果大于HW的话)
/**
* There is no need to acquire the leaderIsrUpdate lock here since all callers of this private API acquire that lock
* @param leaderReplica
*/
private def maybeIncrementLeaderHW(leaderReplica: Replica) {
val allLogEndOffsets = inSyncReplicas.map(_.logEndOffset)
val newHighWatermark = allLogEndOffsets.min
val oldHighWatermark = leaderReplica.highWatermark
if(newHighWatermark > oldHighWatermark) {
leaderReplica.highWatermark = newHighWatermark //更新HW
}
else
}
makeFollower
/**
* Make the local replica the follower by setting the new leader and ISR to empty
*/
def makeFollower(controllerId: Int,
partitionStateInfo: PartitionStateInfo,
leaders: Set[Broker], correlationId: Int): Boolean = {
leaderIsrUpdateLock synchronized {
val allReplicas = partitionStateInfo.allReplicas
val leaderIsrAndControllerEpoch = partitionStateInfo.leaderIsrAndControllerEpoch
val leaderAndIsr = leaderIsrAndControllerEpoch.leaderAndIsr
val newLeaderBrokerId: Int = leaderAndIsr.leader //新的leaderid
// record the epoch of the controller that made the leadership decision. This is useful while updating the isr
// to maintain the decision maker controller's epoch in the zookeeper path
controllerEpoch = leaderIsrAndControllerEpoch.controllerEpoch
// TODO: Delete leaders from LeaderAndIsrRequest in 0.8.1
leaders.find(_.id == newLeaderBrokerId) match {
case Some(leaderBroker) =>
// add replicas that are new
allReplicas.foreach(r => getOrCreateReplica(r))
// remove assigned replicas that have been removed by the controller
(assignedReplicas().map(_.brokerId) -- allReplicas).foreach(removeReplica(_))
inSyncReplicas = Set.empty[Replica] //将ISR置空
leaderEpoch = leaderAndIsr.leaderEpoch
zkVersion = leaderAndIsr.zkVersion
leaderReplicaIdOpt = Some(newLeaderBrokerId)
case None => // we should not come here
}
true
}
}
maybeShrinkIsr
从ISR中将Stuck followers和Slow followers去除
def maybeShrinkIsr(replicaMaxLagTimeMs: Long, replicaMaxLagMessages: Long) {
leaderIsrUpdateLock synchronized {
leaderReplicaIfLocal() match {
case Some(leaderReplica) =>
val outOfSyncReplicas = getOutOfSyncReplicas(leaderReplica, replicaMaxLagTimeMs, replicaMaxLagMessages) // 获取OutofSync的replicas
if(outOfSyncReplicas.size > 0) {
val newInSyncReplicas = inSyncReplicas – outOfSyncReplicas //从ISR中去除outOfSyncReplicas
// update ISR in zk and in cache
updateIsr(newInSyncReplicas)
// we may need to increment high watermark since ISR could be down to 1
maybeIncrementLeaderHW(leaderReplica)
replicaManager.isrShrinkRate.mark()
}
case None => // do nothing if no longer leader
}
}
}
def getOutOfSyncReplicas(leaderReplica: Replica, keepInSyncTimeMs: Long, keepInSyncMessages: Long): Set[Replica] = {
/**
* there are two cases that need to be handled here -
* 1. Stuck followers: If the leo of the replica hasn't been updated for keepInSyncTimeMs ms,
* the follower is stuck and should be removed from the ISR
* 2. Slow followers: If the leo of the slowest follower is behind the leo of the leader by keepInSyncMessages, the
* follower is not catching up and should be removed from the ISR
**/
val leaderLogEndOffset = leaderReplica.logEndOffset
val candidateReplicas = inSyncReplicas - leaderReplica
// Case 1 above
val stuckReplicas = candidateReplicas.filter(r => (time.milliseconds - r.logEndOffsetUpdateTimeMs) > keepInSyncTimeMs)
if(stuckReplicas.size > 0)
debug("Stuck replicas for partition [%s,%d] are %s".format(topic, partitionId, stuckReplicas.map(_.brokerId).mkString(",")))
// Case 2 above
val slowReplicas = candidateReplicas.filter(r => r.logEndOffset >= 0 && (leaderLogEndOffset - r.logEndOffset) > keepInSyncMessages)
if(slowReplicas.size > 0)
debug("Slow replicas for partition [%s,%d] are %s".format(topic, partitionId, slowReplicas.map(_.brokerId).mkString(",")))
stuckReplicas ++ slowReplicas
}
Apache Kafka源码分析 – Replica and Partition的更多相关文章
- Apache Kafka源码分析 – Broker Server
1. Kafka.scala 在Kafka的main入口中startup KafkaServerStartable, 而KafkaServerStartable这是对KafkaServer的封装 1: ...
- apache kafka源码分析-Producer分析---转载
原文地址:http://www.aboutyun.com/thread-9938-1-1.html 问题导读1.Kafka提供了Producer类作为java producer的api,此类有几种发送 ...
- Apache Kafka源码分析 – Controller
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Controller+Internalshttps://cwiki.apache.org ...
- Apache Kafka源码分析 - kafka controller
前面已经分析过kafka server的启动过程,以及server所能处理的所有的request,即KafkaApis 剩下的,其实关键就是controller,以及partition和replica ...
- Apache Kafka源码分析 - autoLeaderRebalanceEnable
在broker的配置中,auto.leader.rebalance.enable (false) 那么这个leader是如何进行rebalance的? 首先在controller启动的时候会打开一个s ...
- Apache Kafka源码分析 - KafkaApis
kafka apis反映出kafka broker server可以提供哪些服务,broker server主要和producer,consumer,controller有交互,搞清这些api就清楚了 ...
- Apache Kafka源码分析 – Log Management
LogManager LogManager会管理broker上所有的logs(在一个log目录下),一个topic的一个partition对应于一个log(一个log子目录)首先loadLogs会加载 ...
- Apache Kafka源码分析 - ReplicaStateMachine
startup 在onControllerFailover中被调用, /** * Invoked on successful controller election. First registers ...
- Apache Kafka源码分析 - PartitionStateMachine
startup 在onControllerFailover中被调用, initializePartitionState private def initializePartitionState() { ...
随机推荐
- java中的类加载器ClassLoader和类初始化
每个类编译后产生一个Class对象,存储在.class文件中,JVM使用类加载器(Class Loader)来加载类的字节码文件(.class),类加载器实质上是一条类加载器链,一般的,我们只会用到一 ...
- char数组与string转换
1.char数组转string 有很多种方法: 假设c字符串定义为char ch[]="hello world!"; 1.向构造函数传入c字符串创建string对象: string ...
- dependent-name ‘xxx::yyy’ is parsed as a non-type, but instantiation yields a type
简言之,就是说你该用typename的地方没用typename,如以下代码 template<class Cont> void frontInsertion(Cont& ci) { ...
- (转载) STL中map用法详解
Map是STL的一个关联容器,它提供一对一(其中第一个可以称为关键字,每个关键字只能在map中出现一次,第二个可能称为该关键字的值)的数据处理能力,由于这个特性,它完成有可能在我们处理一对一数据的时候 ...
- Requests blocked by CORS policy in spring boot and angular
在本地启动Spring Boot后端和Angular前端调试时遇到跨域访问的问题导致前端请求失败. 错误描述 Access to XMLHttpRequest at 'http://localhost ...
- Tensorflow之快速加载MNIST数据集
from tensorflow.examples.tutorials.mnist import input_data import tensorflow as tf def myprint(v): p ...
- jQuery补充,基于jQuery的bxslider轮播器插件
基于jQuery的bxslider轮播器插件 html <!DOCTYPE html> <html lang="zh-cn"> <head> & ...
- 【BZOJ】1061: [Noi2008]志愿者招募(费用流+数学)
http://www.lydsy.com/JudgeOnline/problem.php?id=1061 好神的一题! 学会了一种建模方式: 当方程组内的任意变量都在其中两个方程出现且一正一负,可以建 ...
- 【BZOJ】1662: [Usaco2006 Nov]Round Numbers 圆环数(数位dp)
http://www.lydsy.com/JudgeOnline/problem.php?id=1662 这道题折腾了我两天啊-!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 果然 ...
- 纪念,BZOJ AC 100题!
虽然说有将近50+是usaco,然后还有很多水题T_T 看来我还是刷水题.... 看来我还是那么弱. T_T 但是好歹也要留个纪念..