Apache Kafka Source Code Analysis – Replica and Partition
Replica
For a local replica, we need to track highWatermarkValue, the offset up to which data is currently committed.
For a remote replica, we need to track logEndOffsetValue and the time it was last updated.
package kafka.cluster

class Replica(val brokerId: Int,
              val partition: Partition,
              time: Time = SystemTime,
              initialHighWatermarkValue: Long = 0L,
              val log: Option[Log] = None) extends Logging {
  // only defined for the local replica
  private[this] var highWatermarkValue: AtomicLong = new AtomicLong(initialHighWatermarkValue)
  // only used for remote replicas; logEndOffsetValue for the local replica is kept in the log itself
  private[this] var logEndOffsetValue = new AtomicLong(ReplicaManager.UnknownLogEndOffset)
  private[this] var logEndOffsetUpdateTimeMsValue: AtomicLong = new AtomicLong(time.milliseconds)
  val topic = partition.topic
  val partitionId = partition.partitionId
}
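The local/remote split shows up most clearly in the accessors: a local replica reads its LEO straight from the Log, while a remote replica's LEO is whatever the leader last recorded from that follower's fetch request, and only the local replica carries a meaningful high watermark. A simplified sketch of those accessors (reconstructed from memory of the 0.8 source, so guards and messages may differ):

def isLocal: Boolean = log.isDefined

// LEO: read from the log for the local replica, from the cached value for remote replicas
def logEndOffset: Long =
  if (isLocal) log.get.logEndOffset
  else logEndOffsetValue.get()

// only set for remote replicas, from follower fetch requests handled on the leader
def logEndOffset_=(newLogEndOffset: Long) {
  if (isLocal)
    throw new KafkaException("Should not set logEndOffset for the local replica %d".format(brokerId))
  logEndOffsetValue.set(newLogEndOffset)
  logEndOffsetUpdateTimeMsValue.set(time.milliseconds)
}

def logEndOffsetUpdateTimeMs = logEndOffsetUpdateTimeMsValue.get()

def highWatermark: Long = highWatermarkValue.get()

// only meaningful on the local replica
def highWatermark_=(newHighWatermark: Long) {
  if (isLocal) highWatermarkValue.set(newHighWatermark)
  else throw new KafkaException("Should not set highWatermark for the remote replica %d".format(brokerId))
}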
Partition
Partition mainly manages the leader, the ISR and the AR.
package kafka.cluster

/**
 * Data structure that represents a topic partition. The leader maintains the AR, ISR, CUR, RAR
 */
class Partition(val topic: String,
                val partitionId: Int,
                var replicationFactor: Int,
                time: Time,
                val replicaManager: ReplicaManager) extends Logging with KafkaMetricsGroup {
  private val localBrokerId = replicaManager.config.brokerId  // id of the current broker
  private val logManager = replicaManager.logManager
  private val zkClient = replicaManager.zkClient
  var leaderReplicaIdOpt: Option[Int] = None                  // broker id of the leader replica
  var inSyncReplicas: Set[Replica] = Set.empty[Replica]       // ISR
  private val assignedReplicaMap = new Pool[Int, Replica]     // AR, usually a superset of the ISR
  private val leaderIsrUpdateLock = new Object
  private var zkVersion: Int = LeaderAndIsr.initialZKVersion
  private var leaderEpoch: Int = LeaderAndIsr.initialLeaderEpoch - 1
  /* Epoch of the controller that last changed the leader. This needs to be initialized correctly upon broker startup.
   * One way of doing that is through the controller's start replica state change command. When a new broker starts up
   * the controller sends it a start replica command containing the leader for each partition that the broker hosts.
   * In addition to the leader, the controller can also send the epoch of the controller that elected the leader for
   * each partition. */
  private var controllerEpoch: Int = KafkaController.InitialControllerEpoch - 1
}
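The methods analyzed below go through a handful of small helpers over the assignedReplicaMap pool: getReplica, addReplicaIfNotExists, assignedReplicas and removeReplica. A simplified sketch of what they do (reconstructed, not the verbatim source):

def getReplica(replicaId: Int = localBrokerId): Option[Replica] =
  Option(assignedReplicaMap.get(replicaId))              // Pool.get returns null when the id is absent

def addReplicaIfNotExists(replica: Replica) =
  assignedReplicaMap.putIfNotExists(replica.brokerId, replica)

def assignedReplicas(): Set[Replica] =
  assignedReplicaMap.values.toSet

def removeReplica(replicaId: Int) {
  assignedReplicaMap.remove(replicaId)
}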
getOrCreateReplica
def getOrCreateReplica(replicaId: Int = localBrokerId): Replica = {
  val replicaOpt = getReplica(replicaId)
  replicaOpt match {
    case Some(replica) => replica
    case None =>  // the replica does not exist yet, create it
      if (isReplicaLocal(replicaId)) {
        val config = LogConfig.fromProps(logManager.defaultConfig.toProps, AdminUtils.fetchTopicConfig(zkClient, topic))
        val log = logManager.createLog(TopicAndPartition(topic, partitionId), config)  // create the log file
        val checkpoint = replicaManager.highWatermarkCheckpoints(log.dir.getParent)    // try to read the HW checkpoint
        val offsetMap = checkpoint.read
        if (!offsetMap.contains(TopicAndPartition(topic, partitionId)))
          warn("No checkpointed highwatermark is found for partition [%s,%d]".format(topic, partitionId))
        // start from the smaller of the checkpointed HW and the log end offset
        val offset = offsetMap.getOrElse(TopicAndPartition(topic, partitionId), 0L).min(log.logEndOffset)
        val localReplica = new Replica(replicaId, this, time, offset, Some(log))       // create the Replica object
        addReplicaIfNotExists(localReplica)                                            // add it to the AR
      } else {  // remote replica: just create the object
        val remoteReplica = new Replica(replicaId, this, time)
        addReplicaIfNotExists(remoteReplica)
      }
      getReplica(replicaId).get
  }
}
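getOrCreateReplica also uses isReplicaLocal, and maybeShrinkIsr below uses leaderReplicaIfLocal; both are thin checks against localBrokerId. A simplified sketch (the real leaderReplicaIfLocal additionally synchronizes on leaderIsrUpdateLock):

private def isReplicaLocal(replicaId: Int): Boolean = replicaId == localBrokerId

// the local replica, but only if this broker is currently the leader of the partition
def leaderReplicaIfLocal(): Option[Replica] =
  leaderReplicaIdOpt match {
    case Some(leaderReplicaId) if leaderReplicaId == localBrokerId => getReplica(localBrokerId)
    case _ => None
  }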
makeLeader
/**
 * Make the local replica the leader by resetting LogEndOffset for remote replicas (there could be old LogEndOffset from the time when this broker was the leader last time)
 * and setting the new leader and ISR
 */
def makeLeader(controllerId: Int,
               partitionStateInfo: PartitionStateInfo, correlationId: Int): Boolean = {
  leaderIsrUpdateLock synchronized {
    val allReplicas = partitionStateInfo.allReplicas                          // ids of all assigned replicas
    val leaderIsrAndControllerEpoch = partitionStateInfo.leaderIsrAndControllerEpoch
    val leaderAndIsr = leaderIsrAndControllerEpoch.leaderAndIsr
    // record the epoch of the controller that made the leadership decision. This is useful while updating the isr
    // to maintain the decision maker controller's epoch in the zookeeper path
    controllerEpoch = leaderIsrAndControllerEpoch.controllerEpoch             // refresh the controller epoch on every leadership decision
    // add replicas that are new
    allReplicas.foreach(replica => getOrCreateReplica(replica))               // materialize Replica objects, creating any that are new
    val newInSyncReplicas = leaderAndIsr.isr.map(r => getOrCreateReplica(r)).toSet  // build the new ISR
    // remove assigned replicas that have been removed by the controller
    (assignedReplicas().map(_.brokerId) -- allReplicas).foreach(removeReplica(_))   // drop AR entries that are no longer assigned
    // reset LogEndOffset for remote replicas
    assignedReplicas.foreach(r => if (r.brokerId != localBrokerId) r.logEndOffset = ReplicaManager.UnknownLogEndOffset)
    inSyncReplicas = newInSyncReplicas
    leaderEpoch = leaderAndIsr.leaderEpoch
    zkVersion = leaderAndIsr.zkVersion
    leaderReplicaIdOpt = Some(localBrokerId)
    // we may need to increment high watermark
    maybeIncrementLeaderHW(getReplica().get)                                  // recompute the HW from the ISR members' LEOs
    true
  }
}
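Why reset the remote LEOs to UnknownLogEndOffset? If this broker was the leader of the partition before, the cached follower LEOs are stale; feeding them into maybeIncrementLeaderHW could mark messages committed that the followers no longer have (they may have truncated their logs during the leader change). After the reset, the min() over the ISR stays at the sentinel until each follower reports a fresh LEO through its fetch requests. A self-contained illustration with made-up numbers (plain Scala, not Kafka code; the sentinel is assumed here to be -1):

object StaleLeoIllustration extends App {
  val UnknownLogEndOffset = -1L               // assumed value of ReplicaManager.UnknownLogEndOffset
  val oldHighWatermark    = 40L
  val leaderLeo           = 50L

  // Stale follower LEOs cached from the previous leadership term; the followers may actually be at 42 now.
  val staleIsrLeos = Set(leaderLeo, 100L, 95L)
  println(staleIsrLeos.min)                   // 50 -> the HW would wrongly jump from 40 to 50

  // After the reset, min() is the sentinel, so maybeIncrementLeaderHW leaves the HW at 40 until followers fetch.
  val resetIsrLeos = Set(leaderLeo, UnknownLogEndOffset, UnknownLogEndOffset)
  println(resetIsrLeos.min)                   // -1
}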
maybeIncrementLeaderHW
Replace the current HW with the minimum LEO across all in-sync replicas, but only if that minimum is larger than the current HW.
/**
 * There is no need to acquire the leaderIsrUpdate lock here since all callers of this private API acquire that lock
 * @param leaderReplica
 */
private def maybeIncrementLeaderHW(leaderReplica: Replica) {
  val allLogEndOffsets = inSyncReplicas.map(_.logEndOffset)
  val newHighWatermark = allLogEndOffsets.min
  val oldHighWatermark = leaderReplica.highWatermark
  if (newHighWatermark > oldHighWatermark)
    leaderReplica.highWatermark = newHighWatermark  // advance the HW
}
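A quick worked example of the rule, with made-up values: if the ISR LEOs are 10, 8 and 9, the candidate HW is the minimum, 8, and it only replaces the old HW when it is larger:

val isrLogEndOffsets = Seq(10L, 8L, 9L)       // leader plus two followers, made-up values
val oldHighWatermark = 7L
val newHighWatermark = isrLogEndOffsets.min   // 8
val updatedHw = if (newHighWatermark > oldHighWatermark) newHighWatermark else oldHighWatermark
println(updatedHw)                            // 8: every offset below the HW is present on all ISR members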
makeFollower
/**
 * Make the local replica the follower by setting the new leader and ISR to empty
 */
def makeFollower(controllerId: Int,
                 partitionStateInfo: PartitionStateInfo,
                 leaders: Set[Broker], correlationId: Int): Boolean = {
  leaderIsrUpdateLock synchronized {
    val allReplicas = partitionStateInfo.allReplicas
    val leaderIsrAndControllerEpoch = partitionStateInfo.leaderIsrAndControllerEpoch
    val leaderAndIsr = leaderIsrAndControllerEpoch.leaderAndIsr
    val newLeaderBrokerId: Int = leaderAndIsr.leader            // id of the new leader
    // record the epoch of the controller that made the leadership decision. This is useful while updating the isr
    // to maintain the decision maker controller's epoch in the zookeeper path
    controllerEpoch = leaderIsrAndControllerEpoch.controllerEpoch
    // TODO: Delete leaders from LeaderAndIsrRequest in 0.8.1
    leaders.find(_.id == newLeaderBrokerId) match {
      case Some(leaderBroker) =>
        // add replicas that are new
        allReplicas.foreach(r => getOrCreateReplica(r))
        // remove assigned replicas that have been removed by the controller
        (assignedReplicas().map(_.brokerId) -- allReplicas).foreach(removeReplica(_))
        inSyncReplicas = Set.empty[Replica]                     // only the leader maintains the ISR, so clear it here
        leaderEpoch = leaderAndIsr.leaderEpoch
        zkVersion = leaderAndIsr.zkVersion
        leaderReplicaIdOpt = Some(newLeaderBrokerId)
      case None => // we should not come here
    }
    true
  }
}
maybeShrinkIsr
Remove stuck followers and slow followers from the ISR.
def maybeShrinkIsr(replicaMaxLagTimeMs: Long, replicaMaxLagMessages: Long) {
  leaderIsrUpdateLock synchronized {
    leaderReplicaIfLocal() match {
      case Some(leaderReplica) =>
        val outOfSyncReplicas = getOutOfSyncReplicas(leaderReplica, replicaMaxLagTimeMs, replicaMaxLagMessages)  // find the out-of-sync replicas
        if (outOfSyncReplicas.size > 0) {
          val newInSyncReplicas = inSyncReplicas -- outOfSyncReplicas  // drop the out-of-sync replicas from the ISR
          // update ISR in zk and in cache
          updateIsr(newInSyncReplicas)
          // we may need to increment high watermark since ISR could be down to 1
          maybeIncrementLeaderHW(leaderReplica)
          replicaManager.isrShrinkRate.mark()
        }
      case None => // do nothing if no longer leader
    }
  }
}

def getOutOfSyncReplicas(leaderReplica: Replica, keepInSyncTimeMs: Long, keepInSyncMessages: Long): Set[Replica] = {
  /**
   * there are two cases that need to be handled here -
   * 1. Stuck followers: If the leo of the replica hasn't been updated for keepInSyncTimeMs ms,
   *    the follower is stuck and should be removed from the ISR
   * 2. Slow followers: If the leo of the slowest follower is behind the leo of the leader by keepInSyncMessages, the
   *    follower is not catching up and should be removed from the ISR
   **/
  val leaderLogEndOffset = leaderReplica.logEndOffset
  val candidateReplicas = inSyncReplicas - leaderReplica
  // Case 1 above
  val stuckReplicas = candidateReplicas.filter(r => (time.milliseconds - r.logEndOffsetUpdateTimeMs) > keepInSyncTimeMs)
  if (stuckReplicas.size > 0)
    debug("Stuck replicas for partition [%s,%d] are %s".format(topic, partitionId, stuckReplicas.map(_.brokerId).mkString(",")))
  // Case 2 above
  val slowReplicas = candidateReplicas.filter(r => r.logEndOffset >= 0 && (leaderLogEndOffset - r.logEndOffset) > keepInSyncMessages)
  if (slowReplicas.size > 0)
    debug("Slow replicas for partition [%s,%d] are %s".format(topic, partitionId, slowReplicas.map(_.brokerId).mkString(",")))
  stuckReplicas ++ slowReplicas
}
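updateIsr, which maybeShrinkIsr calls above (and which the expand path calls when a follower catches up again), writes the new ISR to the partition's leaderAndIsr znode with a conditional update on zkVersion, and only swaps the in-memory ISR when the ZooKeeper write succeeds. A rough sketch; the ZkUtils names and argument order are reconstructed from memory of the 0.8 code and may not match exactly:

private def updateIsr(newIsr: Set[Replica]) {
  val newLeaderAndIsr = new LeaderAndIsr(localBrokerId, leaderEpoch, newIsr.map(_.brokerId).toList, zkVersion)
  // conditional write: succeeds only if the cached zkVersion still matches the znode's version
  val (updateSucceeded, newVersion) = ZkUtils.conditionalUpdatePersistentPath(zkClient,
    ZkUtils.getTopicPartitionLeaderAndIsrPath(topic, partitionId),
    ZkUtils.leaderAndIsrZkData(newLeaderAndIsr, controllerEpoch), zkVersion)
  if (updateSucceeded) {
    inSyncReplicas = newIsr          // adopt the new ISR only after the ZK write succeeds
    zkVersion = newVersion
  } else {
    info("Cached zkVersion [%d] not equal to that in zookeeper, skip updating ISR".format(zkVersion))
  }
}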