If the controller is the master, responsible for cluster-wide concerns such as leader election and partition reassignment,
then ReplicaManager is the worker: it carries out the actual replica management on each broker.

Its main responsibilities include the following (a simplified sketch of the manager's state follows the list):

stopReplica
getOrCreatePartition
getLeaderReplicaIfLocal
getReplica
readMessageSets
becomeLeaderOrFollower
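
To make the walkthrough easier to follow, here is a simplified, self-contained sketch of the state ReplicaManager keeps, pieced together from the fields that appear in the snippets below; the element types are stand-ins, not the real Kafka classes.

import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable

class ReplicaManagerStateSketch {
  // every partition hosted on this broker, keyed by (topic, partition)
  val allPartitions = mutable.Map[(String, Int), AnyRef]()
  // the subset of partitions this broker currently leads
  val leaderPartitions = mutable.Set[AnyRef]()
  // epoch of the controller whose last request was accepted; used to reject stale requests
  @volatile var controllerEpoch: Int = 0
  // set after the first LeaderAndIsr request, once the high-watermark checkpoint thread is started
  @volatile var hwThreadInitialized: Boolean = false
  // set during controlled shutdown so that no new fetchers are added
  val isShuttingDown = new AtomicBoolean(false)
  // guard leader/follower transitions and the leaderPartitions set respectively
  val replicaStateChangeLock = new Object
  val leaderPartitionsLock = new Object
}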

StopReplicaCommand

Handling this command is simple: stop the fetcher threads for the affected partitions and, if requested, delete the partition directories.

stopReplicas

  def stopReplicas(stopReplicaRequest: StopReplicaRequest): (mutable.Map[TopicAndPartition, Short], Short) = {
    replicaStateChangeLock synchronized { // take the replica state change lock
      val responseMap = new collection.mutable.HashMap[TopicAndPartition, Short]
      if(stopReplicaRequest.controllerEpoch < controllerEpoch) { // check the controller epoch to reject stale requests
        (responseMap, ErrorMapping.StaleControllerEpochCode)
      } else {
        controllerEpoch = stopReplicaRequest.controllerEpoch // update the local controller epoch
        // First stop fetchers for all partitions, then stop the corresponding replicas
        replicaFetcherManager.removeFetcherForPartitions(stopReplicaRequest.partitions.map(r => TopicAndPartition(r.topic, r.partition))) // ask the FetcherManager to stop the fetcher threads for these partitions first
        for(topicAndPartition <- stopReplicaRequest.partitions){
          val errorCode = stopReplica(topicAndPartition.topic, topicAndPartition.partition, stopReplicaRequest.deletePartitions) // then stop each replica via stopReplica
          responseMap.put(topicAndPartition, errorCode)
        }
        (responseMap, ErrorMapping.NoError)
      }
    }
  }
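
The stale-epoch guard at the top of stopReplicas is a pattern that recurs in becomeLeaderOrFollower as well. Below is a minimal, self-contained sketch of the idea using toy types (not Kafka code): a command carrying an older controller epoch is rejected, a newer one bumps the local epoch.

object EpochGuardSketch {
  final case class Command(controllerEpoch: Int, payload: String)

  private var localControllerEpoch = 0

  def handle(cmd: Command): Either[String, String] = synchronized {
    if (cmd.controllerEpoch < localControllerEpoch)
      Left(s"stale controller epoch ${cmd.controllerEpoch} < $localControllerEpoch")
    else {
      localControllerEpoch = cmd.controllerEpoch // remember the newest epoch seen
      Right(s"accepted '${cmd.payload}' at epoch $localControllerEpoch")
    }
  }

  def main(args: Array[String]): Unit = {
    println(handle(Command(1, "stop replica A"))) // accepted
    println(handle(Command(0, "stop replica B"))) // rejected as stale
  }
}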

stopReplica

  def stopReplica(topic: String, partitionId: Int, deletePartition: Boolean): Short = {
    getPartition(topic, partitionId) match {
      case Some(partition) =>
        leaderPartitionsLock synchronized {
          leaderPartitions -= partition
        }
        if(deletePartition) { // the partition is actually deleted only when deletePartition = true
          val removedPartition = allPartitions.remove((topic, partitionId))
          if (removedPartition != null)
            removedPartition.delete() // this will delete the local log
        }
      case None => // do nothing if the replica no longer exists; this can happen during delete-topic retries
    }
    ErrorMapping.NoError // error handling elided in this excerpt
  }

LeaderAndISRCommand

becomeLeaderOrFollower
After checking the controller epoch and validating the request, it splits the partitions into those this broker should lead and those it should follow, then calls makeLeaders and makeFollowers respectively.

  def becomeLeaderOrFollower(leaderAndISRRequest: LeaderAndIsrRequest): (collection.Map[(String, Int), Short], Short) = {
    replicaStateChangeLock synchronized { // take the replica state change lock
      val responseMap = new collection.mutable.HashMap[(String, Int), Short]
      if(leaderAndISRRequest.controllerEpoch < controllerEpoch) { // check the request's controller epoch
        (responseMap, ErrorMapping.StaleControllerEpochCode)
      } else {
        val controllerId = leaderAndISRRequest.controllerId
        val correlationId = leaderAndISRRequest.correlationId
        controllerEpoch = leaderAndISRRequest.controllerEpoch // update the local controller epoch

        // First check partition's leader epoch
        // The request-level controller epoch was checked above; the leader epoch inside each
        // partitionStateInfo still has to be checked
        val partitionState = new HashMap[Partition, PartitionStateInfo]()
        leaderAndISRRequest.partitionStateInfos.foreach{ case ((topic, partitionId), partitionStateInfo) =>
          val partition = getOrCreatePartition(topic, partitionId, partitionStateInfo.replicationFactor) // get or create the partition
          val partitionLeaderEpoch = partition.getLeaderEpoch()
          // If the leader epoch is valid record the epoch of the controller that made the leadership decision.
          // This is useful while updating the isr to maintain the decision maker controller's epoch in the zookeeper path
          if (partitionLeaderEpoch < partitionStateInfo.leaderIsrAndControllerEpoch.leaderAndIsr.leaderEpoch) { // the local leader epoch must be smaller than the one in the request, otherwise the request is stale
            if(partitionStateInfo.allReplicas.contains(config.brokerId)) // check whether this partition is assigned to the current broker
              partitionState.put(partition, partitionStateInfo)
            else { }
          } else { // Received invalid LeaderAndIsr request
            // Otherwise record the error code in response
            responseMap.put((topic, partitionId), ErrorMapping.StaleLeaderEpochCode)
          }
        }

        val partitionsTobeLeader = partitionState
          .filter{ case (partition, partitionStateInfo) => partitionStateInfo.leaderIsrAndControllerEpoch.leaderAndIsr.leader == config.brokerId}
        val partitionsToBeFollower = (partitionState -- partitionsTobeLeader.keys)

        if (!partitionsTobeLeader.isEmpty)
          makeLeaders(controllerId, controllerEpoch, partitionsTobeLeader, leaderAndISRRequest.correlationId, responseMap)
        if (!partitionsToBeFollower.isEmpty)
          makeFollowers(controllerId, controllerEpoch, partitionsToBeFollower, leaderAndISRRequest.leaders, leaderAndISRRequest.correlationId, responseMap)

        // we initialize highwatermark thread after the first leaderisrrequest. This ensures that all the partitions
        // have been completely populated before starting the checkpointing there by avoiding weird race conditions
        if (!hwThreadInitialized) {
          startHighWaterMarksCheckPointThread() // start the HighWaterMarksCheckPointThread
          hwThreadInitialized = true
        }
        replicaFetcherManager.shutdownIdleFetcherThreads()
        (responseMap, ErrorMapping.NoError)
      }
    }
  }
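
The leader/follower split above simply partitions the map by whether the designated leader id equals the local broker id. A tiny self-contained sketch of that step, using hypothetical types rather than Kafka's:

object LeaderFollowerSplitSketch {
  final case class PartitionInfo(topic: String, partition: Int, leaderBrokerId: Int)

  // partitions whose designated leader is this broker become leaders, the rest become followers
  def split(infos: Seq[PartitionInfo], localBrokerId: Int): (Seq[PartitionInfo], Seq[PartitionInfo]) =
    infos.partition(_.leaderBrokerId == localBrokerId)

  def main(args: Array[String]): Unit = {
    val infos = Seq(PartitionInfo("t", 0, 1), PartitionInfo("t", 1, 2), PartitionInfo("t", 2, 1))
    val (leaders, followers) = split(infos, localBrokerId = 1)
    println(s"leaders:   $leaders")   // partitions 0 and 2
    println(s"followers: $followers") // partition 1
  }
}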

makeLeaders
Stop the fetchers for these partitions, call partition.makeLeader on each, and add them to leaderPartitions.

  /*
   * Make the current broker to become leader for a given set of partitions by:
   *
   * 1. Stop fetchers for these partitions
   * 2. Update the partition metadata in cache
   * 3. Add these partitions to the leader partitions set
   *
   * If an unexpected error is thrown in this function, it will be propagated to KafkaApis where
   * the error message will be set on each partition since we do not know which partition caused it
   * TODO: the above may need to be fixed later
   */
  private def makeLeaders(controllerId: Int, epoch: Int,
                          partitionState: Map[Partition, PartitionStateInfo],
                          correlationId: Int, responseMap: mutable.Map[(String, Int), Short]) = {
    try {
      // First stop fetchers for all the partitions
      replicaFetcherManager.removeFetcherForPartitions(partitionState.keySet.map(new TopicAndPartition(_)))
      // Update the partition information to be the leader
      partitionState.foreach{ case (partition, partitionStateInfo) =>
        partition.makeLeader(controllerId, partitionStateInfo, correlationId)}
      // Finally add these partitions to the list of partitions for which the leader is the current broker
      leaderPartitionsLock synchronized {
        leaderPartitions ++= partitionState.keySet
      }
    } catch {
      case e: Throwable => throw e // logging elided; the exception is re-thrown to be handled in KafkaApis
    }
  }
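
A toy sketch of the ordering makeLeaders enforces (hypothetical types, not Kafka code): stop fetching first, then switch the role, then record the partition as locally led, so a partition never keeps appending follower-fetched data after it starts acting as leader.

import scala.collection.mutable

object MakeLeadersSketch {
  final case class TopicPartition(topic: String, partition: Int)

  private val fetching         = mutable.Set[TopicPartition]() // partitions with an active fetcher
  private val leaderPartitions = mutable.Set[TopicPartition]() // partitions this broker leads

  def makeLeaders(partitions: Set[TopicPartition]): Unit = {
    fetching --= partitions         // 1. stop fetchers for these partitions
    // 2. (the real code updates each partition's leader metadata here)
    leaderPartitions ++= partitions // 3. record them as led by this broker
  }

  def main(args: Array[String]): Unit = {
    fetching += TopicPartition("t", 0)
    makeLeaders(Set(TopicPartition("t", 0)))
    println(s"fetching=$fetching leaderPartitions=$leaderPartitions")
  }
}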

makeFollowers
Besides removing these partitions from leaderPartitions and marking the replicas as followers,
a follower must truncate its log to the highWatermark and then start a fetcher to catch up with the leader.

  /*
   * Make the current broker to become follower for a given set of partitions by:
   *
   * 1. Remove these partitions from the leader partitions set.
   * 2. Mark the replicas as followers so that no more data can be added from the producer clients.
   * 3. Stop fetchers for these partitions so that no more data can be added by the replica fetcher threads.
   * 4. Truncate the log and checkpoint offsets for these partitions.
   * 5. If the broker is not shutting down, add the fetcher to the new leaders.
   *
   * The ordering of doing these steps make sure that the replicas in transition will not
   * take any more messages before checkpointing offsets so that all messages before the checkpoint
   * are guaranteed to be flushed to disks
   *
   * If an unexpected error is thrown in this function, it will be propagated to KafkaApis where
   * the error message will be set on each partition since we do not know which partition caused it
   */
  private def makeFollowers(controllerId: Int, epoch: Int, partitionState: Map[Partition, PartitionStateInfo],
                            leaders: Set[Broker], correlationId: Int, responseMap: mutable.Map[(String, Int), Short]) {
    try {
      leaderPartitionsLock synchronized {
        leaderPartitions --= partitionState.keySet
      }

      partitionState.foreach{ case (partition, leaderIsrAndControllerEpoch) =>
        partition.makeFollower(controllerId, leaderIsrAndControllerEpoch, leaders, correlationId)}

      replicaFetcherManager.removeFetcherForPartitions(partitionState.keySet.map(new TopicAndPartition(_)))

      // Truncate the local replica's log to the highWatermark, because only committed data is
      // guaranteed to be consistent with the leader
      logManager.truncateTo(partitionState.map{ case(partition, leaderISRAndControllerEpoch) =>
        new TopicAndPartition(partition) -> partition.getOrCreateReplica().highWatermark
      })

      if (!isShuttingDown.get()) { // only if this broker is not shutting down
        val partitionAndOffsets = mutable.Map[TopicAndPartition, BrokerAndInitialOffset]()
        partitionState.foreach {
          case (partition, partitionStateInfo) =>
            val leader = partitionStateInfo.leaderIsrAndControllerEpoch.leaderAndIsr.leader // find the leader broker id
            leaders.find(_.id == leader) match {
              case Some(leaderBroker) =>
                partitionAndOffsets.put(new TopicAndPartition(partition), // start fetching from the current replica's logEndOffset
                                        BrokerAndInitialOffset(leaderBroker, partition.getReplica().get.logEndOffset))
              case None =>
            }
        }
        replicaFetcherManager.addFetcherForPartitions(partitionAndOffsets) // start fetchers to catch up with the leader
      }
      else { }
    } catch {
      case e: Throwable => throw e // logging elided; the exception is re-thrown to be handled in KafkaApis
    }
  }
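
To see why a new follower truncates to its high watermark before fetching, here is a minimal sketch using plain Scala collections (no Kafka types): only entries up to the HW are known to match the leader, so any local entries beyond it are dropped and re-fetched from the leader.

object TruncateToHwSketch {
  def becomeFollower(localLog: Vector[String], highWatermark: Int, leaderLog: Vector[String]): Vector[String] = {
    val truncated = localLog.take(highWatermark)   // drop the uncommitted tail
    truncated ++ leaderLog.drop(truncated.length)  // catch up from the leader
  }

  def main(args: Array[String]): Unit = {
    val local  = Vector("m0", "m1", "x2", "x3")    // x2, x3 were never committed
    val leader = Vector("m0", "m1", "m2", "m3", "m4")
    println(becomeFollower(local, highWatermark = 2, leader)) // Vector(m0, m1, m2, m3, m4)
  }
}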

checkpointHighWatermarks
For every replica the highWatermark is crucial: it is what tells us which data is known to be committed (and therefore consistent across replicas), so even if the broker crashes, recovery only has to resume catching up from the highWatermark.
That is why the highWatermarks must be checkpointed to disk.

  /**
   * Flushes the highwatermark value for all partitions to the highwatermark file
   */
  def checkpointHighWatermarks() {
    val replicas = allPartitions.values.map(_.getReplica(config.brokerId)).collect{case Some(replica) => replica}
    val replicasByDir = replicas.filter(_.log.isDefined).groupBy(_.log.get.dir.getParent)
    for((dir, reps) <- replicasByDir) {
      val hwms = reps.map(r => (new TopicAndPartition(r) -> r.highWatermark)).toMap
      try {
        highWatermarkCheckpoints(dir).write(hwms)
      } catch {
        case e: IOException =>
          fatal("Error writing to highwatermark file: ", e)
          Runtime.getRuntime().halt(1)
      }
    }
  }
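
For illustration, a hedged sketch of a tiny checkpoint writer in the spirit of the high-watermark file above; the file layout here is an assumption (a version line, an entry count, then one "topic partition offset" line per entry), and this is not a drop-in replacement for Kafka's checkpoint class.

import java.io.{File, PrintWriter}

object HwCheckpointSketch {
  def write(file: File, hwms: Map[(String, Int), Long]): Unit = {
    val tmp = new File(file.getAbsolutePath + ".tmp")
    val out = new PrintWriter(tmp)
    try {
      out.println(0)         // version line (assumed format)
      out.println(hwms.size) // number of entries
      hwms.foreach { case ((topic, partition), hw) => out.println(s"$topic $partition $hw") }
    } finally out.close()
    tmp.renameTo(file)       // swap in the new file so readers never see a half-written checkpoint
  }

  def main(args: Array[String]): Unit =
    write(new File("replication-offset-checkpoint.sketch"),
          Map(("my-topic", 0) -> 42L, ("my-topic", 1) -> 17L))
}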
