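This article introduces the high-level code architecture of Kafka and its main classes.

The diagram shows the architecture of version 0.8.

Broker architecture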

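1. Network layer

Kafka implements its own network layer on top of Java NIO rather than using a third-party network framework such as Netty or MINA. In terms of performance, this part of the code is not the bottleneck.

It uses I/O multiplexing with a multi-threaded Reactor pattern. The main classes are SocketServer, Acceptor, Processor, and RequestChannel.

The Kafka server is implemented by SocketServer, an NIO server with the following threading model:

1 Acceptor thread that handles new connections
N Processor threads, each with its own selector, that read requests from sockets and send responses
M Handler threads that process requests and hand the responses back to the Processor threads

The diagram above shows the roles of the Acceptor, Processor, and Handler.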

1.1 a. When the broker starts, it calls the startup method of SocketServer.

def startup() {
  ......
  for(i <- 0 until numProcessorThreads) {
    processors(i) = new Processor(i,
                                  time,
                                  maxRequestSize,
                                  aggregateIdleMeter,
                                  newMeter("IdlePercent", "percent", TimeUnit.NANOSECONDS, Map("networkProcessor" -> i.toString)),
                                  numProcessorThreads,
                                  requestChannel,
                                  quotas,
                                  connectionsMaxIdleMs)
    Utils.newThread("kafka-network-thread-%d-%d".format(port, i), processors(i), false).start()
  }
  ......
  // start accepting connections
  this.acceptor = new Acceptor(host, port, processors, sendBufferSize, recvBufferSize, quotas)
  Utils.newThread("kafka-socket-acceptor-%s-%d".format(protocol.toString, endpoint.port), acceptor, false).start()
  acceptor.awaitStartup
  ......
}

1.2 b. It creates and starts one thread per Processor, then starts a single Acceptor thread.

Acceptor is a typical NIO class for handling new connections:

private[kafka] class Acceptor(...) extends AbstractServerThread(connectionQuotas) {
  val serverChannel = openServerSocket(host, port)

  def run() {
    serverChannel.register(selector, SelectionKey.OP_ACCEPT);
    ......
    while(isRunning) {
      val ready = selector.select(500)
      if(ready > 0) {
        val keys = selector.selectedKeys()
        val iter = keys.iterator()
        while(iter.hasNext && isRunning) {
          ......
          accept(key, processors(currentProcessor))
          ......
          currentProcessor = (currentProcessor + 1) % processors.length
        }
      }
    }
    ......
  }
}

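1.3 c. It distributes new connections evenly across the Processors in round-robin order. The accept method configures the socket options and hands the connection to a Processor for reading and writing.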

def accept(key: SelectionKey, processor: Processor) {
  val serverSocketChannel = key.channel().asInstanceOf[ServerSocketChannel]
  val socketChannel = serverSocketChannel.accept()
  try {
    connectionQuotas.inc(socketChannel.socket().getInetAddress)
    socketChannel.configureBlocking(false)
    socketChannel.socket().setTcpNoDelay(true)
    socketChannel.socket().setSendBufferSize(sendBufferSize)
    processor.accept(socketChannel)
  } catch {
    case e: TooManyConnectionsException =>
      info("Rejected connection from %s, address already has the configured maximum of %d connections.".format(e.ip, e.count))
      close(socketChannel)
  }
}

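1.4 d. The Processor's accept method adds the new connection to its queue of pending new connections.

The connection is registered for OP_READ later, in the configureNewConnections method.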

def accept(socketChannel: SocketChannel) {
  newConnections.add(socketChannel)
  wakeup()
}

private def configureNewConnections() {
  while(newConnections.size() > 0) {
    val channel = newConnections.poll()
    debug("Processor " + id + " listening to new connection from " + channel.socket.getRemoteSocketAddress)
    channel.register(selector, SelectionKey.OP_READ)
  }
}

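1.5 e. The main logic of the Processor thread is shown below. It loops for as long as the processor is running and keeps servicing reads and writes on its connections.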

override def run() {
  startupComplete()
  while (isRunning) {
    try {
      // setup any new connections that have been queued up
      // (registers OP_READ for each new connection)
      configureNewConnections()
      // register any new responses for writing
      // (registers OP_WRITE for each new response, which is read from
      // requestChannel.receiveResponse(processor's id))
      processNewResponses()
      poll()
      processCompletedReceives()
      processCompletedSends()
      processDisconnected()
    } catch {
      ...
    }
  }
  debug("Closing selector - processor " + id)
  swallowError(closeAll())
  shutdownComplete()
}

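This, too, is standard NIO processing code.

1.6 f. Let us look at how read and write are implemented. (Note: this does not match the 0.10 code; the class has since been modified.)

The first four bytes of every Kafka message (one int) give the size of the message that follows, so the reader first reads the size and then reads one complete message.

Once a complete Request has been read, it is put into the requestChannel.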
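The size-prefixed framing described above can be sketched as follows. This is a minimal illustration, not Kafka's actual receive class: the object name SizePrefixedFraming and both methods are hypothetical, and the sketch assumes a 4-byte big-endian size header.

```scala
import java.nio.ByteBuffer
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, not a Kafka class: each frame is a 4-byte big-endian
// size followed by exactly `size` bytes of payload.
object SizePrefixedFraming {

  // Encode one payload as [4-byte size][payload].
  def encode(payload: Array[Byte]): Array[Byte] = {
    val buf = ByteBuffer.allocate(4 + payload.length)
    buf.putInt(payload.length)
    buf.put(payload)
    buf.array()
  }

  // Extract every complete frame from `buf`, which must be in read mode.
  // Bytes of an incomplete trailing frame stay unread in the buffer, so the
  // caller can compact() it and append more data from the socket later.
  def decodeAll(buf: ByteBuffer): Seq[Array[Byte]] = {
    val frames = ArrayBuffer.empty[Array[Byte]]
    var done = false
    while (!done) {
      if (buf.remaining() < 4) {
        done = true // the size header itself is not complete yet
      } else {
        buf.mark()
        val size = buf.getInt()
        if (size < 0) throw new IllegalStateException(s"Invalid frame size: $size")
        if (buf.remaining() < size) {
          buf.reset() // body incomplete: rewind to the start of the header
          done = true
        } else {
          val payload = new Array[Byte](size)
          buf.get(payload)
          frames += payload
        }
      }
    }
    frames.toSeq
  }
}
```

A non-blocking reader would call decodeAll after every socket read and forward each returned frame as one request, which is the role the requestChannel plays in the text.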

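For the detailed Kafka message format, see A Guide To The Kafka Protocol.

Now for the write method: the interest ops are set back to OP_READ only after a response has been written completely; until then the processor keeps trying to write.

That covers the main logic of the network layer, which is responsible for reading and writing Kafka messages.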

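2. API layer

The main functionality of the API layer is implemented by the KafkaApis class.

Based on the configuration, Kafka creates a group of KafkaRequestHandler threads called the KafkaRequestHandlerPool: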

class KafkaRequestHandlerPool(......) extends Logging with KafkaMetricsGroup {
  ......
  val threads = new Array[Thread](numThreads)
  val runnables = new Array[KafkaRequestHandler](numThreads)
  for(i <- 0 until numThreads) {
    runnables(i) = new KafkaRequestHandler(i, brokerId, aggregateIdleMeter, numThreads, requestChannel, apis)
    threads(i) = Utils.daemonThread("kafka-request-handler-" + i, runnables(i))
    threads(i).start()
  }
  .....
}

Each KafkaRequestHandler continuously takes requests off the requestChannel queue and passes them to apis for processing.

class KafkaRequestHandler(......) extends Runnable with Logging {
  def run() {
    while(true) {
      try {
        var req : RequestChannel.Request = null
        while (req == null) {
          req = requestChannel.receiveRequest(300)
        }
        if(req eq RequestChannel.AllDone) {
          return
        }
        ......
        apis.handle(req)
      } catch {
        ......
      }
    }
  }
}
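The polling loop and the AllDone sentinel can be reproduced in isolation with a plain blocking queue. The sketch below is only an illustration of the pattern: HandlerPoolSketch, Payload, start, and shutdown are hypothetical names, and an ArrayBlockingQueue stands in for the RequestChannel.

```scala
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}

// Hypothetical stand-ins for RequestChannel and KafkaRequestHandler.
object HandlerPoolSketch {
  sealed trait Request
  final case class Payload(body: String) extends Request
  case object AllDone extends Request // sentinel that tells one handler to exit

  // Start `numThreads` handlers that poll `queue` and pass each payload to `handle`.
  def start(numThreads: Int, queue: ArrayBlockingQueue[Request])(handle: String => Unit): Seq[Thread] =
    (0 until numThreads).map { i =>
      val t = new Thread(new Runnable {
        override def run(): Unit = {
          var running = true
          while (running) {
            // Poll with a timeout, as the original loop does with receiveRequest(300).
            queue.poll(300, TimeUnit.MILLISECONDS) match {
              case null          => ()              // timed out, try again
              case AllDone       => running = false // shutdown requested
              case Payload(body) => handle(body)
            }
          }
        }
      }, s"request-handler-$i")
      t.setDaemon(true)
      t.start()
      t
    }

  // Enqueue one sentinel per handler, then wait for all handlers to finish.
  def shutdown(threads: Seq[Thread], queue: ArrayBlockingQueue[Request]): Unit = {
    threads.foreach(_ => queue.put(AllDone))
    threads.foreach(_.join())
  }
}
```

Because the queue is FIFO and every handler exits after consuming exactly one sentinel, all requests enqueued before shutdown are handled before the pool stops.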

apis dispatches each request to a different handler method according to the request type.

def handle(request: RequestChannel.Request) {
  try {
    request.requestId match {
      case RequestKeys.ProduceKey => handleProducerRequest(request)
      case RequestKeys.FetchKey => handleFetchRequest(request)
      case RequestKeys.OffsetsKey => handleOffsetRequest(request)
      case RequestKeys.MetadataKey => handleTopicMetadataRequest(request)
      case RequestKeys.LeaderAndIsrKey => handleLeaderAndIsrRequest(request)
      case RequestKeys.StopReplicaKey => handleStopReplicaRequest(request)
      case RequestKeys.UpdateMetadataKey => handleUpdateMetadataRequest(request)
      case RequestKeys.ControlledShutdownKey => handleControlledShutdownRequest(request)
      case RequestKeys.OffsetCommitKey => handleOffsetCommitRequest(request)
      case RequestKeys.OffsetFetchKey => handleOffsetFetchRequest(request)
      case RequestKeys.ConsumerMetadataKey => handleConsumerMetadataRequest(request)
      case RequestKeys.JoinGroupKey => handleJoinGroupRequest(request)
      case RequestKeys.HeartbeatKey => handleHeartbeatRequest(request)
      case requestId => throw new KafkaException("Unknown api code " + requestId)
    }
  } catch {
    ......
  } finally
    request.apiLocalCompleteTimeMs = SystemTime.milliseconds
}

Clearly, how fast requests are handled here affects Kafka's overall message processing speed.

Here we analyze one of the handler methods, handleProducerRequest.

def handleProducerRequest(request: RequestChannel.Request) {
  val produceRequest = request.body.asInstanceOf[ProduceRequest]
  val numBytesAppended = request.header.sizeOf + produceRequest.sizeOf

  val (authorizedRequestInfo, unauthorizedRequestInfo) = produceRequest.partitionRecords.asScala.partition {
    case (topicPartition, _) => authorize(request.session, Write, new Resource(Topic, topicPartition.topic))
  }

  // the callback for sending a produce response
  def sendResponseCallback(responseStatus: Map[TopicPartition, PartitionResponse]) {
  }

  if (authorizedRequestInfo.isEmpty)
    sendResponseCallback(Map.empty)
  else {
    val internalTopicsAllowed = request.header.clientId == AdminUtils.AdminClientId

    // Convert ByteBuffer to ByteBufferMessageSet
    val authorizedMessagesPerPartition = authorizedRequestInfo.map {
      case (topicPartition, buffer) => (topicPartition, new ByteBufferMessageSet(buffer))
    }

    // call the replica manager to append messages to the replicas
    replicaManager.appendMessages(
      produceRequest.timeout.toLong,
      produceRequest.acks,
      internalTopicsAllowed,
      authorizedMessagesPerPartition,
      sendResponseCallback)

    // if the request is put into the purgatory, it will have a held reference
    // and hence cannot be garbage collected; hence we clear its data here in
    // order to let GC re-claim its memory since it is already appended to log
    produceRequest.clearPartitionRecords()
  }
}

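This calls replicaManager.appendMessages, which handles storing and replicating the Kafka messages, that is, writing them on the leader and on the replica nodes.

3. Replication subsystem

Let us step into the code of replicaManager.appendMessages.

This method appends the messages to the leader partition, from which they are replicated to the follower partitions. It returns the response either when the timeout expires or as soon as the condition implied by the required acks value is satisfied.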

def appendMessages(......) {
  if (isValidRequiredAcks(requiredAcks)) {
    val localProduceResults = appendToLocalLog(internalTopicsAllowed, messagesPerPartition, requiredAcks)

    val produceStatus = localProduceResults.map { case (topicAndPartition, result) =>
      topicAndPartition ->
              ProducePartitionStatus(
                result.info.lastOffset + 1, // required offset
                ProducerResponseStatus(result.errorCode, result.info.firstOffset)) // response status
    }

    if (delayedRequestRequired(requiredAcks, messagesPerPartition, localProduceResults)) {
      // create delayed produce operation
      val produceMetadata = ProduceMetadata(requiredAcks, produceStatus)
      val delayedProduce = new DelayedProduce(timeout, produceMetadata, this, responseCallback)

      // create a list of (topic, partition) pairs to use as keys for this delayed produce operation
      val producerRequestKeys = messagesPerPartition.keys.map(new TopicPartitionOperationKey(_)).toSeq

      // try to complete the request immediately, otherwise put it into the purgatory
      // this is because while the delayed produce operation is being created, new
      // requests may arrive and hence make this operation completable.
      delayedProducePurgatory.tryCompleteElseWatch(delayedProduce, producerRequestKeys)
    } else {
      // we can respond immediately
      val produceResponseStatus = produceStatus.mapValues(status => status.responseStatus)
      responseCallback(produceResponseStatus)
    }
  } else {
    // If required.acks is outside accepted range, something is wrong with the client
    // Just return an error and don't handle the request at all
    val responseStatus = messagesPerPartition.map {
      case (topicAndPartition, messageSet) =>
        (topicAndPartition ->
                ProducerResponseStatus(Errors.INVALID_REQUIRED_ACKS.code,
                  LogAppendInfo.UnknownLogAppendInfo.firstOffset))
    }
    responseCallback(responseStatus)
  }
}
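The two checks that steer this method, isValidRequiredAcks and delayedRequestRequired, are not shown in the excerpt. The sketch below is a simplified guess at their logic, written from memory of the Kafka source of that era and therefore an assumption: the object name AcksDecision and the reduction of the two maps to plain counts are hypothetical.

```scala
// Hypothetical simplification of the decisions made inside appendMessages.
object AcksDecision {
  // Assumed accepted values: -1 waits for all in-sync replicas,
  // 1 waits for the leader only, 0 does not wait for any acknowledgment.
  def isValidRequiredAcks(requiredAcks: Short): Boolean =
    requiredAcks == -1 || requiredAcks == 1 || requiredAcks == 0

  // A delayed produce is only worth creating when the client asked for
  // acks = -1, there is data to append, and at least one local append succeeded.
  def delayedRequestRequired(requiredAcks: Short, partitions: Int, failedAppends: Int): Boolean =
    requiredAcks == -1 && partitions > 0 && failedAppends < partitions
}
```

Under these assumptions, a request with acks = 1 or acks = 0 always takes the "respond immediately" branch, and only acks = -1 can end up in the purgatory.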

Note that replication itself is carried out by ReplicaFetcherManager through ReplicaFetcherThread threads.

To publish a message to a partition, the client first finds the leader of the partition from Zookeeper and sends the message to the leader. The leader writes the message to its local log. Each follower constantly pulls new messages from the leader using a single socket channel. That way, the follower receives all messages in the same order as written in the leader. The follower writes each received message to its own log and sends an acknowledgment back to the leader. Once the leader receives the acknowledgment from all replicas in ISR, the message is committed. The leader advances the HW and sends an acknowledgment to the client.

For better performance, each follower sends an acknowledgment after the message is written to memory. So, for each committed message, we guarantee that the message is stored in multiple replicas in memory. However, there is no guarantee that any replica has persisted the committed message to disk. Given that correlated failures are relatively rare, this approach gives us a good balance between response time and durability. In the future, we may consider adding options that provide even stronger guarantees.

The leader also periodically broadcasts the HW to all followers. The broadcasting can be piggybacked on the return value of the fetch requests from the followers. From time to time, each replica checkpoints its HW to its disk.
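The commit rule in the passage above, that a message is committed once every replica in the ISR has it, amounts to taking the minimum log end offset over the ISR. The sketch below illustrates that calculation; HighWatermarkSketch and its parameters are hypothetical names, and the real broker tracks this state inside its partition and replica classes.

```scala
// Hypothetical illustration of how the high watermark (HW) follows from the ISR.
object HighWatermarkSketch {
  // logEndOffsets: replica id -> log end offset; isr: ids of the in-sync replicas.
  // Every offset below the returned value exists on all in-sync replicas.
  def highWatermark(logEndOffsets: Map[Int, Long], isr: Set[Int]): Long = {
    require(isr.nonEmpty, "ISR must contain at least the leader")
    isr.map(id => logEndOffsets(id)).min
  }
}
```

A replica that has fallen out of the ISR does not hold the HW back, which is why shrinking the ISR lets the leader keep committing.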

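4. Log subsystem

LogManager manages Kafka's logs (the Kafka messages), including creating, retrieving, and cleaning up logs and log directories. It also uses a timer to check periodically whether in-memory log data needs to be flushed to disk.

The important classes are LogManager (@threadsafe) and Log.

5. offsetManager

Manages offsets and provides reads and writes of offsets. (kafka.client.ClientUtils#channelToOffsetManager)

6. DynamicConfigManager

It is responsible for changing Topic/Client configuration properties dynamically. (It has since been renamed to kafka.server.DynamicConfigManager.)

When the configuration of a topic changes, Kafka creates a node in ZooKeeper such as /kafka10/config/changes/config_change_13321. DynamicConfigManager watches these nodes, determines which topics have changed properties, and processes them.

7. Other classes

There are other important classes as well, including KafkaController, KafkaScheduler, ConsumerCoordinator, and KafkaHealthcheck.

TODO: analyze each of these four classes.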

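II. Metrics

kafka/metrics: Kafka uses a metrics library to measure performance. It was originally Yammer Metrics and has since become the independent Dropwizard Metrics project. The package naming of this framework is currently somewhat messy, but its performance monitoring features are very powerful.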

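III. Producer

Version 0.8 (possibly thread-safe)

In 0.8, kafka.producer.Producer defines two types of producer:

sync
async
Both essentially process messages through eventHandler.handle(messages); the only difference is that async sends messages from a separate thread, using a LinkedBlockingQueue as a buffer.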
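The async path described above can be approximated with a queue and one background sender thread. This is a rough sketch of the idea rather than the 0.8 implementation: AsyncSendSketch, batchSize, lingerMs, and the handle callback (standing in for eventHandler.handle(messages)) are all hypothetical.

```scala
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch; `handle` stands in for eventHandler.handle(messages).
class AsyncSendSketch(batchSize: Int, lingerMs: Long)(handle: Seq[String] => Unit) {
  private val queue = new LinkedBlockingQueue[String]()
  @volatile private var running = true

  private val sender = new Thread(new Runnable {
    override def run(): Unit = {
      val batch = ArrayBuffer.empty[String]
      // Keep draining until close() has been called and the queue is empty.
      while (running || !queue.isEmpty) {
        val msg = queue.poll(lingerMs, TimeUnit.MILLISECONDS)
        if (msg != null) batch += msg
        // Flush when the batch is full or when the poll timed out.
        if (batch.size >= batchSize || (msg == null && batch.nonEmpty)) {
          handle(batch.toList)
          batch.clear()
        }
      }
      if (batch.nonEmpty) handle(batch.toList)
    }
  }, "async-sender")
  sender.setDaemon(true)
  sender.start()

  // Asynchronous path: enqueue the message and return immediately.
  def send(msg: String): Unit = queue.put(msg)

  // Stop the sender thread once it has drained the queue.
  def close(): Unit = { running = false; sender.join() }
}
```

A sync producer, by contrast, would simply call handle on the caller's thread and return only after it completes.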

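Version 0.10 (thread-safe)

The producer is thread safe and sharing a single producer instance across threads will generally be faster than having multiple instances.

The send method is asynchronous. batch.size is the size of the batch buffered per partition; each partition has its own batch.
linger.ms is how long the producer waits for more records to arrive and fill a batch. My understanding is that it acts as a timeout that controls when a batch is sent: a batch goes out when (1) it reaches batch.size or (2) linger.ms expires.
buffer.memory is the total memory the producer uses to buffer records across all partitions. This buffer lives on the producer side; when it is full, send blocks for up to max.block.ms. If the buffer is still full after that time, a TimeoutException is thrown.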
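These four settings are passed to the producer as ordinary properties. The fragment below shows one way to assemble them; the values are illustrative only, the broker address is an assumption, and ProducerConfigSketch is a hypothetical name. Constructing the actual KafkaProducer is left out so the sketch needs no Kafka dependency.

```scala
import java.util.Properties

object ProducerConfigSketch {
  // Illustrative values only; tune them for the actual workload.
  def build(): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092") // assumed broker address
    props.setProperty("batch.size", "16384")       // bytes buffered per partition batch
    props.setProperty("linger.ms", "5")            // wait up to 5 ms for a batch to fill
    props.setProperty("buffer.memory", "33554432") // total bytes buffered across all partitions
    props.setProperty("max.block.ms", "60000")     // how long send may block when the buffer is full
    props
  }
}
```

With linger.ms greater than zero the producer trades a little latency for larger batches, which is the trade-off the description of linger.ms above points at.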

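Confirmed: in 0.10 the producer's send is asynchronous only; there is no synchronous send method. Calling get() on the value returned by send blocks the caller and so simulates a synchronous call, but the send itself is still asynchronous. (Keep the distinction between asynchronous blocking and asynchronous non-blocking clear.)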

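IV. Consumer

Version 0.8 (possibly thread-safe)

kafka.consumer.SimpleConsumer provides the Simple Consumer API. It sends requests and receives responses over a BlockingChannel.

kafka.javaapi.consumer.SimpleConsumer provides the corresponding Java interface.

The high-level consumer is actually implemented by ZookeeperConsumerConnector. It records consumer information in ZooKeeper and exposes KafkaStream for reading Kafka messages.

In OLS, a single consumer instance is used, and multiple threads are created to read from that consumer's List<KafkaStream<byte[], byte[]>>.

Version 0.10 (not thread-safe)

The consumer fetches data through the poll method. Consumer offsets can be committed in the following ways (the last two manage offsets manually):

Automatic commit
Blocking commit: commitSync
Non-blocking commit: commitAsync

I have not yet read the documentation for the official code. To be revisited. 2016-11-08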
