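Questions left over from last time

If messages are sent to many different topics, how does the async producer keep the topics apart while still sending in batches?
How does it use the key to choose a partition?
How does it compress messages in batches?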

How does the async producer keep topics apart while sending in batches?

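The answer is that DefaultEventHandler takes the batch of messages handed to it (actually a Seq[KeyedMessage[K,V]]), splits it apart, and determines which broker each message should be sent to. The messages destined for each broker are then grouped by topic and partition. In short: unpack => reassemble according to the metadata.

This is implemented by the partitionAndCollate method: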

def partitionAndCollate(messages: Seq[KeyedMessage[K,Message]]): Option[Map[Int, collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K,Message]]]]]

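It returns an Option whose content is a Map: the key is a brokerId and the value is the set of messages to send to that broker. For each message, it first determines which partition of which topic the message goes to, then finds the leader broker of that partition, then looks up that broker in the Map[Int, collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K,Message]]]] and appends the message to the Seq[KeyedMessage[K,Message]] for the corresponding topic+partition. The final result describes which messages are to be sent to each broker, and in what structure. When the data is actually sent, the packed messages go to different brokers according to brokerId.

First, look at how the Kafka protocol describes the structure of a Produce Request: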

ProduceRequest => RequiredAcks Timeout [TopicName [Partition MessageSetSize MessageSet]]
  RequiredAcks => int16
  Timeout => int32
  Partition => int32
  MessageSetSize => int32

Everything sent to a single broker has this structure.

The Kafka wiki also says the following about the Produce API:

The produce API is used to send message sets to the server. For efficiency it allows sending message sets intended for many topic partitions in a single request.

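That is, a single produce request can carry messages for many topic+partition combinations at once. A single produce request is, of course, sent to a single broker.

The call

send(brokerid, messageSetPerBroker)

sends the message sets to the broker with the given brokerid.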
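
The grouping described in this section can be sketched in a few lines. This is a simplified illustration and not the Kafka source: Msg, TopicAndPartition, choosePartition and leaderOf are hypothetical stand-ins for the real types and metadata lookups, and the Option wrapper of the real method is left out.

```scala
import scala.collection.mutable

object CollateSketch {
  // Hypothetical, simplified stand-ins for the Kafka types.
  final case class Msg(topic: String, key: String, payload: String)
  final case class TopicAndPartition(topic: String, partition: Int)

  /** Groups messages first by leader broker id, then by topic and partition.
    * `choosePartition` and `leaderOf` are assumed lookups supplied by the caller;
    * a partition without a leader is filed under broker id -1. */
  def collate(
      messages: Seq[Msg],
      choosePartition: Msg => Int,
      leaderOf: TopicAndPartition => Option[Int]
  ): Map[Int, Map[TopicAndPartition, Seq[Msg]]] = {
    val result = mutable.Map.empty[Int, mutable.Map[TopicAndPartition, Vector[Msg]]]
    for (m <- messages) {
      val tp = TopicAndPartition(m.topic, choosePartition(m))
      val brokerId = leaderOf(tp).getOrElse(-1)
      // Find (or create) the per-broker map, then append to the topic+partition bucket.
      val perBroker = result.getOrElseUpdate(brokerId, mutable.Map.empty[TopicAndPartition, Vector[Msg]])
      perBroker(tp) = perBroker.getOrElse(tp, Vector.empty[Msg]) :+ m
    }
    result.map { case (broker, byTp) => broker -> byTp.toMap }.toMap
  }
}
```

Each outer entry of the result corresponds to one produce request, which is why one request can carry several topic+partition combinations.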

How does it use the key to choose a partition?

First, look at the definition of the KeyedMessage class:

case class KeyedMessage[K, V](val topic: String, val key: K, val partKey: Any, val message: V) {
  if(topic == null)
    throw new IllegalArgumentException("Topic cannot be null.")

  def this(topic: String, message: V) = this(topic, null.asInstanceOf[K], null, message)

  def this(topic: String, key: K, message: V) = this(topic, key, key, message)

  def partitionKey = {
    if(partKey != null)
      partKey
    else if(hasKey)
      key
    else
      null
  }

  def hasKey = key != null
}

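With the three-argument constructor, partKey equals key. partKey is used for partitioning, but it is not stored as part of the message.

As mentioned above, before deciding which broker a message goes to, the handler must first decide which partition it goes to; only then can it use the partition id to find the broker that hosts the leader.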

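val topicPartitionsList = getPartitionListForTopic(message) // fetch the partition info of the topic this message is sent to
val partitionIndex = getPartition(message.topic, message.partitionKey, topicPartitionsList) // decide which partition this message goes to

Note that the value passed to getPartition is partitionKey, i.e. partKey when it is set and key otherwise. The getPartition method is: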

  private def getPartition(topic: String, key: Any, topicPartitionList: Seq[PartitionAndLeader]): Int = {
    val numPartitions = topicPartitionList.size
    if(numPartitions <= 0)
      throw new UnknownTopicOrPartitionException("Topic " + topic + " doesn't exist")
    val partition =
      if(key == null) {
        // If the key is null, we don't really need a partitioner
        // So we look up in the send partition cache for the topic to decide the target partition
        val id = sendPartitionPerTopicCache.get(topic)
        id match {
          case Some(partitionId) =>
            // directly return the partitionId without checking availability of the leader,
            // since we want to postpone the failure until the send operation anyways
            partitionId
          case None =>
            val availablePartitions = topicPartitionList.filter(_.leaderBrokerIdOpt.isDefined)
            if (availablePartitions.isEmpty)
              throw new LeaderNotAvailableException("No leader for any partition in topic " + topic)
            val index = Utils.abs(Random.nextInt) % availablePartitions.size
            val partitionId = availablePartitions(index).partitionId
            sendPartitionPerTopicCache.put(topic, partitionId)
            partitionId
        }
      } else
        partitioner.partition(key, numPartitions)

When partKey is null, the method first looks in sendPartitionPerTopicCache (a Map) for a partitionId cached for this topic. If one was stored earlier with sendPartitionPerTopicCache.put(topic, partitionId), it is used directly. Otherwise a partitionId is picked at random from the partitions that currently have a leader and is stored in sendPartitionPerTopicCache. As a result, as long as the cache holds a partitionId for the topic, every message without a key is sent to that same partition. So if all messages have a null partKey, only one partition receives messages for a period of time. It is "a period of time" rather than forever because the handler periodically re-fetches the metadata of the topics it has sent messages to; the interval is set with topic.metadata.refresh.interval.ms. After re-fetching the metadata it clears several caches, including sendPartitionPerTopicCache, so the next message causes a new random partitionId to be chosen and cached.

      if (topicMetadataRefreshInterval >= 0 &&
          SystemTime.milliseconds - lastTopicMetadataRefreshTime > topicMetadataRefreshInterval) {  // it is time to refresh the topic metadata, so do the refresh
        Utils.swallowError(brokerPartitionInfo.updateInfo(topicMetadataToRefresh.toSet, correlationId.getAndIncrement))
        sendPartitionPerTopicCache.clear()
        topicMetadataToRefresh.clear
        lastTopicMetadataRefreshTime = SystemTime.milliseconds
      }
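
The effect of the cache on messages without a key can be reproduced with a small sketch. It is a simplified model under stated assumptions and not the handler itself: Chooser is a hypothetical class, the clock is passed in as nowMs, and a plain require replaces LeaderNotAvailableException.

```scala
import scala.collection.mutable
import scala.util.Random

object StickyPartitionSketch {
  /** Hypothetical stand-in for the handler's per-topic cache and refresh timer. */
  final class Chooser(refreshIntervalMs: Long, random: Random = new Random) {
    private val cache = mutable.HashMap.empty[String, Int]
    private var lastRefreshMs = 0L

    /** Picks a partition for a message without a key.
      * `available` lists the partition ids that currently have a leader. */
    def choose(topic: String, available: Seq[Int], nowMs: Long): Int = {
      // A metadata refresh clears the cache, so the next call picks a partition again.
      if (refreshIntervalMs >= 0 && nowMs - lastRefreshMs > refreshIntervalMs) {
        cache.clear()
        lastRefreshMs = nowMs
      }
      // Reuse the cached partition if there is one; otherwise pick one at random and cache it.
      cache.getOrElseUpdate(topic, {
        require(available.nonEmpty, s"No leader for any partition in topic $topic")
        available(random.nextInt(available.size))
      })
    }
  }
}
```

Between two refreshes every call for the same topic returns the same partition, which is the behavior described above: with only null keys, one partition receives all the traffic until the next metadata refresh.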

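When partKey is not null, the partition method of the partitioner passed to the handler decides, from partKey and numPartitions, which partition the message goes to. Note that numPartitions here comes from topicPartitionList.size, so the chosen partition may have no available leader. That problem is left to be dealt with at send time. In practice, when this happens, partitionAndCollate assigns the message to a broker with brokerId -1, and the send method checks brokerId before sending: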

    if(brokerId < 0) {
      warn("Failed to send data since partitions %s don't have a leader".format(messagesPerTopic.map(_._1).mkString(",")))
      messagesPerTopic.keys.toSeq

When brokerId < 0, send returns a non-empty Seq containing every topic+partition combination that has no leader. If the messages still cannot be sent after the configured number of retries, the handle method finally throws a FailedToSendMessageException.
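
For the keyed case, a hash-modulo partitioner is the usual choice. The following is a minimal sketch, assuming the partitioner hashes the key and takes the remainder by numPartitions; HashPartitionSketch is a hypothetical name and the body is not copied from the Kafka source.

```scala
object HashPartitionSketch {
  /** Maps a non-null key to a partition index in [0, numPartitions).
    * Masking the sign bit keeps the result non-negative even when hashCode is Int.MinValue. */
  def partition(key: Any, numPartitions: Int): Int = {
    require(numPartitions > 0, "numPartitions must be positive")
    (key.hashCode & 0x7fffffff) % numPartitions
  }
}
```

Because the result depends only on the key and on numPartitions, messages with the same key keep landing in the same partition for as long as the number of partitions does not change.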

How does it compress messages in batches?

This is handled in the following method:

private def groupMessagesToSet(messagesPerTopicAndPartition: collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K,Message]]])

Its comment reads:

/** enforce the compressed.topics config here.
* If the compression codec is anything other than NoCompressionCodec,
* Enable compression only for specified topics if any
* If the list of compressed topics is empty, then enable the specified compression codec for all topics
* If the compression codec is NoCompressionCodec, compression is disabled for all topics
*/

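That is, if no compression is configured, none of the message sets for any topic is compressed. If compression is configured and no specific topics are listed for compression, every topic is compressed; otherwise only the listed topics are compressed.

groupMessagesToSet contains no actual compression logic of its own. Instead it wraps the messages in ByteBufferMessageSet objects. The comment on that class reads: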
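
The three rules can be written as a single pattern match. This is a sketch under stated assumptions: Codec and its three cases are hypothetical stand-ins for Kafka's codec objects, and codecFor is an illustrative helper, not a method of DefaultEventHandler.

```scala
object CompressionChoiceSketch {
  // Hypothetical stand-ins for Kafka's compression codec objects.
  sealed trait Codec
  case object NoCompression extends Codec
  case object Gzip extends Codec
  case object Snappy extends Codec

  /** Returns the codec to apply to one topic's message set.
    * `configured` is the producer-wide codec; `compressedTopics` is the optional topic list. */
  def codecFor(topic: String, configured: Codec, compressedTopics: Set[String]): Codec =
    configured match {
      case NoCompression                             => NoCompression // compression disabled everywhere
      case codec if compressedTopics.isEmpty         => codec         // no list: compress every topic
      case codec if compressedTopics.contains(topic) => codec         // listed topic: compress
      case _                                         => NoCompression // unlisted topic: leave as is
    }
}
```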

/**
* A sequence of messages stored in a byte buffer
*
* There are two ways to create a ByteBufferMessageSet
*
* Option 1: From a ByteBuffer which already contains the serialized message set. Consumers will use this method.
*
* Option 2: Give it a list of messages along with instructions relating to serialization format. Producers will use this method.

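So it appears to be a utility for serializing and deserializing message sets.

Its implementation uses the CompressionFactory object, and from that object's implementation you can see that this version of Kafka supports only two compression codecs, GZIP and Snappy.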

compressionCodec match {
      case DefaultCompressionCodec => new GZIPOutputStream(stream)
      case GZIPCompressionCodec => new GZIPOutputStream(stream)
      case SnappyCompressionCodec =>
        import org.xerial.snappy.SnappyOutputStream
        new SnappyOutputStream(stream)
      case _ =>
        throw new kafka.common.UnknownCodecException("Unknown Codec: " + compressionCodec)
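
To see what wrapping a stream in GZIPOutputStream does to a batch of bytes, here is a small round trip that uses only java.util.zip. It is an illustration of the stream-wrapping idea; GzipSketch and its two helpers are hypothetical and do not reproduce how ByteBufferMessageSet lays out messages.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object GzipSketch {
  /** Compresses a whole batch of bytes by writing it through a GZIP stream. */
  def compress(data: Array[Byte]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val gzip = new GZIPOutputStream(bytes)
    try gzip.write(data) finally gzip.close() // close() flushes the GZIP trailer
    bytes.toByteArray
  }

  /** Reads the compressed bytes back through a GZIP stream. */
  def decompress(data: Array[Byte]): Array[Byte] = {
    val gzip = new GZIPInputStream(new ByteArrayInputStream(data))
    try {
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](4096)
      var n = gzip.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        n = gzip.read(buf)
      }
      out.toByteArray
    } finally gzip.close()
  }
}
```

Compressing the batch as a whole, rather than message by message, lets the codec exploit repetition across messages, which is the point of batching before compression.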
