Which partition does Kafka send a message to when the key is null?

When you write a Kafka producer, you create KeyedMessage objects.

KeyedMessage<K, V> keyedMessage = new KeyedMessage<>(topicName, key, message);

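The key here can be null. In that case, which partition does Kafka send the message to? According to the official Kafka documentation, the default partitioner picks a partition at random: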

The third property "partitioner.class" defines what class to use to determine which Partition in the Topic the message is to be sent to. This is optional, but for any non-trivial implementation you are going to want to implement a partitioning scheme. More about the implementation of this class later. If you include a value for the key but haven't defined a partitioner.class Kafka will use the default partitioner. If the key is null, then the Producer will assign the message to a random Partition.

But this statement is quite misleading.

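Taken literally, the statement is correct. But "random" here means that a partition is chosen at random only when the topic metadata is refreshed, at the interval set by "topic.metadata.refresh.interval.ms"; within that interval the same single partition is always used. With the default setting, a new partition can be chosen only once every ten minutes. I suspect most programmers, like me, read the documentation as saying that every message gets a randomly chosen partition.
See the relevant code: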

private def getPartition(topic: String, key: Any, topicPartitionList: Seq[PartitionAndLeader]): Int = {
  val numPartitions = topicPartitionList.size
  if(numPartitions <= 0)
    throw new UnknownTopicOrPartitionException("Topic " + topic + " doesn't exist")
  val partition =
    if(key == null) {
      // If the key is null, we don't really need a partitioner
      // So we look up in the send partition cache for the topic to decide the target partition
      val id = sendPartitionPerTopicCache.get(topic)
      id match {
        case Some(partitionId) =>
          // directly return the partitionId without checking availability of the leader,
          // since we want to postpone the failure until the send operation anyways
          partitionId
        case None =>
          val availablePartitions = topicPartitionList.filter(_.leaderBrokerIdOpt.isDefined)
          if (availablePartitions.isEmpty)
            throw new LeaderNotAvailableException("No leader for any partition in topic " + topic)
          val index = Utils.abs(Random.nextInt) % availablePartitions.size
          val partitionId = availablePartitions(index).partitionId
          sendPartitionPerTopicCache.put(topic, partitionId)
          partitionId
      }
    } else
      partitioner.partition(key, numPartitions)
  if(partition < 0 || partition >= numPartitions)
    throw new UnknownTopicOrPartitionException("Invalid partition id: " + partition + " for topic " + topic +
      "; Valid values are in the inclusive range of [0, " + (numPartitions-1) + "]")
  trace("Assigning message of topic %s and key %s to a selected partition %d".format(topic, if (key == null) "[none]" else key.toString, partition))
  partition
}

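If the key is null, the producer looks up the cached partition for the topic in sendPartitionPerTopicCache. If one is cached, that partition is used; otherwise it picks a partition at random from those that currently have a leader, and caches the choice.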
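To make the "sticky" behavior concrete, here is a minimal Java sketch of the same idea. It is not Kafka's code: the class name StickyNullKeyChooser and the method onMetadataRefresh are hypothetical, and the sketch simply assumes that a metadata refresh empties the cache, which is how the article describes the effect of the refresh interval.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical model of the null-key path described above; not Kafka's actual class.
class StickyNullKeyChooser {
    // One cached partition per topic, mirroring the role of sendPartitionPerTopicCache.
    private final Map<String, Integer> sendPartitionPerTopicCache = new HashMap<>();
    private final Random random;

    StickyNullKeyChooser(Random random) {
        this.random = random;
    }

    // Returns the cached partition if there is one; otherwise picks a random
    // available partition and caches it for all later null-key messages.
    int choose(String topic, List<Integer> availablePartitions) {
        Integer cached = sendPartitionPerTopicCache.get(topic);
        if (cached != null) {
            return cached;
        }
        if (availablePartitions.isEmpty()) {
            throw new IllegalStateException("No leader for any partition in topic " + topic);
        }
        int picked = availablePartitions.get(random.nextInt(availablePartitions.size()));
        sendPartitionPerTopicCache.put(topic, picked);
        return picked;
    }

    // Assumption: stands in for the periodic metadata refresh, which empties the cache.
    void onMetadataRefresh() {
        sendPartitionPerTopicCache.clear();
    }

    public static void main(String[] args) {
        StickyNullKeyChooser chooser = new StickyNullKeyChooser(new Random());
        List<Integer> partitions = Arrays.asList(0, 1, 2, 3);
        // Every message before the refresh lands on the same partition.
        for (int i = 0; i < 5; i++) {
            System.out.println("message " + i + " -> partition "
                    + chooser.choose("demo-topic", partitions));
        }
        chooser.onMetadataRefresh();
        // Only now can a different partition be selected.
        System.out.println("after refresh -> partition "
                + chooser.choose("demo-topic", partitions));
    }
}
```

Running main prints the same partition number for the first five messages; only the line after the simulated refresh has a chance of showing a different one.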

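LinkedIn engineer Guozhang Wang explained this on the mailing list. Originally Kafka chose a random partition for every message, as most users expect. It was later changed to choose a partition periodically, in order to reduce the number of sockets on the server side. This does mislead users, though, and the behavior was reportedly changed back to per-message random selection after version 0.8.2. However, when I looked at the 0.8.2 code I did not see that change.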

So, if at all possible, set a key on your KeyedMessage.
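As a rough illustration of why a key helps, the sketch below assumes a simple hash-modulo partitioner, in the spirit of what a default partitioner typically does for non-null keys. The class KeyHashPartitionSketch and the method partitionFor are hypothetical names, not part of Kafka's API, and the exact hashing Kafka uses may differ.

```java
import java.util.Arrays;

// Hypothetical hash-modulo partitioner; an assumption for illustration, not Kafka's class.
class KeyHashPartitionSketch {
    // Maps a non-null key to a partition in [0, numPartitions).
    static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Masking the sign bit keeps the result non-negative even for Integer.MIN_VALUE.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        int[] counts = new int[numPartitions];
        // Distinct keys spread across partitions on every send,
        // with no dependence on any metadata refresh.
        for (int i = 0; i < 1000; i++) {
            counts[partitionFor("user-" + i, numPartitions)]++;
        }
        System.out.println("messages per partition: " + Arrays.toString(counts));
    }
}
```

With a key, the partition depends only on the key and the partition count, so messages with different keys are spread across partitions immediately, and messages with the same key always go to the same partition.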

Source: http://colobu.com/2015/01/22/which-kafka-partition-will-keyedMessages-be-sent-to-if-key-is-null/
