When we are talking about performance of Kafka Producer, we are really talking about two different things:

  • latency: how much time passes from the time KafkaProducer.send() was called until the message shows up in a Kafka broker.
  • throughput: how many messages can the producer send to Kafka each second.

Many years ago, I was in a storage class taught by scalability expert James Morle. One of the students asked why we need to worry about both latency and throughput – after all, if processing a message takes 10ms (latency), then clearly throughput is limited to 100 messages per second. When looking at things this way, it may look like higher latency == higher throughput. However, the relation between latency and throughput is not this trivial.

Lets start our discussion with agreeing that we are only talking about the new Kafka Producer (the one in org.apache.kafka.clients package). It makes things simpler and there’s no reason to use the old producer at this point.

Kafka Producer allows to send message batches. Suppose that due to network roundtrip times, it takes 2ms to send a single Kafka message. By sending one message at a time, we have latency of 2ms and throughput of 500 messages per second. But suppose that we are in no big hurry, and are willing to wait few milliseconds and send a larger batch – lets say we decided to wait 8ms and managed to accumulate 1000 messages. Our latency is now 10ms, but our throughput is up to 100,000 messages per second! Thats the main reason I love microbatches so much. By adding a tiny delay, and 10ms is usually acceptable even for financial applications, our throughput is 200 times greater. This type of trade-off is not unique to Kafka, btw. Network and storage subsystem use this kind of “micro batching”  all the time.

Sometimes latency and throughput interact in even funnier ways. One day Ted Malaskacomplained that with Flafka, he can get 20ms latency when sending 100,000 messages per second, but huge 1-3s latency when sending just 100 messages a second. This made no sense at all, until we remembered that to save CPU, if Flafka doesn’t find messages to read from Kafka it will back off and retry later. Backoff times started at 0.5s and steadily increased. Ted kindly improved Flume to avoid this issue in FLUME-2729.

Anyway, back to the Kafka Producer. There are few settings you can modify to improve latency or throughput in Kafka Producer:

  • batch.size – This is an upper limit of how many messages Kafka Producer will attempt to batch before sending – specified in bytes (Default is 16K bytes – so 16 messages if each message is 1K in size). Kafka may send batches before this limit is reached (so latency doesn’t change by modifying this parameter), but will always send when this limit is reached. Therefore setting this limit too low will hurt throughput without improving latency. The main reason to set this low is lack of memory – Kafka will always allocate enough memory for the entire batch size, even if latency requirements cause it to send half-empty batches.
  • linger.ms – How long will the producer wait before sending in order to allow more messages to get accumulated in the same batch. Normally the producer will not wait at all, and simply send all the messages that accumulated while the previous send was in progress (2 ms in the example above), but as we’ve discussed, sometimes we are willing to wait a bit longer in order to improve the overall throughput at the expense of a little higher latency. In this case tuning linger.ms to a higher value will make sense. Note that if batch.size is low and the batch if full before linger.ms time passes, the batch will send early, so it makes sense to tune batch.size and linger.ms together.

Other than tuning these parameters, you will  want to avoid waiting on the future of the send method (i.e. the result from Kafka brokers), and instead send data continuously to Kafka. You can simply ignore the result (if success of sending messages is not critical), but its probably better to use a callback. You can find an example of how to do this in my github (look at produceAsync method).

If sending is still slow and you are trying to understand what is going on, you will want to check if the send thread is fully utilized through jvisualsm (it is called kafka-producer-network-thread) or keep an eye on average batch size metric. If you find that you can’t fill the buffer fast enough and the sender is idle, you can try adding application threads that share the same producer and increase throughput this way.

Another concern can be that the Producer will send all the batches that go to the same broker together when at least one of them is full – if you have one very busy topic and others that are less busy, you may see some skew in throughput this way.

Sometimes you will notice that the producer performance doesn’t scale as you add more partitions to a topic. This can happen because, as we mentioned, there is a send buffer for each partition. When you add more partitions, you have more send buffers, so perhaps the configuration you set to keep the buffers full before (# of threads, linger.ms) is no longer sufficient and buffers are sent half-empty (check the batch sizes). In this case you will need to add threads or increase linger.ms to improve utilization and scale your throughput.

Got more tips on ingesting data into Kafka? comments are welcome!

Kafka生产者性能优化之吞吐量VS延迟的更多相关文章

  1. 一文告诉你,Kafka在性能优化方面做了哪些举措!

    很多粉丝私信问我Kafka在性能优化方面做了哪些举措,对于相关问题的答案其实我早就写过了,就是没有系统的整理一篇,最近思考着花点时间来整理一下,下次再有粉丝问我相关的问题我就可以潇洒的甩个链接了.这个 ...

  2. 1002-谈谈ELK日志分析平台的性能优化理念

    在生产环境中,我们为了更好的服务于业务,通常会通过优化的手段来实现服务对外的性能最大化,节省系统性能开支:关注我的朋友们都知道,前段时间一直在搞ELK,同时也记录在了个人的博客篇章中,从部署到各个服务 ...

  3. 5种kafka消费端性能优化方法

    摘要:带你了解基于FusionInsight HD&MRS的5种kafka消费端性能优化方法. 本文分享自华为云社区<FusionInsight HD&MRSkafka消费端性能 ...

  4. 漫游Kafka设计篇之性能优化

    Kafka在提高效率方面做了很大努力.Kafka的一个主要使用场景是处理网站活动日志,吞吐量是非常大的,每个页面都会产生好多次写操作.读方面,假设每个消息只被消费一次,读的量的也是很大的,Kafka也 ...

  5. 漫游Kafka设计篇之性能优化(7)

    Kafka在提高效率方面做了很大努力.Kafka的一个主要使用场景是处理网站活动日志,吞吐量是非常大的,每个页面都会产生好多次写操作.读方面,假设每个消息只被消费一次,读的量的也是很大的,Kafka也 ...

  6. apache kafka系列之性能优化架构分析

    apache kafka中国社区QQ群:162272557 Apache kafka性能优化架构分析 应用程序优化:数据压缩 watermark/2/text/aHR0cDovL2Jsb2cuY3Nk ...

  7. kafka的并行度与JStorm性能优化

    kafka的并行度与JStorm性能优化 > Consumers Messaging traditionally has two models: queuing and publish-subs ...

  8. Kafka权威指南 读书笔记之(三)Kafka 生产者一一向 Kafka 写入数据

    不管是把 Kafka 作为消息队列.消息总线还是数据存储平台来使用 ,总是需要有一个可以往 Kafka 写入数据的生产者和一个从 Kafka 读取数据的消费者,或者一个兼具两种角色的应用程序. 开发者 ...

  9. kafka 基础知识梳理-kafka是一种高吞吐量的分布式发布订阅消息系统

    一.kafka 简介 今社会各种应用系统诸如商业.社交.搜索.浏览等像信息工厂一样不断的生产出各种信息,在大数据时代,我们面临如下几个挑战: 如何收集这些巨大的信息 如何分析它 如何及时做到如上两点 ...

随机推荐

  1. 使用Gerrit发送测试邮件

    使用Gerrit发送测试邮件 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任. 一.安装HTTP服务 1>.安装HTTP服务 [root@gerrit.yinzhengjie.o ...

  2. poj1734 Sightseeing trip(Floyd求无向图最小环)

    #include <iostream> #include <cstring> #include <cstdio> #include <algorithm> ...

  3. 小程序之程序构造器App()

    onLaunch / onShow / onHide 三个回调是App实例的生命周期函数 “小程序”指的是产品层面的程序,而“程序”指的是代码层面的程序实例,为了避免误解,下文采用App来代替代码层面 ...

  4. linux不同版本jdk,用脚本进行切换

    服务器中已经部署了一个项目,现在又要部署另一个项目在服务器上.以前的项目是jdk7,新的项目是jdk8,所以启动前就要配置对应的jdk环境变量.所以写了一个shell脚本进行执行切换. 先下载两个jd ...

  5. PageHelper分页正确用法

    依赖和配置就不说了,说用法 Page<Object> page = PageHelper.startPage(pageNum, pageSize); List<SysRoleDTO& ...

  6. LeetCode 1079. Letter Tile Possibilities

    原题链接在这里:https://leetcode.com/problems/letter-tile-possibilities/ 题目: You have a set of tiles, where ...

  7. JavaScript基础06——字符串

    字符串的创建: 字符串的创建: var str = "hello world"; //常量,基本类型创建 var str2 = new String("hello wor ...

  8. Window IDEA开发工具 杀死指定端口 cmd 命令行 taskkill

    Windows平台   两步方法 :  1 查询端口占用,2 强行杀死进程 netstat -aon|findstr "8080" taskkill /pid 4136-t -f ...

  9. 14-ESP8266 SDK开发基础入门篇--上位机串口控制 Wi-Fi输出PWM的占空比,调节LED亮度,8266程序编写

    https://www.cnblogs.com/yangfengwu/p/11102026.html 首先规定下协议  ,CRC16就不加了哈,最后我会附上CRC16的计算程序,大家有兴趣自己加上 上 ...

  10. PDO和MySQLi区别和数度;到底用哪个?

    当用PHP访问数据库时,除了PHP自带的数据库驱动,我们一般还有两种比较好的选择:PDO和MySQLi.在实际开发过程中要决定选择哪一种首先要对二者有一个比较全面的了解.本文就针对他们的不同点进行分析 ...