When we are talking about performance of Kafka Producer, we are really talking about two different things:

  • latency: how much time passes from the time KafkaProducer.send() was called until the message shows up in a Kafka broker.
  • throughput: how many messages can the producer send to Kafka each second.

Many years ago, I was in a storage class taught by scalability expert James Morle. One of the students asked why we need to worry about both latency and throughput – after all, if processing a message takes 10ms (latency), then clearly throughput is limited to 100 messages per second. When looking at things this way, it may look like higher latency == higher throughput. However, the relation between latency and throughput is not this trivial.

Lets start our discussion with agreeing that we are only talking about the new Kafka Producer (the one in org.apache.kafka.clients package). It makes things simpler and there’s no reason to use the old producer at this point.

Kafka Producer allows to send message batches. Suppose that due to network roundtrip times, it takes 2ms to send a single Kafka message. By sending one message at a time, we have latency of 2ms and throughput of 500 messages per second. But suppose that we are in no big hurry, and are willing to wait few milliseconds and send a larger batch – lets say we decided to wait 8ms and managed to accumulate 1000 messages. Our latency is now 10ms, but our throughput is up to 100,000 messages per second! Thats the main reason I love microbatches so much. By adding a tiny delay, and 10ms is usually acceptable even for financial applications, our throughput is 200 times greater. This type of trade-off is not unique to Kafka, btw. Network and storage subsystem use this kind of “micro batching”  all the time.

Sometimes latency and throughput interact in even funnier ways. One day Ted Malaskacomplained that with Flafka, he can get 20ms latency when sending 100,000 messages per second, but huge 1-3s latency when sending just 100 messages a second. This made no sense at all, until we remembered that to save CPU, if Flafka doesn’t find messages to read from Kafka it will back off and retry later. Backoff times started at 0.5s and steadily increased. Ted kindly improved Flume to avoid this issue in FLUME-2729.

Anyway, back to the Kafka Producer. There are few settings you can modify to improve latency or throughput in Kafka Producer:

  • batch.size – This is an upper limit of how many messages Kafka Producer will attempt to batch before sending – specified in bytes (Default is 16K bytes – so 16 messages if each message is 1K in size). Kafka may send batches before this limit is reached (so latency doesn’t change by modifying this parameter), but will always send when this limit is reached. Therefore setting this limit too low will hurt throughput without improving latency. The main reason to set this low is lack of memory – Kafka will always allocate enough memory for the entire batch size, even if latency requirements cause it to send half-empty batches.
  • linger.ms – How long will the producer wait before sending in order to allow more messages to get accumulated in the same batch. Normally the producer will not wait at all, and simply send all the messages that accumulated while the previous send was in progress (2 ms in the example above), but as we’ve discussed, sometimes we are willing to wait a bit longer in order to improve the overall throughput at the expense of a little higher latency. In this case tuning linger.ms to a higher value will make sense. Note that if batch.size is low and the batch if full before linger.ms time passes, the batch will send early, so it makes sense to tune batch.size and linger.ms together.

Other than tuning these parameters, you will  want to avoid waiting on the future of the send method (i.e. the result from Kafka brokers), and instead send data continuously to Kafka. You can simply ignore the result (if success of sending messages is not critical), but its probably better to use a callback. You can find an example of how to do this in my github (look at produceAsync method).

If sending is still slow and you are trying to understand what is going on, you will want to check if the send thread is fully utilized through jvisualsm (it is called kafka-producer-network-thread) or keep an eye on average batch size metric. If you find that you can’t fill the buffer fast enough and the sender is idle, you can try adding application threads that share the same producer and increase throughput this way.

Another concern can be that the Producer will send all the batches that go to the same broker together when at least one of them is full – if you have one very busy topic and others that are less busy, you may see some skew in throughput this way.

Sometimes you will notice that the producer performance doesn’t scale as you add more partitions to a topic. This can happen because, as we mentioned, there is a send buffer for each partition. When you add more partitions, you have more send buffers, so perhaps the configuration you set to keep the buffers full before (# of threads, linger.ms) is no longer sufficient and buffers are sent half-empty (check the batch sizes). In this case you will need to add threads or increase linger.ms to improve utilization and scale your throughput.

Got more tips on ingesting data into Kafka? comments are welcome!

Kafka生产者性能优化之吞吐量VS延迟的更多相关文章

  1. 一文告诉你,Kafka在性能优化方面做了哪些举措!

    很多粉丝私信问我Kafka在性能优化方面做了哪些举措,对于相关问题的答案其实我早就写过了,就是没有系统的整理一篇,最近思考着花点时间来整理一下,下次再有粉丝问我相关的问题我就可以潇洒的甩个链接了.这个 ...

  2. 1002-谈谈ELK日志分析平台的性能优化理念

    在生产环境中,我们为了更好的服务于业务,通常会通过优化的手段来实现服务对外的性能最大化,节省系统性能开支:关注我的朋友们都知道,前段时间一直在搞ELK,同时也记录在了个人的博客篇章中,从部署到各个服务 ...

  3. 5种kafka消费端性能优化方法

    摘要:带你了解基于FusionInsight HD&MRS的5种kafka消费端性能优化方法. 本文分享自华为云社区<FusionInsight HD&MRSkafka消费端性能 ...

  4. 漫游Kafka设计篇之性能优化

    Kafka在提高效率方面做了很大努力.Kafka的一个主要使用场景是处理网站活动日志,吞吐量是非常大的,每个页面都会产生好多次写操作.读方面,假设每个消息只被消费一次,读的量的也是很大的,Kafka也 ...

  5. 漫游Kafka设计篇之性能优化(7)

    Kafka在提高效率方面做了很大努力.Kafka的一个主要使用场景是处理网站活动日志,吞吐量是非常大的,每个页面都会产生好多次写操作.读方面,假设每个消息只被消费一次,读的量的也是很大的,Kafka也 ...

  6. apache kafka系列之性能优化架构分析

    apache kafka中国社区QQ群:162272557 Apache kafka性能优化架构分析 应用程序优化:数据压缩 watermark/2/text/aHR0cDovL2Jsb2cuY3Nk ...

  7. kafka的并行度与JStorm性能优化

    kafka的并行度与JStorm性能优化 > Consumers Messaging traditionally has two models: queuing and publish-subs ...

  8. Kafka权威指南 读书笔记之(三)Kafka 生产者一一向 Kafka 写入数据

    不管是把 Kafka 作为消息队列.消息总线还是数据存储平台来使用 ,总是需要有一个可以往 Kafka 写入数据的生产者和一个从 Kafka 读取数据的消费者,或者一个兼具两种角色的应用程序. 开发者 ...

  9. kafka 基础知识梳理-kafka是一种高吞吐量的分布式发布订阅消息系统

    一.kafka 简介 今社会各种应用系统诸如商业.社交.搜索.浏览等像信息工厂一样不断的生产出各种信息,在大数据时代,我们面临如下几个挑战: 如何收集这些巨大的信息 如何分析它 如何及时做到如上两点 ...

随机推荐

  1. Python - 2和3的区别

    编码: Python2的默认编码是ASCII码,这是导致Python2中经常遇到编码问题的主要原因之一,至于原因,在于Python这门语言出现的时候,还没有Unicode! Python3默认编码是U ...

  2. Celery(异步任务,定时任务,周期任务)

    1.什么是Celery Celery是基于Python实现的模块,用于异步.定时.周期任务的. 组成结构: 1.用户任务 app 2.管道broker 用于存储任务 官方推荐 redis/rabbit ...

  3. less-5

    首先输入id=1和id=1’未报错,均显示You are in.....(如下图所示) 由上图可以看到,如果运行返回结果正确的时候只返回you are in...,不会返回数据库当中的信息了,所以我们 ...

  4. 爬虫 - 请求库之selenium

    介绍 官方文档 selenium最初是一个自动化测试工具,而爬虫中使用它主要是为了解决requests无法直接执行JavaScript代码的问题 selenium本质是通过驱动浏览器,完全模拟浏览器的 ...

  5. http服务读取配置文件,交叉编译

    因为是在windows上编译的代码,在linux上不可执行,所以需要交叉编译 package main import ( "flag" "gopkg.in/ini.v1& ...

  6. WinDbg常用命令系列---输入内存值的命令e*

    e, ea, eb, ed, eD, ef, ep, eq, eu, ew, eza (Enter Values) e*命令将您指定的值输入内存.不要将此命令与~e(Thread-Specific C ...

  7. P1071 潜伏者

    //Pro:NOIP2009 T1 P1071 潜伏者 #include<iostream> #include<cstdio> #include<cstring> ...

  8. QML学习(三)——<QML命名规范>

    QML对象声明 QML对象特性一般使用下面的顺序进行构造: id 属性声明 信号声明 JavaScript函数 对象属性 子对象 状态 状态切换 为了获取更好的可读性,建议在不同部分之间添加一个空行. ...

  9. Stringbuilde方法的用法以及其作用

    Stringbuilde的方法有以下几种(常用的):(java中的语法) 在程序开发过程中,我们常常碰到字符串连接的情况,方便和直接的方式是通过"+"符号来实现,但是这种方式达到目 ...

  10. clion下批量删除断点