When we talk about the performance of the Kafka Producer, we are really talking about two different things:

  • latency: how much time passes from the time KafkaProducer.send() was called until the message shows up in a Kafka broker.
  • throughput: how many messages the producer can send to Kafka each second.

Many years ago, I was in a storage class taught by scalability expert James Morle. One of the students asked why we need to worry about both latency and throughput. After all, if processing a message takes 10ms (latency), then clearly throughput is limited to 100 messages per second. Looked at this way, it may seem that lower latency == higher throughput. However, the relationship between latency and throughput is not that simple.

Let's start our discussion by agreeing that we are only talking about the new Kafka Producer (the one in the org.apache.kafka.clients package). It makes things simpler, and there's no reason to use the old producer at this point.

The Kafka Producer can send messages in batches. Suppose that, due to network round-trip times, it takes 2ms to send a single Kafka message. By sending one message at a time, we get a latency of 2ms and a throughput of 500 messages per second. But suppose we are in no big hurry and are willing to wait a few milliseconds to send a larger batch. Let's say we decide to wait 8ms and manage to accumulate 1000 messages. Our latency is now 10ms, but our throughput is up to 100,000 messages per second! That's the main reason I love microbatches so much. By adding a tiny delay (and 10ms is usually acceptable even for financial applications), our throughput is 200 times greater. This type of trade-off is not unique to Kafka, by the way. Network and storage subsystems use this kind of “micro batching” all the time.
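
The arithmetic in the paragraph above can be checked with a few lines of Java. This is a minimal sketch under the paragraph's own simplifying assumption: the producer runs one cycle at a time (wait, then send). Throughput is therefore just messages per cycle divided by cycle time. Real producers overlap sends, so treat this as a back-of-the-envelope model. `BatchingMath` and `throughputPerSecond` are hypothetical names.

```java
// Back-of-the-envelope model: one send cycle at a time, no overlap between cycles.
class BatchingMath {
    // Messages per second when `messagesPerCycle` messages leave together and each
    // cycle (linger wait + network send) takes `cycleMs` milliseconds.
    static double throughputPerSecond(int messagesPerCycle, double cycleMs) {
        return messagesPerCycle * 1000.0 / cycleMs;
    }

    public static void main(String[] args) {
        double sendMs = 2.0;   // network round trip per send, from the example
        double lingerMs = 8.0; // extra time we choose to wait for more messages

        double oneAtATime = throughputPerSecond(1, sendMs);               // latency 2 ms
        double microBatch = throughputPerSecond(1000, lingerMs + sendMs); // latency 10 ms

        System.out.printf("one at a time: %.0f msg/s%n", oneAtATime);      // 500 msg/s
        System.out.printf("micro-batch:   %.0f msg/s%n", microBatch);      // 100000 msg/s
        System.out.printf("speedup:       %.0fx%n", microBatch / oneAtATime); // 200x
    }
}
```

Setting the batch to a single message reproduces the naive "throughput = 1 / latency" view. The 1000-message case shows how a slightly longer cycle can still multiply throughput.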

Sometimes latency and throughput interact in even funnier ways. One day Ted Malaska complained that with Flafka he could get 20ms latency when sending 100,000 messages per second, but a huge 1-3s latency when sending just 100 messages a second. This made no sense at all, until we remembered that, to save CPU, Flafka backs off and retries later when it finds no messages to read from Kafka. Backoff times started at 0.5s and steadily increased. Ted kindly improved Flume to avoid this issue in FLUME-2729.

Anyway, back to the Kafka Producer. There are a few settings you can modify to improve latency or throughput:

  • batch.size – This is an upper limit on how much data the Kafka Producer will batch before sending. It is specified in bytes rather than messages: the default is 16K bytes, so 16 messages if each message is 1K in size. Kafka may send a batch before this limit is reached (so changing this parameter does not change latency), but will always send when the limit is reached. Therefore, setting this limit too low will hurt throughput without improving latency. The main reason to set it low is lack of memory: Kafka always allocates enough memory for the entire batch size, even if latency requirements cause it to send half-empty batches.
  • linger.ms – How long the producer waits before sending, to allow more messages to accumulate in the same batch. Normally the producer does not wait at all. It simply sends all the messages that accumulated while the previous send was in progress (2ms in the example above). But as we've discussed, sometimes we are willing to wait a bit longer to improve overall throughput at the expense of slightly higher latency, and in that case tuning linger.ms to a higher value makes sense. Note that if batch.size is low and the batch is full before linger.ms passes, the batch will be sent early, so it makes sense to tune batch.size and linger.ms together.
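
A sketch of tuning the two settings together. To run without the kafka-clients jar, the snippet only builds a `java.util.Properties` object. With the real client, these properties would be passed to `new KafkaProducer<>(props)`, and the constants `ProducerConfig.BATCH_SIZE_CONFIG` and `ProducerConfig.LINGER_MS_CONFIG` name the same keys. The broker address, the String serializers and the values 65536 and 10 are assumptions chosen for illustration, not recommendations.

```java
import java.util.Properties;

// Illustrative producer settings; the specific values are assumptions, not recommendations.
class ProducerTuning {
    static Properties throughputTuned(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Upper bound for a partition's batch, in bytes (not a message count).
        props.setProperty("batch.size", "65536");
        // Wait up to 10 ms for a batch to fill before sending it.
        props.setProperty("linger.ms", "10");
        return props;
    }

    // How many fixed-size messages fit in one batch, e.g. 16384 / 1024 = 16.
    static int messagesPerBatch(int batchSizeBytes, int messageSizeBytes) {
        return batchSizeBytes / messageSizeBytes;
    }

    public static void main(String[] args) {
        System.out.println(throughputTuned("localhost:9092"));
        System.out.println(messagesPerBatch(16384, 1024)); // 16 with the default batch.size
    }
}
```

Raising linger.ms without raising batch.size only helps until batches fill up. After that they are sent early, as the bullet above notes.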

Other than tuning these parameters, you will want to avoid blocking on the Future returned by the send method (i.e. waiting for the result from the Kafka brokers), and instead send data to Kafka continuously. You can simply ignore the result (if the success of sending messages is not critical), but it's probably better to use a callback. You can find an example of how to do this in my GitHub (look at the produceAsync method).
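
Here is roughly what sending without blocking and handling results in a callback looks like. So that the sketch runs without a broker or the kafka-clients jar, it uses a hypothetical `FakeProducer`. Its `send()` returns a `CompletableFuture` and invokes a callback when a simulated acknowledgment arrives. This is an analogy, not the Kafka API, and not the produceAsync method mentioned above. With the real client, the equivalent is `producer.send(record, (metadata, exception) -> { ... })`, where `exception` is null on success, followed by `flush()` or `close()` before the application exits.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a producer (NOT the Kafka API): send() returns immediately
// and the simulated "broker acknowledgment" completes later on a background thread.
class FakeProducer implements AutoCloseable {
    private final ExecutorService network = Executors.newSingleThreadExecutor();

    CompletableFuture<Long> send(String message, BiConsumer<Long, Throwable> callback) {
        CompletableFuture<Long> ack = CompletableFuture.supplyAsync(() -> {
            if (message.isEmpty()) {
                throw new IllegalArgumentException("empty message"); // simulated failure
            }
            return (long) message.length(); // pretend this is the assigned offset
        }, network);
        ack.whenComplete(callback); // runs once the result (or error) is known
        return ack;
    }

    @Override
    public void close() {
        network.shutdown();
    }
}

class AsyncSendDemo {
    // Sends every message without waiting on individual futures; returns {succeeded, failed}.
    static int[] sendAll(List<String> messages) {
        AtomicInteger succeeded = new AtomicInteger();
        AtomicInteger failed = new AtomicInteger();
        CountDownLatch outstanding = new CountDownLatch(messages.size());
        try (FakeProducer producer = new FakeProducer()) {
            for (String message : messages) {
                producer.send(message, (offset, error) -> {
                    if (error == null) {
                        succeeded.incrementAndGet();
                    } else {
                        failed.incrementAndGet(); // log or retry here in a real application
                    }
                    outstanding.countDown();
                });
            }
            outstanding.await(); // wait once at the end, similar in spirit to flush()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for acknowledgments", e);
        }
        return new int[] {succeeded.get(), failed.get()};
    }

    public static void main(String[] args) {
        int[] result = sendAll(List.of("a", "bb", ""));
        System.out.println("succeeded=" + result[0] + " failed=" + result[1]);
    }
}
```

The sending loop never calls `get()` on a future. Results are tallied in the callback, and the code waits only once, after everything has been handed off.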

If sending is still slow and you are trying to understand what is going on, check with jvisualvm whether the send thread (it is called kafka-producer-network-thread) is fully utilized, or keep an eye on the average batch size metric (batch-size-avg). If you find that you can't fill the buffer fast enough and the sender is idle, you can try adding application threads that share the same producer to increase throughput.
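
A sketch of several application threads feeding one shared producer. The `KafkaProducer` documentation describes the producer as thread safe. It also notes that sharing a single instance across threads is generally faster than having multiple instances. Here a hypothetical thread-safe `SharedProducer`, which only counts messages, stands in for it so the example runs on its own. The thread and message counts are arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical thread-safe stand-in for a shared producer; it only counts messages.
class SharedProducer {
    private final LongAdder sent = new LongAdder();

    void send(String message) {
        sent.increment();
    }

    long sentCount() {
        return sent.sum();
    }
}

class MultiThreadedSend {
    // Runs `threads` application threads that all call send() on ONE producer instance.
    static long run(int threads, int messagesPerThread) {
        SharedProducer producer = new SharedProducer();
        ExecutorService appThreads = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int threadId = t;
            appThreads.execute(() -> {
                for (int i = 0; i < messagesPerThread; i++) {
                    producer.send("thread-" + threadId + " message-" + i);
                }
            });
        }
        appThreads.shutdown();
        try {
            if (!appThreads.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("application threads did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return producer.sentCount();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1_000)); // 4000
    }
}
```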

Another concern is that the producer sends all the batches destined for the same broker together once at least one of them is full. If you have one very busy topic and others that are less busy, you may see some skew in throughput this way.
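
A simplified model of the behavior described above. It assumes the partition-to-broker leadership is known and considers only the "batch is full" trigger. It ignores linger.ms expiry and other limits such as `max.request.size`. The rule: once any partition's batch on a broker is full, every non-empty batch for that broker goes out together. The class and partition names are hypothetical, and this is not Kafka's actual sender code.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified model (not Kafka's sender code): a broker becomes "ready" when any of its
// partitions has a full batch, and then every non-empty batch led by that broker is sent.
class BrokerDrainModel {
    static Set<String> partitionsSent(Map<String, Integer> pendingBytes,
                                      Map<String, Integer> leaderBroker,
                                      int batchSizeBytes) {
        Set<Integer> readyBrokers = new HashSet<>();
        for (Map.Entry<String, Integer> e : pendingBytes.entrySet()) {
            if (e.getValue() >= batchSizeBytes) {
                readyBrokers.add(leaderBroker.get(e.getKey()));
            }
        }
        Set<String> sent = new TreeSet<>(); // sorted for stable output
        for (Map.Entry<String, Integer> e : pendingBytes.entrySet()) {
            if (e.getValue() > 0 && readyBrokers.contains(leaderBroker.get(e.getKey()))) {
                sent.add(e.getKey());
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        Map<String, Integer> pending = Map.of("busy-0", 16384, "quiet-0", 200, "quiet-1", 300);
        Map<String, Integer> leader = Map.of("busy-0", 1, "quiet-0", 1, "quiet-1", 2);
        // quiet-0 rides along with busy-0 on broker 1; quiet-1 on broker 2 keeps waiting.
        System.out.println(partitionsSent(pending, leader, 16384)); // [busy-0, quiet-0]
    }
}
```

In this toy run, quiet-0 goes out with only 200 bytes because it shares a broker with the busy topic, while quiet-1 keeps waiting. That uneven treatment is the kind of skew the paragraph describes.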

Sometimes you will notice that producer performance doesn't scale as you add more partitions to a topic. This can happen because there is a separate send buffer for each partition. When you add more partitions, you have more send buffers. The configuration that kept the buffers full before (number of threads, linger.ms) may no longer be sufficient, and buffers are sent half-empty (check the batch sizes). In this case you will need to add threads or increase linger.ms to improve utilization and scale your throughput.
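
A rough way to see the effect is to estimate how full each batch gets. This sketch makes several assumptions:

  • a steady incoming rate spread evenly over the partitions
  • fixed-size messages
  • each batch is sent after lingering for linger.ms (or earlier if full)
  • illustrative values: 10,000 messages per second, a 10ms linger, and a 16-message capacity (16K batch.size with 1K messages)

```java
// Rough model: steady rate, even spread across partitions, fixed-size messages,
// and each partition's batch is sent once linger.ms elapses (or earlier if full).
class PartitionFill {
    // Messages that accumulate in one partition's batch during one linger window.
    static double messagesPerBatch(double messagesPerSecond, double lingerMs, int partitions) {
        return messagesPerSecond * lingerMs / 1000.0 / partitions;
    }

    // Fraction of batch capacity used; a batch can't hold more than its capacity.
    static double fillFraction(double messagesPerBatch, int capacityMessages) {
        return Math.min(1.0, messagesPerBatch / capacityMessages);
    }

    public static void main(String[] args) {
        int capacity = 16384 / 1024; // 16 messages of 1K with the default batch.size
        for (int partitions : new int[] {4, 16, 64}) {
            double perBatch = messagesPerBatch(10_000, 10, partitions);
            System.out.printf("%2d partitions: %.2f msgs/batch, %.0f%% full%n",
                    partitions, perBatch, 100 * fillFraction(perBatch, capacity));
        }
    }
}
```

Under these assumptions, going from 4 to 64 partitions takes each batch from full to roughly 10% full. Increasing linger.ms, or raising the rate at which application threads feed the producer, pushes the fill back up.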

Got more tips on ingesting data into Kafka? Comments are welcome!
