TIPS FOR IMPROVING PERFORMANCE OF KAFKA PRODUCER
When we are talking about performance of Kafka Producer, we are really talking about two different things:
- latency: how much time passes from the time KafkaProducer.send() was called until the message shows up in a Kafka broker.
- throughput: how many messages can the producer send to Kafka each second.
Many years ago, I was in a storage class taught by scalability expert James Morle. One of the students asked why we need to worry about both latency and throughput – after all, if processing a message takes 10ms (latency), then clearly throughput is limited to 100 messages per second. When looking at things this way, it may look like higher latency == higher throughput. However, the relation between latency and throughput is not this trivial.
Lets start our discussion with agreeing that we are only talking about the new Kafka Producer (the one in org.apache.kafka.clients package). It makes things simpler and there’s no reason to use the old producer at this point.
Kafka Producer allows to send message batches. Suppose that due to network roundtrip times, it takes 2ms to send a single Kafka message. By sending one message at a time, we have latency of 2ms and throughput of 500 messages per second. But suppose that we are in no big hurry, and are willing to wait few milliseconds and send a larger batch – lets say we decided to wait 8ms and managed to accumulate 1000 messages. Our latency is now 10ms, but our throughput is up to 100,000 messages per second! Thats the main reason I love microbatches so much. By adding a tiny delay, and 10ms is usually acceptable even for financial applications, our throughput is 200 times greater. This type of trade-off is not unique to Kafka, btw. Network and storage subsystem use this kind of “micro batching” all the time.
Sometimes latency and throughput interact in even funnier ways. One day Ted Malaskacomplained that with Flafka, he can get 20ms latency when sending 100,000 messages per second, but huge 1-3s latency when sending just 100 messages a second. This made no sense at all, until we remembered that to save CPU, if Flafka doesn’t find messages to read from Kafka it will back off and retry later. Backoff times started at 0.5s and steadily increased. Ted kindly improved Flume to avoid this issue in FLUME-2729.
Anyway, back to the Kafka Producer. There are few settings you can modify to improve latency or throughput in Kafka Producer:
- batch.size – This is an upper limit of how many messages Kafka Producer will attempt to batch before sending – specified in bytes (Default is 16K bytes – so 16 messages if each message is 1K in size). Kafka may send batches before this limit is reached (so latency doesn’t change by modifying this parameter), but will always send when this limit is reached. Therefore setting this limit too low will hurt throughput without improving latency. The main reason to set this low is lack of memory – Kafka will always allocate enough memory for the entire batch size, even if latency requirements cause it to send half-empty batches.
- linger.ms – How long will the producer wait before sending in order to allow more messages to get accumulated in the same batch. Normally the producer will not wait at all, and simply send all the messages that accumulated while the previous send was in progress (2 ms in the example above), but as we’ve discussed, sometimes we are willing to wait a bit longer in order to improve the overall throughput at the expense of a little higher latency. In this case tuning linger.ms to a higher value will make sense. Note that if batch.size is low and the batch if full before linger.ms time passes, the batch will send early, so it makes sense to tune batch.size and linger.ms together.
Other than tuning these parameters, you will want to avoid waiting on the future of the send method (i.e. the result from Kafka brokers), and instead send data continuously to Kafka. You can simply ignore the result (if success of sending messages is not critical), but its probably better to use a callback. You can find an example of how to do this in my github (look at produceAsync method).
If sending is still slow and you are trying to understand what is going on, you will want to check if the send thread is fully utilized through jvisualsm (it is called kafka-producer-network-thread) or keep an eye on average batch size metric. If you find that you can’t fill the buffer fast enough and the sender is idle, you can try adding application threads that share the same producer and increase throughput this way.
Another concern can be that the Producer will send all the batches that go to the same broker together when at least one of them is full – if you have one very busy topic and others that are less busy, you may see some skew in throughput this way.
Sometimes you will notice that the producer performance doesn’t scale as you add more partitions to a topic. This can happen because, as we mentioned, there is a send buffer for each partition. When you add more partitions, you have more send buffers, so perhaps the configuration you set to keep the buffers full before (# of threads, linger.ms) is no longer sufficient and buffers are sent half-empty (check the batch sizes). In this case you will need to add threads or increase linger.ms to improve utilization and scale your throughput.
Got more tips on ingesting data into Kafka? comments are welcome!
TIPS FOR IMPROVING PERFORMANCE OF KAFKA PRODUCER的更多相关文章
- Apache Kafka(五)- Safe Kafka Producer
Kafka Safe Producer 在应用Kafka的场景中,需要考虑到在异常发生时(如网络异常),被发送的消息有可能会出现丢失.乱序.以及重复消息. 对于这些情况,我们可以创建一个“safe p ...
- 【原创】Kafka producer原理 (Scala版同步producer)
本文分析的Kafka代码为kafka-0.8.2.1.另外,由于Kafka目前提供了两套Producer代码,一套是Scala版的旧版本:一套是Java版的新版本.虽然Kafka社区极力推荐大家使用J ...
- 【转】Kafka producer原理 (Scala版同步producer)
转载自:http://www.cnblogs.com/huxi2b/p/4583249.html 供参考 本文分析的Kafka代码为kafka-0.8.2.1.另外,由于Kafka目前提供了两 ...
- Kafka Producer相关代码分析【转】
来源:https://www.zybuluo.com/jewes/note/63925 @jewes 2015-01-17 20:36 字数 1967 阅读 1093 Kafka Producer相关 ...
- kafka producer源码
producer接口: /** * Licensed to the Apache Software Foundation (ASF) under one or more * contributor l ...
- kafka producer生产数据到kafka异常:Got error produce response with correlation id 16 on topic-partition...Error: NETWORK_EXCEPTION
kafka producer生产数据到kafka异常:Got error produce response with correlation id 16 on topic-partition... ...
- kafka producer 0.8.2.1 示例
package test_kafka; import java.util.Properties; import java.util.concurrent.atomic.AtomicInteger; i ...
- 关于Kafka producer管理TCP连接的讨论
在Kafka中,TCP连接的管理交由底层的Selector类(org.apache.kafka.common.network)来维护.Selector类定义了很多数据结构,其中最核心的当属java.n ...
- Kettle安装Kafka Consumer和Kafka Producer插件
1.从github上下载kettle的kafka插件,地址如下 Kafka Consumer地址: https://github.com/RuckusWirelessIL/pentaho-kafka- ...
随机推荐
- .NET CORE 实践(2)--对Ubuntu下安装SDK的记录
根据官网Ubuntu安装SDK操作如下: allen@allen-Virtual-Machine:~$ sudo apt-key adv --keyserver apt-mo.trafficmanag ...
- C# MVC 基于From的身份验证
前言 昨天和一个技术比较好的前辈聊了聊,发现有的时候自己的学习方式有些问题,不知道有没有和我一样的越学习越感觉到知识的匮乏不过能认识到这个问题的同学们,也不要太心急路是一步一步走的饭是一口一口吃的认识 ...
- Packet for query is too large (12238 > 1024). You can change this value
MySQL max_allowed_packet 设置过小导致记录写入失败 mysql根据配置文件会限制server接受的数据包大小. 有时候大的插入和更新会受max_allowed_packet 参 ...
- 【Oracle 11gR2】静默安装 db_install.rsp文件详解
#################################################################### ## Copyright(c) Oracle Corporat ...
- Could not get JDBC connection
想学习下JavaWeb,手头有2017年有活动的时候买的一本书,还是全彩的,应该很适合我这种菜鸟技术渣. 只可惜照着书搭建了一套Web环境,代码和db脚本都是拷贝的光盘里的,也反复检查了数据库的连接情 ...
- Java 原生网络编程.
一.概念 Java 语言从其诞生开始,就和网络紧密联系在一起.在 1995 年的 Sun World 大会上,当时占浏览器市场份额绝对领先的网景公司宣布在浏览器中支持Java,从而引起一系列的公司产品 ...
- Js 控制随机数概率
如: 取 1~10 之间的随机数,那么他们的取值范围是: 整数 区间 概率 1 [0,1) 0.1 2 [1,2) 0.1 3 [2,3) 0.1 4 [3,4) 0.1 5 [4,5) 0.1 6 ...
- SpringBoot项目打war包部署Tomcat教程
一.简介 正常来说SpringBoot项目就直接用jar包来启动,使用它内部的tomcat实现微服务,但有些时候可能有部署到外部tomcat的需求,本教程就讲解一下如何操作 二.修改pom.xml 将 ...
- win10电脑怎么录制视频 电脑录制视频软件
win10电脑怎么录制视频?相信不少网友正在面临这个疑惑.现如今是网络信息科技时代,快速传播信息的途径和方式有很多种.其中,通过录制电脑视频,可以制作视频教程.游戏解说,还可以录制在线视频存储影视资源 ...
- Android为TV端助力 EventBus.getDefault()开源框架
在onCreate里面执行 EventBus.getDefault().register(this);意思是让EventBus扫描当前类,把所有onEvent开头的方法记录下来,如何记录呢?使用Map ...