Kafka performance testing (repost): KAFKA 0.8 PRODUCER PERFORMANCE
Source: http://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/
At LiveRamp, we constantly face scaling challenges as the volume of data that our infrastructure must deal with continues to grow. One such challenge involves the logging system. At present we use Scribe as the transport mechanism to get logs from our webapp servers into our HDFS cluster. Scribe has served us well, but we are looking for alternatives because it has the following shortcomings:
- It provides no support for compression
- Consumers run in batches (map-reduce jobs) so real-time stats are not possible
- It is no longer in active development
One of the most promising alternatives to Scribe that addresses all of the above is Kafka. We used Kafka to build a real-time stats system prototype during our last Hackweek, and saw enough promise to do some more in-depth testing. In this post we will focus on producer performance and scaling. Since we intend to put producers in our webapp servers, we are interested in both high overall throughput and low latency when sending individual messages.
WHY KAFKA 0.8
At the time of this writing, Kafka 0.8 has not been released, and documentation for it is scarce. However, since it is a backwards incompatible release that introduces a number of important features, it would make little sense for anyone just getting started with Kafka to invest development effort in the previous version.
All tests in this post were run on this revision of the 0.8 branch.
SETUP
BROKERS
We are starting with a modestly sized cluster of three machines. The specs are as follows:
|
Num CPUs: 2
CPU Model: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
CPU Speed: 2400 MHz
Memory MB: 32768
Disk Controller Config
Layout: RAID-1
Size: 1,862.50 GB (1999844147200 bytes)
Layout: RAID-1
Size: 1,862.50 GB (1999844147200 bytes)
Disk config for controller 0:
4 Capacity: 1,862.50 GB (1999844147200 bytes)
7200 RPM 64MB Cache
|
Each machine has two pairs of disks in a mirroring configuration (RAID-1), which allow us to take advantage of the new multiple data directories feature introduced in Kafka 0.8. This makes it possible for a topic to have separate partitions on different disks, which should significantly increase the throughput per broker. This behavior is configured in the log.dirs setting as shown in the broker configuration below. We used default values for most other settings.
|
broker.id=1
port=9092
num.network.threads=2
num.io.threads=2
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
log.dirs=/data1/kafka,/data2/kafka
num.partitions=1
log.flush.interval.messages=10000
log.flush.interval.ms=3000
log.retention.hours=168
log.segment.bytes=536870912
log.cleanup.interval.mins=1
enable.zookeeper=true
zk.connect=zookeeper01:2181,zookeeper02:2181,zookeeper03:2181/kafka
zk.connectiontimeout.ms=1000000
kafka.metrics.polling.interval.secs=5
kafka.metrics.reporters=kafka.metrics.KafkaCSVMetricsReporter
kafka.csv.metrics.dir=/tmp/kafka_metrics
kafka.csv.metrics.reporter.enabled=false
|
As recommended by the Kafka documentation, we use a separate cluster of three dedicated machines for ZooKeeper. All machines are connected with gigabit links.
PRODUCERS
Our real use case involves a number of webapp servers each producing a relatively modest volume of logs. For this test, however, we used only a few dedicated producer machines using a custom-made tool that simulates the real load. Each producer was configured as follows:
|
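import java.util.Properties;
Properties props = new Properties();
props.put("broker.list", "kafka01:9092,kafka02:9092,kafka03:9092");
// Messages are sent as plain strings.
props.put("serializer.class", "kafka.serializer.StringEncoder");
// Queue messages in memory and send them to the broker in batches.
props.put("producer.type", "async");
// -1: block the caller when the in-memory queue is full instead of dropping messages.
props.put("queue.enqueue.timeout.ms", "-1");
// Send a batch once 200 messages have accumulated.
props.put("batch.num.messages", "200");
// 1: compress batches with GZIP.
props.put("compression.codec", "1");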
|
The most important setting here is producer.type, which we set to async. Asynchronous mode is essential to get the most out of Kafka in terms of throughput. In this mode, each producer keeps an in-memory queue of messages that are sent in batches to the broker when a pre-configured batch size or time interval has been reached. This makes compression much more efficient, especially in a use case like ours, in which log lines are string representations of JSON objects and the same keys are repeated over and over across lines. Having fewer, larger messages also helps to achieve better network utilization.
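The compression benefit of batching can be illustrated without Kafka at all. The following is a minimal sketch, assuming made-up JSON log lines (the field names are hypothetical, not LiveRamp's real log format) and plain java.util.zip GZIP rather than Kafka's own compression path. It compares the total size when each line is compressed on its own with the size when 200 lines are compressed together.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Illustration only: not Kafka code, and the log format is invented.
class BatchCompressionSketch {

    // Returns the GZIP-compressed size of the given bytes.
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    // Builds hypothetical JSON log lines that share keys but differ in values.
    static List<String> sampleLines(int count) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            lines.add("{\"timestamp\":" + (1365500000L + i)
                    + ",\"user_id\":" + (i * 7919 % 100000)
                    + ",\"event\":\"page_view\",\"path\":\"/item/" + i + "\"}");
        }
        return lines;
    }

    // Total size when every line is compressed as its own message.
    static int compressedIndividually(List<String> lines) {
        int total = 0;
        for (String line : lines) {
            total += gzipSize(line.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // Size when all lines are joined and compressed as a single batch.
    static int compressedAsBatch(List<String> lines) {
        return gzipSize(String.join("\n", lines).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        List<String> lines = sampleLines(200); // same count as batch.num.messages
        System.out.println("individually: " + compressedIndividually(lines) + " bytes");
        System.out.println("as one batch: " + compressedAsBatch(lines) + " bytes");
    }
}
```

Because the repeated keys only pay off once the compressor has seen them before, the batch comes out much smaller than the sum of the individually compressed lines. The exact ratio depends entirely on the data.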
PERFORMANCE TOOLS
The Kafka distribution provides a producer performance tool that can be invoked with the script bin/kafka-producer-perf-test.sh. While this tool is very useful and flexible, we only used it to corroborate that the results obtained with our own custom tool made sense. This is due to the following reasons:
- Our tool is written in Java and uses the producer from the Java API.
- While the message size is adjustable in the Kafka tool, we wanted to use messages with the same content structure as our real production logs.
- Not all configuration parameters are exposed by the Kafka tool.
- Our tool makes it possible to set a target throughput, which limits the rate at which threads push messages to the brokers. This is necessary to evaluate latency under realistic load conditions.
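The last point, limiting the rate at which threads push messages, can be sketched as follows. This is not the tool used for these tests, whose source is not shown here; the class and method names are hypothetical. The idea is simply to compute when each message is due under the target rate and to wait until that moment before sending it.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

// Illustration only: paces calls so that they do not exceed a target rate.
class ThrottleSketch {
    private final long nanosPerMessage;
    private final long startNanos;
    private long sent = 0;

    ThrottleSketch(double targetMessagesPerSecond) {
        this.nanosPerMessage = (long) (TimeUnit.SECONDS.toNanos(1) / targetMessagesPerSecond);
        this.startNanos = System.nanoTime();
    }

    // Blocks until the next message is due according to the target rate.
    void awaitNextSlot() {
        long dueNanos = startNanos + sent * nanosPerMessage;
        long waitNanos;
        // parkNanos may return early, so re-check the clock in a loop.
        while ((waitNanos = dueNanos - System.nanoTime()) > 0) {
            LockSupport.parkNanos(waitNanos);
        }
        sent++;
    }

    // Paces `count` pretend sends at the target rate; returns elapsed milliseconds.
    static long run(int count, double targetMessagesPerSecond) {
        ThrottleSketch throttle = new ThrottleSketch(targetMessagesPerSecond);
        long begin = System.nanoTime();
        for (int i = 0; i < count; i++) {
            throttle.awaitNextSlot();
            // A real load generator would call the producer's send method here.
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
    }

    public static void main(String[] args) {
        long elapsedMs = run(500, 1000.0);
        System.out.println("500 messages at 1000 msg/s took about " + elapsedMs + " ms");
    }
}
```

Scheduling against the start time, rather than sleeping a fixed interval after each send, keeps the long-run rate on target even when an individual send is slow.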
THROUGHPUT RESULTS
BASELINE PERFORMANCE
The Kafka documentation claims that producers can push about 50MB/sec through a system with a single broker as long as the batch size is not too small (the default value of 200 should be large enough). We were able to verify this claim very quickly for Kafka 0.7.2 by running the following command on a fresh installation
|
bin/kafka-producer-perf-test.sh --brokerinfo broker.list=0:localhost:9092 --messages 10000000 --topic test --threads 10 --message-size 1000 --batch-size 200 --compression-codec 1 --async
|
and obtaining the following results:
|
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2013-04-09 11:52:43:192, 2013-04-09 11:56:06:136, 1, 1000, 200, 9536.74, 46.9920, 10000000, 49274.6768
|
Running an equivalent command on a fresh installation of Kafka 0.8, however, gave us markedly worse results:
|
bin/kafka-producer-perf-test.sh --broker-list=localhost:9092 --messages 10000000 --topic test --threads 10 --message-size 1000 --batch-size 200 --compression-codec 1
|
|
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2013-04-02 17:16:51:933, 2013-04-02 17:24:04:916, 1, 1000, 200, 9536.74, 22.0257, 10000000, 23095.5950
|
This is because in an effort to increase availability and durability, version 0.8 introduced intra-cluster replication support, and by default a producer waits for an acknowledgement response from the broker on every message (or batch of messages if async mode is used). It is possible to mimic the old behavior, but we were not very interested in that given that we intend to use replication in production.
Performance degraded further once we started using a sample of real ~1KB sized log messages rather than the synthetic messages produced by the Kafka tool, resulting in a throughput of about 10 MB/sec.
All throughput numbers refer to uncompressed data.
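The MB.sec and nMsg.sec columns in the two result rows above can be re-derived from the other columns, which is a useful sanity check when comparing runs. The sketch below assumes that the tool counts a megabyte as 1024 × 1024 bytes, which is consistent with the reported total of 9536.74 MB for 10,000,000 messages of 1,000 bytes. The class name is hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Re-derives throughput figures from the start/end timestamps of a perf run.
class PerfRowCheck {
    // Timestamp layout used in the tool output, e.g. 2013-04-09 11:52:43:192.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss:SSS");

    static double elapsedSeconds(String start, String end) {
        Duration elapsed = Duration.between(
                LocalDateTime.parse(start, FORMAT), LocalDateTime.parse(end, FORMAT));
        return elapsed.toMillis() / 1000.0;
    }

    // Assumption: 1 MB = 1024 * 1024 bytes.
    static double totalMegabytes(long messages, int messageSizeBytes) {
        return messages * (double) messageSizeBytes / (1024.0 * 1024.0);
    }

    static double mbPerSecond(String start, String end, long messages, int messageSizeBytes) {
        return totalMegabytes(messages, messageSizeBytes) / elapsedSeconds(start, end);
    }

    static double messagesPerSecond(String start, String end, long messages) {
        return messages / elapsedSeconds(start, end);
    }

    public static void main(String[] args) {
        String[][] runs = {
            {"0.7.2", "2013-04-09 11:52:43:192", "2013-04-09 11:56:06:136"},
            {"0.8", "2013-04-02 17:16:51:933", "2013-04-02 17:24:04:916"},
        };
        for (String[] run : runs) {
            System.out.println(String.format("Kafka %s: %.3f s, %.4f MB/s, %.4f msg/s",
                    run[0],
                    elapsedSeconds(run[1], run[2]),
                    mbPerSecond(run[1], run[2], 10_000_000L, 1000),
                    messagesPerSecond(run[1], run[2], 10_000_000L)));
        }
    }
}
```

The 0.7.2 run took 202.944 seconds and the 0.8 run 432.983 seconds for the same 10,000,000 messages, so the 0.8 run reached a little under half the throughput of the 0.7.2 run.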
NUMBER OF PRODUCERS
Our first test consisted of evaluating the impact of adding producer machines.

By adding identically configured producer machines, each pushing as many messages as it can, the overall throughput increases slightly. We also observed that throughput was distributed very evenly across the machines.
NUMBER OF PARTITIONS
Next, using all ten machines at our disposal, we tested the effect of using different numbers of partitions.

Throughput increases very markedly at first as more brokers and disks on them start hosting different partitions. Once all brokers and disks are used though, adding additional partitions does not seem to have any effect.
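One way to picture this plateau: with three brokers and two data directories each, there are six broker/disk slots in total. The sketch below assumes a plain round-robin placement of partitions and ignores replication. That is a simplification for illustration, not Kafka's actual assignment logic, and the class name is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustration only: counts how many broker/disk slots host at least one
// partition under an assumed round-robin placement.
class PartitionSpreadSketch {

    static int slotsInUse(int partitions, int brokers, int disksPerBroker) {
        Set<String> used = new LinkedHashSet<>();
        for (int p = 0; p < partitions; p++) {
            int broker = p % brokers;                  // spread across brokers first
            int disk = (p / brokers) % disksPerBroker; // then across disks on each broker
            used.add("broker" + broker + "/disk" + disk);
        }
        return used.size();
    }

    public static void main(String[] args) {
        for (int partitions : new int[] {1, 2, 3, 6, 10, 20}) {
            System.out.println(partitions + " partitions -> "
                    + slotsInUse(partitions, 3, 2) + " of 6 slots in use");
        }
    }
}
```

Under this assumption, every slot is busy from six partitions onward, and further partitions only share disks that are already in use.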
NUMBER OF REPLICAS
As we saw in the baseline performance tests, even using a single replica represents a big performance hit when compared to the old system which had no support for replication at all. We were interested in knowing how much of an additional hit we would get when using two and three replicas.

Fortunately, the extra performance hit turned out to be quite small.
NUMBER OF TOPICS
Finally, we tested the effect of increasing the number of topics. Our use case requires only a handful of topics, so we only experimented with small numbers.

Update: Michael G. Noll (see comment below) kindly pointed out that throughput could be improved by disabling ack messages, and provided this post as a reference for what could be expected. I reran some of the tests, and here are some preliminary results:
- Using the most realistic scenario (10 partitions, 10 producer machines, 3 replicas, and 1-10 topics, same as the last chart above), I only obtained a very modest 12% increase on average throughput.
- Since this is very different from the ~2x mentioned in the post, I did some more digging and found the following:
  - Using one producer machine and a topic with 10 partitions and 3 replicas, I was able to reproduce the 2x improvement (21 to 44 MB/sec) with both Kafka's and our own tool (setting it to use synthetic messages).
  - When switching our tool back to real messages (a sample of production logs), that 2x became ~12%.
- Therefore, it appears that the ack message is no longer a big bottleneck once real messages are used.
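For readers who want to repeat the comparison, the acknowledgement behavior is a producer-side setting. The sketch below assumes the property name used by the released Kafka 0.8 producer, request.required.acks (0 for no acknowledgement, 1 for the leader only, -1 for all in-sync replicas). The name available at the revision tested in this post is not confirmed here, and the helper class is hypothetical.

```java
import java.util.Properties;

// Illustration only: builds producer properties with a chosen ack level.
class AckConfigSketch {

    static Properties producerProps(String requiredAcks) {
        Properties props = new Properties();
        props.put("broker.list", "kafka01:9092,kafka02:9092,kafka03:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "async");
        // Assumption: property name as in the released 0.8 producer.
        // "0" = do not wait for an ack, "1" = leader only, "-1" = all in-sync replicas.
        props.put("request.required.acks", requiredAcks);
        return props;
    }

    public static void main(String[] args) {
        System.out.println("without acks: " + producerProps("0"));
        System.out.println("leader ack:   " + producerProps("1"));
    }
}
```

Skipping the acknowledgement trades durability guarantees for throughput, so it only makes sense where occasional message loss is acceptable.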
LATENCY RESULTS
Having established the maximum throughput that can be achieved, we investigated the average and maximum latency of sending an individual message, which directly impacts the loading time on a browser hitting our webapp servers (this is the time for a thread using the Kafka producer to return from a call to send, NOT the full producer-broker-consumer cycle). To do this, we configured our tool to limit the rate at which it pushes messages according to a target throughput, and monitored latency for different values of throughput.




The average latency is consistently below 0.02 ms for as long as the target throughput does not reach the maximum throughput. Unfortunately, the maximum latency hovers around 120 ms even for very low values of throughput. Once the producers start trying to push more messages than the brokers can handle, both average and maximum latency increase very dramatically.
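A very low average next to a high maximum is what a small number of slow calls produces. The following sketch shows one way to record per-call latency and report both figures. The sample values in main are synthetic, chosen only to resemble the magnitudes described above; they are not measurements from these tests, and the class name is hypothetical.

```java
// Illustration only: accumulates per-call latencies and reports average and maximum.
class LatencyStatsSketch {
    private long count = 0;
    private long totalNanos = 0;
    private long maxNanos = 0;

    void record(long nanos) {
        count++;
        totalNanos += nanos;
        maxNanos = Math.max(maxNanos, nanos);
    }

    // Times a single call, for example a wrapper around the producer's send.
    void time(Runnable call) {
        long begin = System.nanoTime();
        call.run();
        record(System.nanoTime() - begin);
    }

    double averageMillis() {
        return count == 0 ? 0.0 : totalNanos / (double) count / 1_000_000.0;
    }

    double maxMillis() {
        return maxNanos / 1_000_000.0;
    }

    public static void main(String[] args) {
        LatencyStatsSketch stats = new LatencyStatsSketch();
        // Synthetic data: 99,999 fast calls of 0.01 ms and one slow call of 120 ms.
        for (int i = 0; i < 99_999; i++) {
            stats.record(10_000L);
        }
        stats.record(120_000_000L);
        System.out.println("average ms: " + stats.averageMillis());
        System.out.println("max ms:     " + stats.maxMillis());
    }
}
```

With these synthetic samples the average stays near 0.011 ms even though one call took 120 ms, which is why the average alone says little about worst-case page load time.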
Finally, we set queue.enqueue.timeout.ms to 0 in an attempt to prevent the Kafka producer from ever blocking on a call to send, hoping that this would decrease the maximum latency. Unfortunately, this had no effect whatsoever. We got identical results to the graphs above. The only difference was that, as expected, producers started throwing exceptions (kafka.common.QueueFullException) when the target throughput reached the maximum throughput. Also, we observed that once exceptions were thrown, the producers would hang indefinitely despite invoking the close method, and a call to System.exit was required to force the application to quit.
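The difference between the two timeout settings can be pictured with a plain bounded queue from java.util.concurrent. This is an analogy, not Kafka's implementation, and the class name is hypothetical. A non-blocking offer fails immediately when the queue is full, which corresponds to the case in which the producer throws QueueFullException, whereas a blocking put would wait until space frees up, as with a timeout of -1.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustration only: a bounded queue that nobody drains, standing in for a
// producer queue whose brokers cannot keep up.
class EnqueueTimeoutSketch {

    // Returns how many of `attempts` messages were rejected because the queue was full.
    static int rejectedWhenFull(int capacity, int attempts) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int rejected = 0;
        for (int i = 0; i < attempts; i++) {
            // offer() never blocks: it returns false when there is no room.
            // queue.put(...) would block here instead until space became available.
            if (!queue.offer("message-" + i)) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + rejectedWhenFull(5, 8)); // prints "rejected: 3"
    }
}
```

Either way, a queue that fills faster than it drains has to block or drop, so neither setting removes the need for spare broker capacity.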
CONCLUSIONS
Based on the numbers obtained above, we can draw the following preliminary conclusions:
- Kafka 0.8 improves availability and durability at the expense of some performance.
- Throughput seems to scale very well as the number of brokers and/or disks per broker increases.
- Moderate numbers of producer machines and topics have no negative effect on throughput compared to a single producer and topic.
- When configured in async mode, producers have very low average latency for each message sent, but there are outliers that take over 100 ms, even when operating at low overall throughput. This poses a problem for our use case.
- Trying to push more data than the brokers can handle for any sustained period of time has catastrophic consequences, regardless of what timeout settings are used. In our use case this means that we need to either ensure we have spare capacity for spikes, or use something on top of Kafka to absorb spikes.
NEXT STEPS
We have just scratched the surface and there is still a lot of work to be done. Following is a list of some of the things we will probably look into:
- Perform a similar analysis on consumers to make sure high throughput can be sustained regardless of how many consumers are active.
- Experiment with custom partitioners so that each producer needs to communicate with only a subset of the brokers (if/when we add more broker nodes to the cluster).
- Set up a mirroring configuration in which separate Kafka clusters from multiple cloud regions send their traffic to a master cluster.
FEEDBACK WELCOME
It is our hope that the information we provided will be useful for people considering using Kafka for the first time or switching from 0.7 to 0.8. If you have any questions, comments or suggestions please leave them below.