Understanding Kafka Consumer Groups and Consumer Lag
In our previous blog, we talked about monitoring Kafka as a broker service, looking at ways to think about disk utilization and replication problems. But the Kafka brokers sit in the middle of an ecosystem, with Kafka producers on one side writing data, and Kafka consumers on the other side reading data. In this post, we will dive into the consumer side of this application ecosystem, which means looking closely at Kafka consumer group monitoring.
What Is a Consumer Group?
Kafka as a broker service has a very simple API, and it can be used by many kinds of applications and application architectures that lean on the brokers for message I/O and queueing. However, one architecture pattern turns out to be very common: a group of application nodes collaborates to consume messages, often scaling out as message volume goes up, and handling the scenario where nodes crash or drop out. This pattern keeps data and messages flowing reliably and predictably even as application nodes come and go.
There’s also a reference implementation of this architecture, called the Kafka Consumer Group, built on decades of hard-won experience with high-performance distributed systems. The reference implementation ships with Apache Kafka as a JAR and is well documented, although it is possible to implement a Consumer Group application in any language.
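To make the pattern concrete, here is a minimal sketch of a Consumer Group member using the newer Java client; the broker address, group name, and topic name are assumptions for illustration. Every process started with the same group.id joins the same group and shares the partitions.

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupMemberSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("group.id", "my-consumer-group");        // nodes sharing this id form one Consumer Group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("my-topic"));     // assumed topic name
        while (true) {
            // poll() both fetches messages and participates in group coordination
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}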
A Consumer Group’s Relationship to Partitions
While the Consumer Group uses the broker APIs, it is more of an application pattern, a set of behaviors embedded in your application. The Kafka brokers are an important part of the puzzle, but they do not provide the Consumer Group behavior directly. A Consumer Group based application may run on several nodes, and when those nodes start up they coordinate with each other in order to split up the work. This splitting is slightly imperfect, because the work, in this case, is a set of partitions defined on the producer side. Each consumer node can read a partition, and the partitions can be divided up to match the number of consumer nodes as needed. If there are more Consumer Group nodes than partitions, the excess nodes remain idle, which can be desirable as standby capacity for failover. If there are more partitions than Consumer Group nodes, some nodes will read more than one partition.
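Continuing the sketch above, a node can ask the client which partitions it ended up owning; assignment() is empty until the first poll() has joined the group. (This fragment also needs org.apache.kafka.common.TopicPartition imported.)

consumer.poll(Duration.ofMillis(500));             // first poll joins the group and triggers assignment
for (TopicPartition tp : consumer.assignment()) {  // the partitions this node now owns
    System.out.println("owned: " + tp.topic() + "-" + tp.partition());
}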
Reading Multiple Partitions on One Node
There are a couple of tricky things to consider as one designs a Consumer Group. If a consumer node takes multiple partitions, or ends up taking multiple partitions on failover, those partitions will appear intermingled when viewed as a single stream of messages. A Consumer Group application could get message #100 from partition 3, then message #90 from partition 4, then go back to partition 3 for message #101. Nothing in Kafka can guarantee order across partitions; only messages within a partition are in order. So either order must not matter to the consumer application, or the application must split the stream back into its per-partition sub-streams and handle ordering itself.
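The standard Java client makes that per-partition split straightforward: a poll result can be iterated partition by partition instead of as one interleaved stream. A sketch, continuing from the consumer above:

ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
for (TopicPartition partition : records.partitions()) {
    // Kafka only guarantees order inside a single partition, so offsets are
    // strictly increasing within this inner loop.
    for (ConsumerRecord<String, String> record : records.records(partition)) {
        System.out.printf("%s-%d offset=%d%n",
                partition.topic(), partition.partition(), record.offset());
    }
}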
Multiple Topics Within a Consumer Group
The other tricky design consideration is that each member of a Consumer Group may subscribe to some, but not all, of the topics being handled in the group. This makes thinking about distribution a little complex. In a simple case of a Consumer Group handling one and only one topic, all nodes would subscribe to that topic and distribution of work would be uniform. If there are two topics, and only some nodes subscribe to Topic-1, then those Topic-1 partitions will only be assigned to the subscribing nodes, and if one goes down it will be reassigned only to one of the remaining subscribing nodes, if there are any. Think of this Consumer Group design like a group of groups, where each subgroup is pooled and balanced independently.
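As a hypothetical sketch of that group-of-groups behavior, imagine two members of one group with different subscriptions; the topic names are assumptions, and buildProps() is a made-up helper standing in for the Properties setup shown in the first sketch.

// Both consumers share one group.id; buildProps() is a hypothetical helper
// returning the Properties from the first sketch with the given group id.
KafkaConsumer<String, String> memberA = new KafkaConsumer<>(buildProps("shared-group"));
memberA.subscribe(Arrays.asList("topic-1", "topic-2")); // eligible for partitions of both topics

KafkaConsumer<String, String> memberB = new KafkaConsumer<>(buildProps("shared-group"));
memberB.subscribe(Arrays.asList("topic-2"));            // never assigned topic-1 partitions
// If memberA goes down, its topic-1 partitions can only move to other topic-1 subscribers.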
The Rebalancing Phase
As nodes in a Consumer Group come and go, the running nodes decide how to divide up the partitions. In the reference implementation, each partition is assigned a single owner during a rebalancing phase. Rebalancing triggers under several circumstances, but think of it as the phase that happens when an application scales up or down. When a node crashes, all the well-behaved nodes stop work, unsubscribe from their partitions so that those partitions become available for reassignment, and then wait for every partition to reach this state. The less-well-behaved nodes, such as the one that suddenly crashed, will of course never unsubscribe from their partitions.
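The Java client exposes this revoke-and-reassign cycle through the standard ConsumerRebalanceListener callbacks, which a well-behaved node can use to commit or flush its progress before giving up partitions. A sketch, with the topic name assumed (this fragment also needs Collection, TopicPartition, and ConsumerRebalanceListener imported):

consumer.subscribe(Arrays.asList("my-topic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Called as a rebalance begins: stop work and commit progress so
        // another node can take these partitions over cleanly.
        System.out.println("revoked: " + partitions);
    }
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Called once the rebalance completes, with this node's new share.
        System.out.println("assigned: " + partitions);
    }
});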
Rebalancing Timeouts
In this failure case, where some nodes are waiting patiently and other nodes are gone, wedged, or otherwise non-responsive, two timeouts start ticking. One is the Kafka client session timeout, something like zookeeper.session.timeout.ms: a heartbeat window used to detect that a node hasn’t reported back in a timely manner. It is checked continuously and is how bad nodes get evicted. The other timeout is rebalance.backoff.ms * rebalance.max.retries.
This is the largest window allowed for the rebalancing phase, where clients are not reading anything from Kafka. But if this window is smaller than the Kafka client session timer, rebalancing could fail due to a crashed node and you’d have a stopped Consumer Group. And if the Kafka client session timer is too small, you could evict application nodes by mistake and trigger unnecessary rebalancing. So thinking carefully about these two timeout windows is necessary to keep your application running well.
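As an illustration, here is a sketch of these settings for the old Zookeeper-based high-level consumer; the values are assumptions for the example, not tuning advice. The point is to keep the total rebalance window larger than the session timeout, so a dead node is evicted before rebalancing runs out of retries:

Properties props = new Properties();
props.put("zookeeper.session.timeout.ms", "6000"); // heartbeat window for evicting dead nodes
props.put("rebalance.backoff.ms", "2000");         // pause between rebalance attempts
props.put("rebalance.max.retries", "4");           // 4 * 2000 ms = 8000 ms rebalance window > 6000 ms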
Next, let's look at how partitions get assigned, the Kafka Coordinator, lag as a KPI, and more.
Assignment Algorithm
Looking a little deeper into rebalancing, one might wonder how these assignments between clients and partitions happen. This turns out to be an interesting area of Kafka’s roadmap, because different strategies for assigning work can be useful to different kinds of applications. One might, for example, have specialist nodes that are better suited for some kinds of work within the group, and it might be nice to push the right data to them. Today that can be done at the topic level, as mentioned above, essentially dividing a Consumer Group into a set of subgroups.
Or one might want some assignment that results in uniform workloads, based on the number of messages in each partition. But until we have pluggable assignment functions, the reference implementation has a straightforward assignment strategy called Range Assignment. There is also a newer Round Robin assignor which is useful for applications like Mirror Maker, but most applications just use the default assignment algorithm.
The Range Assignor tries to land on a uniform distribution of partitions, at least within each topic, while at the same time avoiding the need to coordinate and bargain between nodes. This last goal, independent assignment, is done by each node executing a fixed algorithm: sort the partitions, sort the consumers, then for each topic take same-sized ranges of partitions for each consumer. Where the sizes cannot be the same, the consumers at the beginning of the sorted list will end up with one extra partition. With this algorithm, each application node can see the entire layout by itself, and from there take up the right assignments.
Let’s look at an example from comments in the source code:
* For example, suppose there are two consumers C0 and C1, two topics t0 and t1, and each topic has 3 partitions,
* resulting in partitions t0p0, t0p1, t0p2, t1p0, t1p1, and t1p2.
* The assignment will be:
* C0: [t0p0, t0p1, t1p0, t1p1]
* C1: [t0p2, t1p2]
Notice that each topic is broken up into ranges independently of the other topics, so the first application node in this case ends up with one extra partition from t0 and one extra partition from t1. That could be twice the work for our unbalanced node, as it has four partitions while the second node has only two. But if a third node were added, everything would become perfectly balanced, as each node would have one partition from each topic. And if a fourth node were added, you’d have one idle node doing nothing, because neither topic has a fourth partition.
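Here is a minimal, self-contained sketch of the range logic for a single topic — not the actual Kafka source: sort the consumers, divide the partition count, and hand the first numPartitions % numConsumers consumers one extra partition. Run once per topic, it reproduces the assignment from the comment above.

import java.util.*;

public class RangeAssignSketch {
    // Returns consumer -> list of partition numbers for one topic.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        List<String> sorted = new ArrayList<>(consumers);
        Collections.sort(sorted);                       // every node sorts identically
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int perConsumer = numPartitions / sorted.size();
        int extra = numPartitions % sorted.size();      // remainder goes to the head of the list
        int next = 0;
        for (int i = 0; i < sorted.size(); i++) {
            int count = perConsumer + (i < extra ? 1 : 0);
            List<Integer> parts = new ArrayList<>();
            for (int j = 0; j < count; j++) parts.add(next++);
            result.put(sorted.get(i), parts);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String topic : Arrays.asList("t0", "t1")) {
            System.out.println(topic + " -> " + assign(Arrays.asList("C0", "C1"), 3));
        }
        // Prints t0 -> {C0=[0, 1], C1=[2]} and t1 -> {C0=[0, 1], C1=[2]},
        // i.e. C0: [t0p0, t0p1, t1p0, t1p1] and C1: [t0p2, t1p2].
    }
}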
Kafka Coordinator
You might be wondering at this point where all of these assignment decisions are stored. In earlier versions of the Consumer Group reference implementation, Zookeeper was used to store all of this kind of metadata. Newer versions of Kafka add a set of APIs for storing Consumer Group metadata in the brokers themselves: each Consumer Group syncs up with one of the brokers, which takes on the role of Coordinator for that group.
While all the decision making still happens in the application nodes, the Coordinator can fulfill a JoinGroup request and supply metadata about the Consumer Group, such as assignments and offsets. The Coordinator node is also responsible for the heartbeat timer mentioned above, so if the application node leading group decisions disappears, the Coordinator can kick everyone out and effectively require the remaining nodes to re-form the Consumer Group. An important part of Consumer Group behavior, then, is electing leader nodes and working with the Coordinator to read and write metadata about assignments and partitions.
System Review
This is a lot of complex behavior that you “get for free” when you use a Consumer Group, so it is important to understand not just how to configure and set up your application, but also how to get operational insight into the various systems. To cover the application ecosystem end to end, you must monitor at least Zookeeper, the Brokers / Coordinators, the Producers, and the Consumers. Zookeeper is used at minimum to bootstrap everything else, and often also stores Consumer Group assignments and offset updates. The Brokers / Coordinators must be fully functional, of course, since every message passes through them. It is possible to see Brokers in a degraded state while the Producers and Consumers are working correctly, but that typically cannot last long before it starts to impact throughput or error rates, at least on the Producers.
Monitoring Producers is like monitoring a simpler Kafka application, which just wants to write to a partition. And we can see Producer behavior holistically from the Consumer’s point of view, as it is possible to tell from Broker metadata how much data is being added to each of the partitions under a Consumer Group. So even though the Producers are not necessarily coordinated or aware of a Consumer Group, the Consumer Group can naturally tell if Producers have sudden spikes or drops in traffic.
Lag as a KPI
Just by looking at the metadata of a Consumer Group, we can determine a few key metrics: how many messages are being written to the partitions within the group, how many messages are being read from those partitions, and the difference between the two. That difference is called Lag, and it represents how far the Consumer Group application is behind the producers. Producer offsets are kept by the Kafka Broker in charge of each partition, which can tell you the last offset in the partition. Consumer offsets are kept either in Zookeeper or by the Kafka Coordinator, and tell you the most recently read offset in each partition.
Note that these offsets are eventually consistent, and synchronized on different heartbeats by different application clusters, so they may not make perfect sense at all times. For example, you could counterintuitively have a Consumer offset that is greater than a Producer offset, but if you waited another heartbeat cycle or two and then updated the Producer offset, it should normally be ahead of the previous Consumer offset. In aggregate, total application lag is the sum of all the partition lags. For a normal Consumer Group, lag should be close to zero or at least somewhat flat and stable, which would mean the application is keeping up with the producers. Total lag is the number of messages behind real time. For an application that wants to be near real time it is important to monitor lag as a key performance indicator, and to drive lag down.
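As a sketch of measuring lag with the standard Java client (broker, group, and topic names are assumptions): endOffsets() asks the brokers for the latest offset in each partition, committed() reads the group's last committed position, and the differences summed across partitions give total lag.

import java.util.*;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class LagCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", "my-consumer-group");         // the group whose lag we measure
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions = new ArrayList<>();
            consumer.partitionsFor("my-topic").forEach(     // assumed topic name
                    info -> partitions.add(new TopicPartition(info.topic(), info.partition())));

            Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);
            long totalLag = 0;
            for (TopicPartition tp : partitions) {
                OffsetAndMetadata committed = consumer.committed(tp); // null if never committed
                long consumed = (committed == null) ? 0 : committed.offset();
                long lag = endOffsets.get(tp) - consumed;             // producer head minus consumer position
                totalLag += lag;
                System.out.printf("%s lag=%d%n", tp, lag);
            }
            System.out.println("total lag = " + totalLag);
        }
    }
}

In practice, something like this would run on a schedule, with alerts when total lag trends upward rather than staying flat.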
Monitoring Consumer Lag With OpsClarity
As you can see, the mechanics of consumer lag and monitoring can be complex and difficult. Most monitoring solutions offer Kafka Broker monitoring and leave it to the user to collect application metrics around Consumer Groups. In the next blog, we’ll look at Consumer Group monitoring with open source solutions like Burrow, and compare to how we monitor Kafka at OpsClarity. OpsClarity has automated monitoring of the entire Kafka ecosystem, from Producers to Brokers to Consumer Groups, integrated with surrounding systems critical to your application.