Why do Kafka consumers connect to zookeeper, and producers get metadata from brokers?


|
Why is it that consumers connect to zookeeper to retrieve the partition locations? And kafka producers have to connect to one of the brokers to retrieve metadata. My point is, what exactly is the use of zookeeper when every broker already has all the necessary metadata to tell producers the location to send their messages? Couldn't the brokers send this same information to the consumers? I can understand why brokers have the metadata, to not have to make a connection to zookeeper each time a new message is sent to them. Is there a function that zookeeper has that I'm missing? I'm finding it hard to think of a reason why zookeeper is really needed within a kafka cluster. |
|||
|
First of all, zookeeper is needed only for high level consumer. The main reason zookeeper is needed for a high level consumer is to track consumed offsets and handle load balancing. Now in more detail. Regarding offset tracking, imagine following scenario: you start a consumer, consume 100 messages and shut the consumer down. Next time you start your consumer you'll probably want to resume from your last consumed offset (which is 100), and that means you have to store the maximum consumed offset somewhere. Here's where zookeeper kicks in: it stores offsets for every group/topic/partition. So this way next time you start your consumer it may ask "hey zookeeper, what's the offset I should start consuming from?". Kafka is actually moving towards being able to store offsets not only in zookeeper, but in other storages as well (for now only Regarding load balancing, the amount of messages produced can be quite large to be handled by 1 machine and you'll probably want to add computing power at some point. Lets say you have a topic with 100 partitions and to handle this amount of messages you have 10 machines. There are several questions that arise here actually:
And again, here's where zookeeper kicks in: it tracks all consumers in group and each high level consumer is subscribed for changes in this group. The point is that when a consumer appears or disappears, zookeeper notifies all consumers and triggers rebalance so that they split partitions near-equally (e.g. to balance load). This way it guarantees if one of consumer dies others will continue processing partitions that were owned by this consumer. |
|||||
|


Why do Kafka consumers connect to zookeeper, and producers get metadata from brokers?的更多相关文章
- kafka集群和zookeeper集群的部署,kafka的java代码示例
来自:http://doc.okbase.net/QING____/archive/19447.html 也可参考: http://blog.csdn.net/21aspnet/article/det ...
- 使用不同的namespace让不同的kafka/Storm连接同一个zookeeper
背景介绍: 需要部署2个kafka独立环境,但是只有一个zookeeper集群. 需要部署2个独立的storm环境,但是只有一个zookeeper集群. ----------------------- ...
- org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeo ...
- Unable to connect to zookeeper server within timeout: 5000
错误 严重: StandardWrapper.Throwable org.springframework.beans.factory.BeanCreationException: Error crea ...
- CentOS7 搭建Kafka(一)zookeeper篇
CentOS7 搭建Kafka(一)zookeeper篇 近几年当红小生Kafka备受各路英雄好汉追捧,一点不比老前辈RabbitMQ和ActiveMQ差,因为流行,所以你就得学啊:我这么懒,肯定是不 ...
- Caused by: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 5000
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'brandControl ...
- apache kafka系列之在zookeeper中存储结构
1.topic注册信息 /brokers/topics/[topic] : 存储某个topic的partitions所有分配信息 Schema: { "version": ...
- Kafka学习之(五)搭建kafka集群之Zookeeper集群搭建
Zookeeper是一种在分布式系统中被广泛用来作为:分布式状态管理.分布式协调管理.分布式配置管理.和分布式锁服务的集群.kafka增加和减少服务器都会在Zookeeper节点上触发相应的事件kaf ...
- kafka集群与zookeeper集群 配置过程
Kafka的集群配置一般有三种方法,即 (1)Single node – single broker集群: (2)Single node – multiple broker集群: (3)Mult ...
随机推荐
- [转]Node.js中koa使用redis数据库
本文转自:https://blog.csdn.net/offbye/article/details/52452322 Redis是一个常用的Nosql数据库,一般用来代替Memcached做缓存服务, ...
- 【转载】阿里云ECS服务器监控资源使用情况
在阿里云Ecs服务器运维过程中,无论是Centos系统还是Windows系统,有时候我们需要监控分析最新的服务器资源利用率等运行情况,例如最近3个小时CPU使用率情况.内存使用率.网络流入带宽.网络流 ...
- 无法将文件“..\bin\Debug \**.dll”复制到“bin\**.dll”。对路径“bin \**.dll”的访问被拒绝。
1.方法一: 将bin的只读属性去掉,就OK. 2.方法二: 直接关掉项目,重新打开.
- Centos 7.6配置nginx反向代理负载均衡集群
一,实验介绍 利用三台centos7虚拟机搭建简单的nginx反向代理负载集群, 三台虚拟机地址及功能介绍 192.168.2.76 nginx负载均衡器 192.168.2.82 web ...
- T-SQL基础(三)之子查询与表表达式
子查询 在嵌套查询中,最外面查询结果集返回给调用方,称为外部查询.嵌套在外部查询内的查询称为子查询,子查询的结果集供外部查询使用. 根据是否依赖外部查询,可将子查询分为自包含子查询和相关子查询.自包含 ...
- 操作失败: 无法更改关系,因为一个或多个外键属性不可以为 null
报错:操作失败: 无法更改关系,因为一个或多个外键属性不可以为 null . 同时修改主表和从表的数据,想用EF主表T_ReviewPlan中某个对象item删除item对应的从表T_ReviewS ...
- [angularjs] angularjs系列笔记(五)Service
AngularJs中你可以使用自己的服务或使用内建服务,服务是一个函数或对象,以下代码试验$location服务,$http服务,$timeout服务,$intverval服务,创建自定义服务 < ...
- SpringBoot+kafka+ELK分布式日志收集
一.背景 随着业务复杂度的提升以及微服务的兴起,传统单一项目会被按照业务规则进行垂直拆分,另外为了防止单点故障我们也会将重要的服务模块进行集群部署,通过负载均衡进行服务的调用.那么随着节点的增多,各个 ...
- JavaScript中的三种弹窗
---恢复内容开始--- 1.警告窗口(alert) <script type="text/javascript"> alert("我是警告弹框!" ...
- Redirection
Typically, the syntax of these characters is as follows, using < to redirect input, and > to r ...
