Why do Kafka consumers connect to zookeeper, and producers get metadata from brokers?


Why is it that consumers connect to zookeeper to retrieve the partition locations? And kafka producers have to connect to one of the brokers to retrieve metadata. My point is, what exactly is the use of zookeeper when every broker already has all the necessary metadata to tell producers the location to send their messages? Couldn't the brokers send this same information to the consumers? I can understand why brokers have the metadata, to not have to make a connection to zookeeper each time a new message is sent to them. Is there a function that zookeeper has that I'm missing? I'm finding it hard to think of a reason why zookeeper is really needed within a kafka cluster. |
|||
First of all, zookeeper is needed only for high level consumer. The main reason zookeeper is needed for a high level consumer is to track consumed offsets and handle load balancing. Now in more detail. Regarding offset tracking, imagine following scenario: you start a consumer, consume 100 messages and shut the consumer down. Next time you start your consumer you'll probably want to resume from your last consumed offset (which is 100), and that means you have to store the maximum consumed offset somewhere. Here's where zookeeper kicks in: it stores offsets for every group/topic/partition. So this way next time you start your consumer it may ask "hey zookeeper, what's the offset I should start consuming from?". Kafka is actually moving towards being able to store offsets not only in zookeeper, but in other storages as well (for now only Regarding load balancing, the amount of messages produced can be quite large to be handled by 1 machine and you'll probably want to add computing power at some point. Lets say you have a topic with 100 partitions and to handle this amount of messages you have 10 machines. There are several questions that arise here actually:
And again, here's where zookeeper kicks in: it tracks all consumers in group and each high level consumer is subscribed for changes in this group. The point is that when a consumer appears or disappears, zookeeper notifies all consumers and triggers rebalance so that they split partitions near-equally (e.g. to balance load). This way it guarantees if one of consumer dies others will continue processing partitions that were owned by this consumer. |
|||||
|


Why do Kafka consumers connect to zookeeper, and producers get metadata from brokers?的更多相关文章
- kafka集群和zookeeper集群的部署,kafka的java代码示例
来自:http://doc.okbase.net/QING____/archive/19447.html 也可参考: http://blog.csdn.net/21aspnet/article/det ...
- 使用不同的namespace让不同的kafka/Storm连接同一个zookeeper
背景介绍: 需要部署2个kafka独立环境,但是只有一个zookeeper集群. 需要部署2个独立的storm环境,但是只有一个zookeeper集群. ----------------------- ...
- org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeo ...
- Unable to connect to zookeeper server within timeout: 5000
错误 严重: StandardWrapper.Throwable org.springframework.beans.factory.BeanCreationException: Error crea ...
- CentOS7 搭建Kafka(一)zookeeper篇
CentOS7 搭建Kafka(一)zookeeper篇 近几年当红小生Kafka备受各路英雄好汉追捧,一点不比老前辈RabbitMQ和ActiveMQ差,因为流行,所以你就得学啊:我这么懒,肯定是不 ...
- Caused by: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 5000
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'brandControl ...
- apache kafka系列之在zookeeper中存储结构
1.topic注册信息 /brokers/topics/[topic] : 存储某个topic的partitions所有分配信息 Schema: { "version": ...
- Kafka学习之(五)搭建kafka集群之Zookeeper集群搭建
Zookeeper是一种在分布式系统中被广泛用来作为:分布式状态管理.分布式协调管理.分布式配置管理.和分布式锁服务的集群.kafka增加和减少服务器都会在Zookeeper节点上触发相应的事件kaf ...
- kafka集群与zookeeper集群 配置过程
Kafka的集群配置一般有三种方法,即 (1)Single node – single broker集群: (2)Single node – multiple broker集群: (3)Mult ...
随机推荐
- [nodejs] nodejs开发个人博客(五)分配数据
使用回掉大坑进行取数据 能看明白的就看,看不明白的手动滑稽 /** * 首页控制器 */ var router=express.Router(); /*每页条数*/ var pageSize=5; r ...
- 捕获未处理的Promise错误
译者按: 通过监听unhandledrejection事件,可以捕获未处理的Promise错误. 原文: Tracking unhandled rejected Promises 译者: Fundeb ...
- Java自动内存管理机制学习(一):Java内存区域与内存溢出异常
备注:本文引用自<深入理解Java虚拟机第二版> 2.1 运行时数据区域 Java虚拟机在执行Java程序的过程中把它所管理的内存划分为若干个不同的数据区域.这些区域都有各自的用途,以及创 ...
- crontab工具安装和检查
什么是crontab?crontab 是一个用于设置周期性执行任务的工具 重启crond守护进程 systemctl restart crond 查看当前crond状态 systemctl statu ...
- 关系数据库标准语言SQL——概述
SQL是一种介于关系代数与关系演算之间的结构化查询语言,其功能并不仅仅是查询.SQL是一个通用的.功能极强的关系数据库语言.SQL(Structured Query Language)结构化查询语 ...
- Android在程序崩溃或者捕获异常之后重新启动app
在Android应用开发中,偶尔会因为测试的不充分导致一些异常没有被捕获,这时应用会出现异常并强制关闭,这样会导致很不好的用户体验,为了解决这个问题,我们需要捕获相关的异常并做处理. 首先捕获程序崩溃 ...
- Pycharm启动后加载anaconda一直updating indices造成Pycharm闪退甚至电脑崩溃
可能跟anaconda文件夹有一定关系 网上找找解决方案,似乎很多人有同样的困扰! 知乎-pycharm启动后总是不停的updating indices...indexing? stackoverfl ...
- 章节四、4-For循环
一.For循环格式 package introduction5; public class ForLoopDemo { public static void main(String[] args) { ...
- vue的diff算法
前言 我的目标是写一个非常详细的关于diff的干货,所以本文有点长.也会用到大量的图片以及代码举例,目的让看这篇文章的朋友一定弄明白diff的边边角角. 先来了解几个点... 1. 当数据发生变化时, ...
- IPD咨询如何才能真正落地?
文/资深顾问 杨学明 IPD作为先进的产品开发理念,思想起源于PRTM公司,PACE,培思的力量,首先在IBM和波音公司迅速完善,中国是深圳华为公司. 1992年IBM公司利润停止增长,财务困难,IB ...