Why do consumers connect to ZooKeeper to retrieve partition locations, while Kafka producers connect to one of the brokers to retrieve metadata?

My point is: what exactly is the use of ZooKeeper when every broker already has all the metadata needed to tell producers where to send their messages? Couldn't the brokers send this same information to the consumers?

I can understand why brokers hold the metadata: so that they don't have to connect to ZooKeeper each time a new message is sent to them. Is there a function of ZooKeeper that I'm missing? I'm finding it hard to think of a reason why ZooKeeper is really needed within a Kafka cluster.

asked Jan 13 '15 at 8:49
Luckl507


2 Answers

First of all, on the consumer side ZooKeeper is needed only by the high-level consumer. SimpleConsumer does not require ZooKeeper to work.

The main reason the high-level consumer needs ZooKeeper is to track consumed offsets and to handle load balancing.

Now in more detail.

Regarding offset tracking, imagine the following scenario: you start a consumer, consume 100 messages and shut the consumer down. The next time you start your consumer you'll probably want to resume where you left off (offset 100), which means the last consumed position has to be stored somewhere. This is where ZooKeeper comes in: it stores offsets for every group/topic/partition. So the next time you start your consumer it can ask "hey ZooKeeper, what's the offset I should start consuming from?". Kafka is actually moving towards being able to store offsets not only in ZooKeeper but in other storages as well (for now only ZooKeeper and Kafka offset storage are available, and I'm not sure the Kafka storage is fully implemented).
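To make the scenario above concrete, here is a minimal sketch in plain Python. It does not use ZooKeeper or any Kafka client; `OffsetStore` and `consume` are hypothetical names, and the dictionary merely stands in for whatever storage holds the offsets. It assumes the stored value is the next offset to read.

```python
from typing import Dict, List, Tuple


class OffsetStore:
    """Hypothetical in-memory stand-in for the offset storage (ZooKeeper in the answer)."""

    def __init__(self) -> None:
        # One entry per (group, topic, partition), as described above.
        self._offsets: Dict[Tuple[str, str, int], int] = {}

    def commit(self, group: str, topic: str, partition: int, next_offset: int) -> None:
        self._offsets[(group, topic, partition)] = next_offset

    def fetch(self, group: str, topic: str, partition: int) -> int:
        # A group that has never committed starts from the beginning (an assumption).
        return self._offsets.get((group, topic, partition), 0)


def consume(store: OffsetStore, log: List[str], group: str, topic: str,
            partition: int, max_messages: int) -> List[str]:
    """Read up to max_messages from the stored position, then commit the new position."""
    start = store.fetch(group, topic, partition)
    batch = log[start:start + max_messages]
    store.commit(group, topic, partition, start + len(batch))
    return batch


if __name__ == "__main__":
    store = OffsetStore()
    log = [f"m{i}" for i in range(250)]  # pretend partition contents
    consume(store, log, "my-group", "my-topic", 0, 100)  # first run, then "shut down"
    resumed = consume(store, log, "my-group", "my-topic", 0, 100)  # "restart"
    print(resumed[0])  # first message of the second run
```

Because the position is keyed by group as well as topic and partition, a different consumer group reading the same partition keeps its own independent position.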

Regarding load balancing, the volume of messages produced can be too large for one machine to handle, and you'll probably want to add computing power at some point. Let's say you have a topic with 100 partitions and 10 machines to handle that volume of messages. Several questions arise here:

  • how should these 10 machines divide the partitions among themselves?
  • what happens if one of the machines dies?
  • what happens if you want to add another machine?

And again, this is where ZooKeeper comes in: it tracks all consumers in a group, and each high-level consumer subscribes to changes in that group. When a consumer appears or disappears, ZooKeeper notifies all consumers and triggers a rebalance so that they split the partitions near-equally (i.e. to balance the load). This guarantees that if one consumer dies, the others will continue processing the partitions that were owned by that consumer.
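The near-equal split can be sketched as follows. This is only an illustration in plain Python, not Kafka's actual assignment code; `assign_partitions` and the consumer names are hypothetical. It assumes every group member sees the same sorted membership list, so each one computes the same result after a membership change.

```python
from typing import Dict, List


def assign_partitions(consumers: List[str], num_partitions: int) -> Dict[str, List[int]]:
    """Split partitions 0..num_partitions-1 near-equally among the group members."""
    members = sorted(consumers)  # same order for everyone, so the result is deterministic
    if not members:
        return {}
    base, extra = divmod(num_partitions, len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for index, member in enumerate(members):
        # The first `extra` members take one additional partition each.
        count = base + (1 if index < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


if __name__ == "__main__":
    group = [f"consumer-{i:02d}" for i in range(10)]
    before = assign_partitions(group, 100)

    # One machine dies: the remaining members recompute and pick up its partitions.
    after_failure = assign_partitions(group[:-1], 100)

    # A new machine joins: everyone recomputes and gives up a few partitions.
    after_join = assign_partitions(group + ["consumer-10"], 100)

    for name, result in (("before", before), ("failure", after_failure), ("join", after_join)):
        print(name, sorted(len(parts) for parts in result.values()))
```

In each case every partition ends up with exactly one owner, which is the property the rebalance is meant to preserve.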

answered Jan 13 '15 at 9:46
serejja

Thanks for the answer, this clears it up. It's what I guessed, but I couldn't find it stated anywhere. I also just read that as of version 0.9 the consumers will no longer use ZooKeeper, and it will only be used by the brokers for leader election etc. – Luckl507 Jan 13 '15 at 9:56

Kafka 0.9 introduced the new Consumer API. New consumers do not need a connection to ZooKeeper, since group balancing is provided by Kafka itself.
