Why do consumers connect to ZooKeeper to retrieve partition locations, while Kafka producers have to connect to one of the brokers to retrieve metadata?

My point is: what exactly is the use of ZooKeeper when every broker already has all the metadata needed to tell producers where to send their messages? Couldn't the brokers send this same information to the consumers?

I can understand why brokers hold the metadata: so that they don't have to connect to ZooKeeper each time a new message is sent to them. Is there a function of ZooKeeper that I'm missing? I'm finding it hard to think of a reason why ZooKeeper is really needed within a Kafka cluster.

asked Jan 13 '15 at 8:49
Luckl507


2 Answers

First of all, ZooKeeper is needed only for the high-level consumer. SimpleConsumer does not require ZooKeeper to work.

The main reason ZooKeeper is needed for a high-level consumer is to track consumed offsets and to handle load balancing.

Now in more detail.

Regarding offset tracking, imagine the following scenario: you start a consumer, consume 100 messages and shut the consumer down. The next time you start your consumer you'll probably want to resume from your last consumed offset (which is 100), and that means you have to store the maximum consumed offset somewhere. Here's where ZooKeeper kicks in: it stores offsets for every group/topic/partition. So the next time you start your consumer it can ask "hey ZooKeeper, what's the offset I should start consuming from?". Kafka is actually moving towards being able to store offsets not only in ZooKeeper but in other storage as well (for now only ZooKeeper and Kafka offset storage are available, and I'm not sure the Kafka storage is fully implemented).
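To make the offset-tracking idea concrete, here is a minimal sketch in Python. It is not Kafka's or ZooKeeper's API: `OffsetStore` and `consume` are hypothetical names, the store is a plain in-memory dict standing in for ZooKeeper, and the "partition" is just a Python list. It also assumes the stored value is the offset of the next message to read.

```python
from typing import Dict, List, Tuple


class OffsetStore:
    """Hypothetical in-memory stand-in for the offset storage (ZooKeeper in the answer)."""

    def __init__(self) -> None:
        # One entry per (group, topic, partition), as described above.
        self._offsets: Dict[Tuple[str, str, int], int] = {}

    def commit(self, group: str, topic: str, partition: int, offset: int) -> None:
        # Remember the offset of the next message this group should read.
        self._offsets[(group, topic, partition)] = offset

    def fetch(self, group: str, topic: str, partition: int) -> int:
        # A group that has never committed anything starts from offset 0.
        return self._offsets.get((group, topic, partition), 0)


def consume(store: OffsetStore, log: List[str], group: str, topic: str,
            partition: int, max_messages: int) -> List[str]:
    """Read up to max_messages from the stored offset, then commit the new position."""
    start = store.fetch(group, topic, partition)
    batch = log[start:start + max_messages]
    store.commit(group, topic, partition, start + len(batch))
    return batch


if __name__ == "__main__":
    store = OffsetStore()
    partition_log = [f"message-{i}" for i in range(250)]

    # First run: consume 100 messages, then "shut down".
    consume(store, partition_log, "my-group", "my-topic", 0, 100)

    # Second run: the consumer asks the store where to resume.
    resumed = consume(store, partition_log, "my-group", "my-topic", 0, 100)
    print(resumed[0])
```

Because the position lives in the store rather than in the consumer process, a restarted consumer (or a different machine in the same group) can pick up where the previous one stopped.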

Regarding load balancing, the volume of messages produced can be too large for one machine to handle, and you'll probably want to add computing power at some point. Let's say you have a topic with 100 partitions, and to handle this volume of messages you have 10 machines. Several questions arise here:

  • how should these 10 machines divide partitions between each other?
  • what happens if one of the machines dies?
  • what happens if you want to add another machine?

And again, here's where ZooKeeper kicks in: it tracks all consumers in a group, and each high-level consumer is subscribed to changes in this group. When a consumer appears or disappears, ZooKeeper notifies all consumers and triggers a rebalance so that they split the partitions near-equally (i.e. to balance the load). This guarantees that if one of the consumers dies, the others will continue processing the partitions that were owned by it.
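The rebalancing described above can be sketched as follows. This is only an illustration of a "split partitions near-equally" rule, not Kafka's actual assignment algorithm: `assign` and the consumer names are hypothetical, and the membership change that ZooKeeper would announce is simulated by simply calling `assign` again with the new member list.

```python
from typing import Dict, List


def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Split partitions into contiguous, near-equal chunks, one chunk per consumer."""
    members = sorted(consumers)  # every consumer must see the same ordering
    if not members:
        return {}
    base, extra = divmod(len(partitions), len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for index, member in enumerate(members):
        # The first `extra` consumers take one additional partition each.
        count = base + (1 if index < extra else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment


if __name__ == "__main__":
    partitions = list(range(100))
    group = [f"consumer-{i}" for i in range(10)]

    # Initial state: 10 machines share 100 partitions.
    before = assign(partitions, group)
    print({name: len(owned) for name, owned in before.items()})

    # One machine dies: the survivors are notified and rebalance.
    survivors = [name for name in group if name != "consumer-3"]
    after = assign(partitions, survivors)
    print({name: len(owned) for name, owned in after.items()})
```

Since every consumer computes the assignment from the same sorted member list, they all arrive at the same split without talking to each other directly; the coordination service only has to tell them that the membership changed.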

answered Jan 13 '15 at 9:46
serejja

Thanks for the answer, this clears it up. It's what I guessed, but I couldn't find it anywhere. I also just read that from version 0.9 the consumers will no longer use ZooKeeper, and it will only be used by the brokers for leader election etc. – Luckl507 Jan 13 '15 at 9:56

With Kafka 0.9+, the new Consumer API was introduced. New consumers do not need a connection to ZooKeeper, since group balancing is provided by Kafka itself.
