Why is it that consumers connect to zookeeper to retrieve the partition locations? And kafka producers have to connect to one of the brokers to retrieve metadata.

My point is, what exactly is the use of zookeeper when every broker already has all the necessary metadata to tell producers the location to send their messages? Couldn't the brokers send this same information to the consumers?

I can understand why brokers have the metadata, to not have to make a connection to zookeeper each time a new message is sent to them. Is there a function that zookeeper has that I'm missing? I'm finding it hard to think of a reason why zookeeper is really needed within a kafka cluster.

asked Jan 13 '15 at 8:49
Luckl507

8516
 

2 Answers

First of all, zookeeper is needed only for high level consumer. SimpleConsumer does not require zookeeper to work.

The main reason zookeeper is needed for a high level consumer is to track consumed offsets and handle load balancing.

Now in more detail.

Regarding offset tracking, imagine following scenario: you start a consumer, consume 100 messages and shut the consumer down. Next time you start your consumer you'll probably want to resume from your last consumed offset (which is 100), and that means you have to store the maximum consumed offset somewhere. Here's where zookeeper kicks in: it stores offsets for every group/topic/partition. So this way next time you start your consumer it may ask "hey zookeeper, what's the offset I should start consuming from?". Kafka is actually moving towards being able to store offsets not only in zookeeper, but in other storages as well (for now only zookeeper and kafka offset storages are available and i'm not sure kafka storage is fully implemented).

Regarding load balancing, the amount of messages produced can be quite large to be handled by 1 machine and you'll probably want to add computing power at some point. Lets say you have a topic with 100 partitions and to handle this amount of messages you have 10 machines. There are several questions that arise here actually:

  • how should these 10 machines divide partitions between each other?
  • what happens if one of machines die?
  • what happens if you want to add another machine?

And again, here's where zookeeper kicks in: it tracks all consumers in group and each high level consumer is subscribed for changes in this group. The point is that when a consumer appears or disappears, zookeeper notifies all consumers and triggers rebalance so that they split partitions near-equally (e.g. to balance load). This way it guarantees if one of consumer dies others will continue processing partitions that were owned by this consumer.

answered Jan 13 '15 at 9:46
serejja

10k22749
 
1  
Thanks for the answer, this clears it up, it's what i guessed but i couldn't find it anywhere. I also just read that version 0.9 the consumers will no longer use zookeeper, and it is only used by the brokers for leader election etc. – Luckl507 Jan 13 '15 at 9:56 

With kafka 0.9+ the new Consumer API was introduced. New consumers do not need connection to Zookeeper since group balancing is provided by kafka itself.

Why do Kafka consumers connect to zookeeper, and producers get metadata from brokers?的更多相关文章

  1. kafka集群和zookeeper集群的部署,kafka的java代码示例

    来自:http://doc.okbase.net/QING____/archive/19447.html 也可参考: http://blog.csdn.net/21aspnet/article/det ...

  2. 使用不同的namespace让不同的kafka/Storm连接同一个zookeeper

    背景介绍: 需要部署2个kafka独立环境,但是只有一个zookeeper集群. 需要部署2个独立的storm环境,但是只有一个zookeeper集群. ----------------------- ...

  3. org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within

    org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeo ...

  4. Unable to connect to zookeeper server within timeout: 5000

    错误 严重: StandardWrapper.Throwable org.springframework.beans.factory.BeanCreationException: Error crea ...

  5. CentOS7 搭建Kafka(一)zookeeper篇

    CentOS7 搭建Kafka(一)zookeeper篇 近几年当红小生Kafka备受各路英雄好汉追捧,一点不比老前辈RabbitMQ和ActiveMQ差,因为流行,所以你就得学啊:我这么懒,肯定是不 ...

  6. Caused by: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 5000

    org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'brandControl ...

  7. apache kafka系列之在zookeeper中存储结构

    1.topic注册信息 /brokers/topics/[topic] : 存储某个topic的partitions所有分配信息 Schema:   {    "version": ...

  8. Kafka学习之(五)搭建kafka集群之Zookeeper集群搭建

    Zookeeper是一种在分布式系统中被广泛用来作为:分布式状态管理.分布式协调管理.分布式配置管理.和分布式锁服务的集群.kafka增加和减少服务器都会在Zookeeper节点上触发相应的事件kaf ...

  9. kafka集群与zookeeper集群 配置过程

    Kafka的集群配置一般有三种方法,即 (1)Single node – single broker集群: (2)Single node – multiple broker集群:    (3)Mult ...

随机推荐

  1. 水晶报表Crystal 无效索引

    这几天项目用到水晶报表做报表打印,没有前辈指导,都自己摸着石头过河,真是痛并快乐着.其实水晶报表还是挺好用的,对初学者也并不难(我就是初学者).昨天遇到一个问题:无效索引 ……开始以为是报表设置的问题 ...

  2. SVN、CVS、VSS区别

    废话不多说,撸起袖子敲黑板 !~~ #首先向大家简要描述一下SVN与CVS.VSS的介绍与对比: 介绍: 三种都是版本控制软件, 多数用于源代码管理1.CVS(Concurrent Version S ...

  3. mysql使用存储过程&函数实现批量插入

    写这边文章的目的,是想结合mysql 存储过程+函数完成一个批量删除的功能吧...正好也好加深下对procedure和function的熟练操作吧...废话不多说,我就直接上表结构啦哈,如下: cre ...

  4. 支持MySelf

    编程思路分享,BUG上报,主推Java Web方向与软件架构设计,不定期推出系列针对性基础教程,项目均放置于GitHub,个人运营精力有限,感谢支持. 交流群:628793702 个人技术公众,欢迎关 ...

  5. Netty 系列七(那些开箱即用的 ChannelHandler).

    一.前言 Netty 为许多通用协议提供了编解码器和处理器,几乎可以开箱即用, 这减少了你在那些相当繁琐的事务上本来会花费的时间与精力.另外,这篇文章中,就不涉及 Netty 对 WebSocket协 ...

  6. CDRAF之Service mesh

    最近翻看一些网上的文章,偶然发现我们的CDRAF其实就是Service mesh的C++版本.不管从架构的理念上,或者功能的支持上面,基本完全符合.发几个简单的文章链接,等有时间的时候,再来详细描述. ...

  7. Debug始于71年前

    摘要: 纪念Grace Hopper发现世界上第一个计算机BUG! 1947年9月9日,Grace Hopper的计算科学团队在哈佛的哈弗Mark II电脑运行程序时遇到一个技术故障.她在发生故障的M ...

  8. 前端入门6-JavaScript客户端api&jQuery

    本篇文章已授权微信公众号 dasu_Android(大苏)独家发布 声明 本系列文章内容全部梳理自以下四个来源: <HTML5权威指南> <JavaScript权威指南> MD ...

  9. react-conponent-secondesElapsed

    <!DOCTYPE html> <html> <head> <script src="../../build/react.js">& ...

  10. es6 语法 (类与对象)

    { // 基本定义和生成实例 class Parent{ constructor(name='mukewang'){ this.name=name; } } let v_parent1=new Par ...