kafka4 副本机制

概述

每个分区有n个副本，可以承受n-1个节点故障。

每个副本都有自己的leader，其余都是follower。

zk中存放分区的leader和 follower replica的信息。（get /brokers/topics/mytest2/partitions/0/state）

每个副本存储消息的部分数据在本地的log和offset中，周期性同步到disk，确保消息写入全部副本或不写入任何一个。

leader故障时，消息或者在写入本地log，或者在producer收到ack消息前，resent partition到new leader。

kafka支持的副本模型

a)同步复制：producer从zk中找leader，并发送消息，消息立即写入本地log，而且follower开始pull消息。每个follower将消息写入各自的log后，向leader发送确认回执，leader在本地副本的写入工作均完成并且收到所有follower的确认回执后，再向producer发送确认回执。

b)异步复制：leader的本地log写入完成后即向producer发送确认回执。

原文——摘自《[PACKT]Apache Kafka.pdf》

Replication in Kafka

Before we talk about replication in Kafka, let's talk about message partitioning.In Kafka, message partitioning strategy is used at the Kafka broker end. The decisionabout how the message is partitioned is taken by the producer, and the broker storesthe messages in the same order as they arrive. The number of partitions can beconfigured for each topic within the Kafka broker.

Kafka replication is one of the very important features introduced in Kafka 0.8.Though Kafka is highly scalable, for better durability of messages and high availability of Kafka clusters, replication guarantees that the message will be published and consumed even in case of broker failure, which may be caused by any reason. Here, both producers and consumers are replication aware in Kafka.The following diagram explains replication in Kafka:

Let's discuss the preceding diagram in detail.

In replication, each partition of a message has n replicas and can afford n-1 failures to guarantee message delivery. Out of the n replicas, one replica acts as the lead replica for the rest of the replicas. ZooKeeper keeps the information about the lead replica and the current in-sync follower replica (lead replica maintains the list of all in-sync follower replicas).

Each replica stores its part of the message in local logs and offsets, and is periodically synced to the disk. This process also ensures that either a message is written to all the replicas or to none of them.

If the lead replica fails, either while writing the message partition to its local log or before sending the acknowledgement to the message producer, a message partition is resent by the producer to the new lead broker.

The process of choosing the new lead replica is that all followers' In-sync Replicas (ISRs) register themselves with ZooKeeper. The very first registered replica becomes the new lead replica, and the rest of the registered replicas become the followers.

Kafka supports the following replication modes:

• Synchronous replication: In synchronous replication, a producer first identifies the lead replica from ZooKeeper and publishes the message.As soon as the message is published, it is written to the log of the lead replica and all the followers of the lead start pulling the message, and by using a single channel,the order of messages is ensured. Each follower replica sends an acknowledgement to the lead replica once the message is written to its respective logs. Once replications are complete and all expected acknowledgements are received, the lead replica sends an acknowledgement to the producer.On the consumer side, all the pulling of messages is done from the lead replica.

• Asynchronous replication: The only difference in this mode is that as soon as a lead replica writes the message to its local log, it sends the acknowledgement to the message client and does not wait for the acknowledgements from follower replicas. But as a down side, this mode does not ensure the message delivery in case of broker failure.

翻译

kafka的副本机制
在我们讨论Kafka的副本机制之前，让我们来谈谈消息分区。
在Kafka中，消息分区发生在broker端。
如何分区消息是由producer决定的。
broker存储消息的顺序与它们到达的顺序相同。
可以为broker中的每个主题配置分区数量。

Kafka副本策略是Kafka 0.8中引入的非常重要的功能之一。
虽然Kafka具有高度可扩展性，但副本是为了kafka集群有更好的消息持久性和高可用性。
副本保证了即使在某个broker故障时也可以发布和消费消息。
在这里，生产者和消费者都可以在Kafka中识别副本。
下图说明了Kafka中的副本策略：

我们将详细讨论上图。

在副本策略中，消息的每个分区都有n个副本，并且可以承受n-1个节点故障，保证消息分发。
在n个副本中，1个副本充当其余副本的lead。
ZooKeeper保存关于lead副本和当前同步的follower副本的信息（lead副本维护所有的同步follower副本的列表）

每个副本将消息的一部分存储在本地logs和offsets中，并定期同步到磁盘。此过程还确保可以将消息写入所有副本或不写入任何副本。
如果lead副本故障，则在将消息分区写入其本地log时或在向producer发送确认之前，producer将消息分区重新分发给新的lead broker。
推选新的leader过程就是followers在ZooKeeper中的注册过程，第一个注册的就是leader，其余注册成为followers。

Kafka支持以下复制模式：
•同步复制：在同步复制中，producer首先从ZooKeeper中识别lead副本并发布消息。消息一发布，就会将其写入lead副本的日志中，并且所有follwers都会开始拉消息，并通过使用单信道，确保消息的顺序。一旦将消息写入其各自的日志，每个follwers副本就向该lead副本发送确认。一旦复制完成并且收到所有预期的确认，lead副本就向priducer发送确认。在consumer方面，所有消息的拉取都是从lead副本完成的。
•异步复制：此模式与同步复制的唯一区别是，只要lead副本将消息写入其本地日志，它就会将确认发送到消息客户端，而不会等待来自followers副本的确认。但缺点是，这种模式不能确保在broker故障的情况下传递消息。

练习

1.搭建单节点多broker的kafka后，启动zk和kafka。

[root@hadoop ~]# cd /usr/local/kafka

[root@hadoop kafka]# zookeeper-server-start.sh config/zookeeper.properties

...

[root@hadoop kafka]# kafka-server-start.sh config/server0.properties &

...

[root@hadoop kafka]# kafka-server-start.sh config/server1.properties &

...

[root@hadoop kafka]# kafka-server-start.sh config/server2.properties &

...

[root@hadoop ~]# jps

 QuorumPeerMain

 Kafka

 Kafka

 Kafka

 Jps

2.创建主题（3个分区2个副本）

[root@hadoop ~]# cd /usr/local/kafka

[root@hadoop kafka]# kafka-topics.sh --create --zookeeper localhost: --replication-factor  --partitions  --topic mytest2

Created topic "mytest2".

2.1 查看主题

[root@hadoop ~]# kafka-topics.sh --describe --zookeeper localhost: --topic mytest2

Topic:mytest2    PartitionCount:    ReplicationFactor:    Configs:

    Topic: mytest2    Partition:     Leader:     Replicas: ,    Isr: , #副本0在broker1和broker2上，leader是broker1

    Topic: mytest2    Partition:     Leader:     Replicas: ,    Isr: ,

    Topic: mytest2    Partition:     Leader:     Replicas: ,    Isr: ,

2.2 查看zk上面topic信息（与查看主题信息对应）

[root@hadoop ~]# cd /usr/local/kafka

[root@hadoop kafka]# zkCli.sh -server hadoop: #启动zk客户端

...

[zk: localhost:(CONNECTED) ] ls /brokers/topics/mytest2/partitions

[, , ]

[zk: localhost:(CONNECTED) ] get /brokers/topics/mytest2/partitions//state

{"controller_epoch":,"leader":,"version":,"leader_epoch":,"isr":[,]}

...

[zk: localhost:(CONNECTED) ] get /brokers/topics/mytest2/partitions//state

{"controller_epoch":,"leader":,"version":,"leader_epoch":,"isr":[,]}

...

[zk: localhost:(CONNECTED) ] get /brokers/topics/mytest2/partitions//state

{"controller_epoch":,"leader":,"version":,"leader_epoch":,"isr":[,]}

...

2.3 查看3个broker的日志目录

[root@hadoop ~]# ls /tmp/kafka-logs0

mytest2- mytest2- ...

[root@hadoop ~]# ls /tmp/kafka-logs1

mytest2- mytest2- ...

[root@hadoop ~]# ls /tmp/kafka-logs2

mytest2- mytest2- ...

可见主题是按照replication-factor 2 * partitions 3 = 6 然后在broker中均衡分配的

3.修改主题的副本位置

[root@hadoop kafka]# kafka-topics.sh --alter --zookeeper localhost: replica-assigment :,:,: --topic mytest2

再次查看主题，发现没变，按理说这个命令没错啊？未完待续。。。

kafka4 副本机制的更多相关文章

HDFS副本机制&负载均衡&机架感知&访问方式&健壮性&删除恢复机制&HDFS缺点
副本机制 1.副本摆放策略第一副本:放置在上传文件的DataNode上:如果是集群外提交,则随机挑选一台磁盘不太慢.CPU不太忙的节点上:第二副本:放置在于第一个副本不同的机架的节点上:第三副本:与 ...
kafka副本机制之数据可靠性
一.概述为了提升集群的HA,Kafka从0.8版本开始引入了副本(Replica)机制,增加副本机制后,每个副本可以有多个副本,针对每个分区,都会从副本集(Assigned Replica,AR)中 ...
hdfs深入：03、hdfs的架构以及副本机制和block块存储
HDFS分布式文件系统设计目标 1. 硬件错误由于集群很多时候由数量众多的廉价机组成,使得硬件错误成为常态 2. 数据流访问所有应用以流的方式访问数 ...
深入理解 Kafka 副本机制
一.Kafka集群二.副本机制 2.1 分区和副本 2.2 ISR机制 2.3 不完全的首领选举 2.4 最少同步副本 ...
Kafka 学习之路（五）—— 深入理解Kafka副本机制
一.Kafka集群 Kafka使用Zookeeper来维护集群成员(brokers)的信息.每个broker都有一个唯一标识broker.id,用于标识自己在集群中的身份,可以在配置文件server. ...
Kafka 系列（五）—— 深入理解 Kafka 副本机制
一.Kafka集群 Kafka 使用 Zookeeper 来维护集群成员 (brokers) 的信息.每个 broker 都有一个唯一标识 broker.id,用于标识自己在集群中的身份,可以在配置文 ...
大数据：Hadoop（HDFS 的设计思路、设计目标、架构、副本机制、副本存放策略）
一.HDFS 的设计思路 1)思路切分数据,并进行多副本存储: 2)如果文件只以多副本进行存储,而不进行切分,会有什么问题缺点不管文件多大,都存储在一个节点上,在进行数据处理的时候很难进行并行处 ...
C和C++中的副本机制
函数的形参.return 都有副本机制.数组没有副本机制 (为了节约内存) 函数形参和局部变量的生命周期.函数调用结束后就会被回收.
C语言副本机制
1.除了数组外,其他都有副本机制(包括结构体数组) 2.结构体作为参数具有副本机制,结构体返回值也有副本机制 . 3.函数的参数和返回值都有他的副本机制. #include<stdio.h> ...

随机推荐

[Converge] Training Neural Networks
CS231n Winter 2016: Lecture 5: Neural Networks Part 2 CS231n Winter 2016: Lecture 6: Neural Networks ...
BackgroundWorker学习笔记
1 简介 BackgroundWorker 类允许您在单独的专用线程上运行操作. 耗时的操作(如下载和数据库事务)在长时间运行时可能会导致用户界面 (UI) 似乎处于停止响应状态. 如果您需要能进行响 ...
OSG模拟鼠标事件影响操纵器
viewer->getEventQueue()->mouseButtonPress(0,0,1); viewer->getEventQueue()->mouseMotion(1 ...
[原]Jenkins(七)---jenkins项目编译测试发布由maven构建的web项目
/** * lihaibo * 文章内容都是根据自己工作情况实践得出. * 版权声明:本博客欢迎转发,但请保留原作者信息! http://www.cnblogs.com/horizonli/p/533 ...
kubernetes 创建nginx 容器
参考:http://blog.csdn.net/qq1010885678/article/details/48832067 一个简单的nginx服务器先决条件:你需要拥有的是一个部署完毕并可以正常运 ...
跟bWAPP学WEB安全(PHP代码)--终结篇：文件目录遍历、文件上传、SSRF、CSRF、XXE、文件包含
前言过年过的很不顺,家里领导和我本人接连生病,年前腊月29才都治好出院,大年初六家里的拉布拉多爱犬又因为细小医治无效离开了,没能过年回家,花了好多钱,狗狗还离世了.所以也就没什么心思更新博客.今天初 ...
秒杀应用的MySQL数据库优化
关于秒杀随着双11活动的不断发展,小米饥饿营销模式的兴起,“秒杀”已经成为一个热点词汇.在一些活动中,热销商品会以惊人的速度售罄,比如最近本人在抢购美图M4手机,12点开卖,1分钟之内就被售罄. 秒 ...
快速构建springmvc+spring+swagger2环境
快速构建springmvc+spring+swagger2环境开发工具:Intellij idea jdk: 1.8 开发步骤: 1.创建maven工程,如图建立工程结构 ...
文件下载报错：引发类型为“System.OutOfMemoryException”的异常-.Net 内存溢出
CSDN:http://blog.csdn.net/huwei2003/article/details/53559272 设置了也没有用,于是想到手动清理应用程序池,但又迁配置问题于是改成最后的方式! ...
ML.NET教程之出租车车费预测(回归问题)
理解问题出租车的车费不仅与距离有关,还涉及乘客数量,是否使用信用卡等因素(这是的出租车是指纽约市的).所以并不是一个简单的一元方程问题. 准备数据建立一控制台应用程序工程,新建Data文件夹,在其 ...

kafka4 副本机制

kafka4 副本机制的更多相关文章

随机推荐

热门专题