Configuring High Availability and Consistency for Apache Kafka
To achieve high availability and consistency targets, adjust the following parameters to meet your requirements:
Replication Factor
The default replication factor for new topics is one. For high availability production systems, Cloudera recommends setting the replication factor to at least three. This requires at least three Kafka brokers.
To change the replication factor, navigate to Kafka Service > Configuration > Service-Wide. Set Replication factor to 3, click Save Changes, and restart the Kafka service.
Preferred Leader Election
Kafka is designed with failure in mind. At some point in time, web communications or storage resources fail. When a broker goes offline, one of the replicas becomes the new leader for the partition. When the broker comes back online, it has no leader partitions. Kafka keeps track of which machine is configured to be the leader. Once the original broker is back up and in a good state, Kafka restores the information it missed in the interim and makes it the partition leader once more.
Preferred Leader Election is enabled by default, and should occur automatically unless you actively disable the feature. Typically, the leader is restored within five minutes of coming back online. If the preferred leader is offline for a very long time, though, it might need additional time to restore its required information from the replica.
There is a small possibility that some messages might be lost when switching back to the preferred leader. You can minimize the chance of lost data by setting the acks property on the Producer to all. See Acknowledgements.
Unclean Leader Election
Enable unclean leader election to allow an out-of-sync replica to become the leader and preserve the availability of the partition. With unclean leader election, messages that were not synced to the new leader are lost. This provides balance between consistency (guaranteed message delivery) and availability. With unclean leader election disabled, if a broker containing the leader replica for a partition becomes unavailable, and no in-sync replica exists to replace it, the partition becomes unavailable until the leader replica or another in-sync replica is back online.
To enable unclean leader election, navigate to Kafka Service > Configuration > Service-Wide. Check the box labeled Enable unclean leader election, click Save Changes, and restart the Kafka service.
Acknowledgements
When writing or configuring a Kafka producer, you can choose how many replicas commit a new message before the message is acknowledged using the acks property.
Set acks to 0 (immediately acknowledge the message without waiting for any brokers to commit), 1(acknowledge after the leader commits the message), or all (acknowledge after all in-sync replicas are committed) according to your requirements. Setting acks to all provides the highest consistency guarantee at the expense of slower writes to the cluster.
Minimum In-sync Replicas
You can set the minimum number of in-sync replicas (ISRs) that must be available for the producer to successfully send messages to a partition using the min.insync.replicas setting. If min.insync.replicas is set to 2 and acks is set to all, each message must be written successfully to at least two replicas. This guarantees that the message is not lost unless both hosts crash.
It also means that if one of the hosts crashes, the partition is no longer available for writes. Similar to the unclean leader election configuration, setting min.insync.replicas is a balance between higher consistency (requiring writes to more than one broker) and higher availability (allowing writes when fewer brokers are available).
The leader is considered one of the in-sync replicas. It is included in the count of total min.insync.replicas. However, leaders are special, in that producers and consumers can only interact with leaders in a Kafka cluster.
To configure min.insync.replicas at the cluster level, navigate to Kafka Service > Configuration > Service-Wide. Set Minimum number of replicas in ISR to the desired value, click Save Changes, and restart the Kafka service.
min.insync.replicas.per.topic=topic_name_1:value,topic_name_2:value
Replace topic_name_n with the topic names, and replace value with the desired minimum number of in-sync replicas.
/usr/bin/kafka-topics --alter --zookeeper zk01.example.com:2181 --topic topicname \
--config min.insync.replicas=2
Kafka MirrorMaker
Kafka mirroring enables maintaining a replica of an existing Kafka cluster. You can configure MirrorMaker directly in Cloudera Manager 5.4 and higher.
The most important configuration setting is Destination broker list. This is a list of brokers on the destination cluster. You should list more than one, to support high availability, but you do not need to list all brokers.
You can create a Topic whitelist that represents the exclusive set of topics to replicate. The Topic blacklist setting has been removed in CDK 2.0 and higher Powered By Apache Kafka.
Note: The Avoid Data Loss option from earlier releases has been removed in favor of automatically setting the following properties. Also note that MirrorMaker starts correctly if you enter the numeric values in the configuration snippet (rather than using "max integer" for retries and "max long" for max.block.ms).
- Producer settings
- acks=all
- retries=2147483647
- max.block.ms=9223372036854775807
- Consumer setting
- auto.commit.enable=false
- MirrorMaker setting
- abort.on.send.failure=true
Configuring High Availability and Consistency for Apache Kafka的更多相关文章
- Configuring Apache Kafka for Performance and Resource Management
Apache Kafka is optimized for small messages. According to benchmarks, the best performance occurs w ...
- Configuring Apache Kafka Security
This topic describes additional steps you can take to ensure the safety and integrity of your data s ...
- Understanding When to use RabbitMQ or Apache Kafka
https://content.pivotal.io/rabbitmq/understanding-when-to-use-rabbitmq-or-apache-kafka How do humans ...
- Apache Kafka - How to Load Test with JMeter
In this article, we are going to look at how to load test Apache Kafka, a distributed streaming plat ...
- Apache Kafka: Next Generation Distributed Messaging System---reference
Introduction Apache Kafka is a distributed publish-subscribe messaging system. It was originally dev ...
- Spring for Apache Kafka
官方文档详见:http://docs.spring.io/spring-kafka/docs/1.0.2.RELEASE/reference/htmlsingle/ Authors Gary Russ ...
- How Cigna Tuned Its Spark Streaming App for Real-time Processing with Apache Kafka
Explore the configuration changes that Cigna’s Big Data Analytics team has made to optimize the perf ...
- How-to: Do Real-Time Log Analytics with Apache Kafka, Cloudera Search, and Hue
Cloudera recently announced formal support for Apache Kafka. This simple use case illustrates how to ...
- Flafka: Apache Flume Meets Apache Kafka for Event Processing
The new integration between Flume and Kafka offers sub-second-latency event processing without the n ...
随机推荐
- 深入并发包 ConcurrentHashMap 源码解析
以前写过介绍HashMap的文章,文中提到过HashMap在put的时候,插入的元素超过了容量(由负载因子决定)的范围就会触发扩容操作,就是rehash,这个会重新将原数组的内容重新hash到新的扩容 ...
- asp.net core系列 40 Web 应用MVC 介绍与详细示例
一. MVC介绍 MVC架构模式有助于实现关注点分离.视图和控制器均依赖于模型. 但是,模型既不依赖于视图,也不依赖于控制器. 这是分离的一个关键优势. 这种分离允许模型独立于可视化展示进行构建和测试 ...
- transient和synchronized的使用
transient和synchronized这两个关键字没什么联系,这两天用到了它们,所以总结一下,两个关键字做个伴! transient 持久化时不被存储,当你的对象实现了Serializable接 ...
- Python的魔法函数
概要 如何定义一个类 类里通常包含什么 各个部分解释 类是怎么来的 type和object的关系 判断对象的类型 上下文管理器 类结构 #!/usr/bin/env python # -*- codi ...
- 记一次查询超时的解决方案The timeout period elapsed......
问题描述 在数据库中执行查询语句,大约1秒钟查询出来,在C#中用ado进行连接查询,一直等待很久未查出结果,最后抛出查询超时异常. 异常内容如下: Execution Timeout Expired. ...
- Perl多线程(2):数据共享和线程安全
线程数据共享 在介绍Perl解释器线程的时候一直强调,Perl解释器线程在被创建出来的时候,将从父线程中拷贝数据到子线程中,使得数据是线程私有的,并且数据是线程隔离的.如果真的想要在线程间共享数据,需 ...
- 权限管理系统之项目框架搭建并集成日志、mybatis和分页
前一篇博客中使用LayUI实现了列表页面和编辑页面的显示交互,但列表页面table渲染的数据是固定数据,本篇博客主要是将固定数据变成数据库数据. 一.项目框架 首先要解决的是项目框架问题,搭建什么样的 ...
- [T-SQL] NCL INDEX 欄位選擇效能影響-解析
因為這篇文章寫的比較長一些,我就將總結先列出來 總結 1. 除了WHERE條件外,JOINColumn除了記得建立索引,也要注意到選擇性的高低,如果真的找不到可用的Column,可以考慮在兩邊關聯的表 ...
- IT公司PM沟通那儿些事(一)
本质:传递信息 沟通是不同的行为主体,通过各种载体实现信息的双向流动,形成行为主体的感知,以达到特定目标的行为过程. 信息的准确性弥足珍贵,在工作中,沟通传递的是应该是信息本身,而非情绪. 目标:解决 ...
- 3 Redis 的常用五大数据类型
2016-12-21 14:54:20 该系列文章链接NoSQL 数据库简介Redis的安装及及一些杂项基础知识Redis 的常用五大数据类型(key,string,hash,list,set,zse ...