To achieve high availability and consistency targets, adjust the following parameters to meet your requirements:

Replication Factor

The default replication factor for new topics is one. For high availability production systems, Cloudera recommends setting the replication factor to at least three. This requires at least three Kafka brokers.

To change the replication factor, navigate to Kafka Service > Configuration > Service-Wide. Set Replication factor to 3, click Save Changes, and restart the Kafka service.

Preferred Leader Election

Kafka is designed with failure in mind. At some point in time, web communications or storage resources fail. When a broker goes offline, one of the replicas becomes the new leader for the partition. When the broker comes back online, it has no leader partitions. Kafka keeps track of which machine is configured to be the leader. Once the original broker is back up and in a good state, Kafka restores the information it missed in the interim and makes it the partition leader once more.

Preferred Leader Election is enabled by default, and should occur automatically unless you actively disable the feature. Typically, the leader is restored within five minutes of coming back online. If the preferred leader is offline for a very long time, though, it might need additional time to restore its required information from the replica.

There is a small possibility that some messages might be lost when switching back to the preferred leader. You can minimize the chance of lost data by setting the acks property on the Producer to all. See Acknowledgements.

Unclean Leader Election

Enable unclean leader election to allow an out-of-sync replica to become the leader and preserve the availability of the partition. With unclean leader election, messages that were not synced to the new leader are lost. This provides balance between consistency (guaranteed message delivery) and availability. With unclean leader election disabled, if a broker containing the leader replica for a partition becomes unavailable, and no in-sync replica exists to replace it, the partition becomes unavailable until the leader replica or another in-sync replica is back online.

To enable unclean leader election, navigate to Kafka Service > Configuration > Service-Wide. Check the box labeled Enable unclean leader election, click Save Changes, and restart the Kafka service.

Acknowledgements

When writing or configuring a Kafka producer, you can choose how many replicas commit a new message before the message is acknowledged using the acks property.

Set acks to 0 (immediately acknowledge the message without waiting for any brokers to commit), 1(acknowledge after the leader commits the message), or all (acknowledge after all in-sync replicas are committed) according to your requirements. Setting acks to all provides the highest consistency guarantee at the expense of slower writes to the cluster.

Minimum In-sync Replicas

You can set the minimum number of in-sync replicas (ISRs) that must be available for the producer to successfully send messages to a partition using the min.insync.replicas setting. If min.insync.replicas is set to 2 and acks is set to all, each message must be written successfully to at least two replicas. This guarantees that the message is not lost unless both hosts crash.

It also means that if one of the hosts crashes, the partition is no longer available for writes. Similar to the unclean leader election configuration, setting min.insync.replicas is a balance between higher consistency (requiring writes to more than one broker) and higher availability (allowing writes when fewer brokers are available).

The leader is considered one of the in-sync replicas. It is included in the count of total min.insync.replicas. However, leaders are special, in that producers and consumers can only interact with leaders in a Kafka cluster.

To configure min.insync.replicas at the cluster level, navigate to Kafka Service > Configuration > Service-Wide. Set Minimum number of replicas in ISR to the desired value, click Save Changes, and restart the Kafka service.

To set this parameter on a per-topic basis, navigate to Kafka Service > Configuration > Kakfa broker Default Group > Advanced, and add the following to the Kafka Broker Advanced Configuration Snippet (Safety Valve) for kafka.properties:

min.insync.replicas.per.topic=topic_name_1:value,topic_name_2:value

Replace topic_name_n with the topic names, and replace value with the desired minimum number of in-sync replicas.

You can also set this parameter using the /usr/bin/kafka-topics --alter command for each topic. For example:

/usr/bin/kafka-topics --alter --zookeeper zk01.example.com:2181 --topic topicname \
--config min.insync.replicas=2

Kafka MirrorMaker

Kafka mirroring enables maintaining a replica of an existing Kafka cluster. You can configure MirrorMaker directly in Cloudera Manager 5.4 and higher.

The most important configuration setting is Destination broker list. This is a list of brokers on the destination cluster. You should list more than one, to support high availability, but you do not need to list all brokers.

You can create a Topic whitelist that represents the exclusive set of topics to replicate. The Topic blacklist setting has been removed in CDK 2.0 and higher Powered By Apache Kafka.

Note: The Avoid Data Loss option from earlier releases has been removed in favor of automatically setting the following properties. Also note that MirrorMaker starts correctly if you enter the numeric values in the configuration snippet (rather than using "max integer" for retries and "max long" for max.block.ms).

  1. Producer settings

    • acks=all
    • retries=2147483647
    • max.block.ms=9223372036854775807
  2. Consumer setting
    • auto.commit.enable=false
  3. MirrorMaker setting
    • abort.on.send.failure=true

Configuring High Availability and Consistency for Apache Kafka的更多相关文章

  1. Configuring Apache Kafka for Performance and Resource Management

    Apache Kafka is optimized for small messages. According to benchmarks, the best performance occurs w ...

  2. Configuring Apache Kafka Security

    This topic describes additional steps you can take to ensure the safety and integrity of your data s ...

  3. Understanding When to use RabbitMQ or Apache Kafka

    https://content.pivotal.io/rabbitmq/understanding-when-to-use-rabbitmq-or-apache-kafka How do humans ...

  4. Apache Kafka - How to Load Test with JMeter

    In this article, we are going to look at how to load test Apache Kafka, a distributed streaming plat ...

  5. Apache Kafka: Next Generation Distributed Messaging System---reference

    Introduction Apache Kafka is a distributed publish-subscribe messaging system. It was originally dev ...

  6. Spring for Apache Kafka

    官方文档详见:http://docs.spring.io/spring-kafka/docs/1.0.2.RELEASE/reference/htmlsingle/ Authors Gary Russ ...

  7. How Cigna Tuned Its Spark Streaming App for Real-time Processing with Apache Kafka

    Explore the configuration changes that Cigna’s Big Data Analytics team has made to optimize the perf ...

  8. How-to: Do Real-Time Log Analytics with Apache Kafka, Cloudera Search, and Hue

    Cloudera recently announced formal support for Apache Kafka. This simple use case illustrates how to ...

  9. Flafka: Apache Flume Meets Apache Kafka for Event Processing

    The new integration between Flume and Kafka offers sub-second-latency event processing without the n ...

随机推荐

  1. 剖析HBase负载均衡和性能指标

    1.概述 在分布式系统中,负载均衡是一个非常重要的功能,在HBase中通过Region的数量来实现负载均衡,HBase中可以通过hbase.master.loadbalancer.class来实现自定 ...

  2. 【微信小程序云开发】从陌生到熟悉

    前言 微信小程序在9月10号正式上线了云开发的功能,弱化后端和运维概念,以前开发一个小程序需要申请一个小程序,准备一个https的域名,开发需要一个前端一个服务端,有了云开发只有申请一个小程序,一个前 ...

  3. LeetCode专题-Python实现之第20题:Valid Parentheses

    导航页-LeetCode专题-Python实现 相关代码已经上传到github:https://github.com/exploitht/leetcode-python 文中代码为了不动官网提供的初始 ...

  4. 设计模式总结篇系列:建造者模式(Builder)

    关于建造者模式网上有很多文章,也有些不同的理解.在此结合网上其他文章对建造者模式进行总结. 总体说来,建造者模式适合于一个具有较多的零件(属性)的产品(对象)的创建过程.根据产品创建过程中零件的构造是 ...

  5. MaxCompute/DataWorks权限问题排查建议

    MaxCompute/DataWorks权限问题排查建议 __前提:__MaxCompute与DataWorks为两个产品,在权限体系上既有交集又要一定的差别.在权限问题之前需了解两个产品独特的权限体 ...

  6. springboot情操陶冶-web配置(八)

    本文关注应用的安全方面,涉及校验以及授权方面,以springboot自带的security板块作为讲解的内容 实例 建议用户可直接路由至博主的先前博客spring security整合cas方案.本文 ...

  7. springboot情操陶冶-web配置(六)

    本文则针对数据库的连接配置作下简单的分析,方便笔者理解以及后续的查阅 栗子当先 以我们经常用的mybatis数据库持久框架来操作mysql服务为例 环境依赖 1.JDK v1.8+ 2.springb ...

  8. #7 Python代码调试

    前言 Python已经学了这么久了,你现在已经长大了,该学会自己调试代码了!相信大家在编写程序过程中会遇到大量的错误信息,我也不例外的啦-遇到这些问题该怎么解决呢?使用最多的方法就是使用print打印 ...

  9. Tomcat多实例部署

    前言 以前总是采用很Low的方式太同一台服务器上部署多个Web应用,步骤是这样的:Copy Tomcat目录-->更改conf/server.xml三个端口号----->部署war包--- ...

  10. Maven(十一)导入手动创建的Maven 工程

    传统的导入方式并不能导入手动创建的Maven工程 因为eclipse项目必须有如图所示文件,才被认为是Eclipse工程 使用Maven方式导入 导入选项中并没有把项目复制到工作空间的选项,这是与传统 ...