KafkaConsumer assign VS subscribe
背景
在kafka中,正常情况下,同一个group.id下的不同消费者不会消费同样的partition,也即某个partition在任何时刻都只能被具有相同group.id的consumer中的一个消费。
也正是这个机制才能保证kafka的重要特性:
- 1、可以通过增加partitions和consumer来提升吞吐量;
- 2、保证同一份消息不会被消费多次。
在KafkaConsumer类中(官方API),消费者可以通过assign和subscribe两种方式指定要消费的topic-partition。具体的源码可以参考下文,
这两个接口貌似是完成相同的功能,但是还有细微的差别,初次使用的同学可能感到困惑,下面就详细介绍下两者的区别。
对比结果
- KafkaConsumer.subscribe() : 为consumer自动分配partition,有内部算法保证topic-partition以最优的方式均匀分配给同group下的不同consumer。 
- KafkaConsumer.assign() : 为consumer手动、显示的指定需要消费的topic-partitions,不受group.id限制,相当与指定的group无效(this method does not use the consumer's group management)。 
测试代码
public class KafkaManualAssignTest {
    private static final Logger logger = LoggerFactory.getLogger(KafkaManualAssignTest.class);
    private static Properties props = new Properties();
    private static KafkaConsumer<String, String> c1, c2;
    private static final String brokerList = "localhost:9092";
    static {
        props.put("bootstrap.servers", brokerList);
        props.put("group.id", "assignTest");
        props.put("auto.offset.reset", "earliest");
        props.put("enable.auto.commit", "true");
        props.put("session.timeout.ms", "30000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c1 = new KafkaConsumer<String, String>(props);
        c2 = new KafkaConsumer<String, String>(props);
    }
    public static void main(String[] args) {
        TopicPartition tp = new TopicPartition("topic", 0);
        // 采用assign方式显示的为consumer指定需要消费的topic, 具有相同group.id的两个消费者
        // 各自消费了一份数据, 出现了数据的重复消费
        c1.assign(Arrays.asList(tp));
        c2.assign(Arrays.asList(tp));
        // 采用subscribe方式, 利用broker为consumer自动分配topic-partitions,
        // 两个消费者各自消费一个partition, 数据互补, 无交叉.
        // c1.subscribe(Arrays.asList("topic"));
        // c2.subscribe(Arrays.asList("topic"));
        while (true) {
            ConsumerRecords<String, String> msg1 = c1.poll(1000L);
            if (msg1 != null) {
                for (ConsumerRecord m1 : msg1) {
                    logger.info("m1 offset : {} , value : {}", m1.offset(), m1.value());
                }
            }
           logger.info("=====================");
           ConsumerRecords<String, String> msg2 = c2.poll(1000L);
           if (msg2 != null) {
               for (ConsumerRecord m2 : msg2) {
                   logger.info("m2 offset : {} , value : {}", m2.offset(), m2.value());
               }
           }
           System.exit(0);
        }
    }
}
复制代码官方api
官方关于subscribe的解释:
/**
 * Subscribe to the given list of topics to get dynamically assigned partitions.
 * <b>Topic subscriptions are not incremental. This list will replace the current
 * assignment (if there is one).</b> It is not possible to combine topic subscription with group management
 * with manual partition assignment through {@link #assign(Collection)}.
 *
 * If the given list of topics is empty, it is treated the same as {@link #unsubscribe()}.
 *
 * <p>
 * This is a short-hand for {@link #subscribe(Collection, ConsumerRebalanceListener)}, which
 * uses a no-op listener. If you need the ability to seek to particular offsets, you should prefer
 * {@link #subscribe(Collection, ConsumerRebalanceListener)}, since group rebalances will cause partition offsets
 * to be reset. You should also provide your own listener if you are doing your own offset
 * management since the listener gives you an opportunity to commit offsets before a rebalance finishes.
 *
 * @param topics The list of topics to subscribe to
 * @throws IllegalArgumentException If topics is null or contains null or empty elements
 * @throws IllegalStateException If {@code subscribe()} is called previously with pattern, or assign is called
 *                               previously (without a subsequent call to {@link #unsubscribe()}), or if not
 *                               configured at-least one partition assignment strategy
 */
@Override
public void subscribe(Collection<String> topics) {
    subscribe(topics, new NoOpConsumerRebalanceListener());
}
复制代码官方关于assign的解释:
/**
 * Manually assign a list of partitions to this consumer. This interface does not allow for incremental assignment
 * and will replace the previous assignment (if there is one).
 * <p>
 * If the given list of topic partitions is empty, it is treated the same as {@link #unsubscribe()}.
 * <p>
 * Manual topic assignment through this method does not use the consumer's group management
 * functionality. As such, there will be no rebalance operation triggered when group membership or cluster and topic
 * metadata change. Note that it is not possible to use both manual partition assignment with {@link #assign(Collection)}
 * and group assignment with {@link #subscribe(Collection, ConsumerRebalanceListener)}.
 * <p>
 * If auto-commit is enabled, an async commit (based on the old assignment) will be triggered before the new
 * assignment replaces the old one.
 *
 * @param partitions The list of partitions to assign this consumer
 * @throws IllegalArgumentException If partitions is null or contains null or empty topics
 * @throws IllegalStateException If {@code subscribe()} is called previously with topics or pattern
 *                               (without a subsequent call to {@link #unsubscribe()})
 */
@Override
public void assign(Collection<TopicPartition> partitions) {
    acquireAndEnsureOpen();
    try {
        if (partitions == null) {
            throw new IllegalArgumentException("Topic partition collection to assign to cannot be null");
        } else if (partitions.isEmpty()) {
            this.unsubscribe();
        } else {
            Set<String> topics = new HashSet<>();
            for (TopicPartition tp : partitions) {
                String topic = (tp != null) ? tp.topic() : null;
                if (topic == null || topic.trim().isEmpty())
                    throw new IllegalArgumentException("Topic partitions to assign to cannot have null or empty topic");
                topics.add(topic);
            }
            // make sure the offsets of topic partitions the consumer is unsubscribing from
            // are committed since there will be no following rebalance
            this.coordinator.maybeAutoCommitOffsetsAsync(time.milliseconds());
            log.debug("Subscribed to partition(s): {}", Utils.join(partitions, ", "));
            this.subscriptions.assignFromUser(new HashSet<>(partitions));
            metadata.setTopics(topics);
        }
    } finally {
        release();
    }
}
复制代码建议
建议使用 subscribe() 函数来实现partition的分配。
除非各位同学清楚了解自己需要消费的topic-partitions(不是topic),而且能确定自己的消息全部在这些topic-partitions中,则可以使用assign。
KafkaConsumer assign VS subscribe的更多相关文章
- kafka consumer assign 和 subscribe模式差异分析
		转载请注明原创地址:http://www.cnblogs.com/dongxiao-yang/p/7200971.html 最近需要研究flink-connector-kafka的消费行为,发现fli ... 
- 九  assign和subscribe
		1 subscribe: 自动安排分区, 通过group自动重新的负载均衡: 关于Group的实验: 如果auto commit = true, 重新启动进程,如果是同样的groupID,从上次co ... 
- 利用Kafka的Assign模式实现超大群组(10万+)消息推送
		引言 IM即时通信场景下,最重要的一个能力就是推送:在线的直接通过长连接网关服务转发,离线的通过APNS或者极光等系统进行推送. 本文主要是针对在线用户推送场景来进行总结和探讨:如何利用Kafka ... 
- 【Kafka源码】KafkaConsumer
		[TOC] KafkaConsumer是从kafka集群消费消息的客户端.这是kafka的高级消费者,而SimpleConsumer是kafka的低级消费者.何为高级?何为低级? 我们所谓的高级,就是 ... 
- KafkaConsumer 简析
		使用方式 创建一个 KafkaConsumer 对象订阅主题并开始接收消息: Properties properties = new Properties(); properties.setPrope ... 
- kafka消费者客户端(0.9.0.1API)
		转自:http://orchome.com/203 kafka客户端从kafka集群消费消息(记录).它会透明地处理kafka集群中服务器的故障.它获取集群内数据的分区,也和服务器进行交互,允许消费者 ... 
- Kafka 0.10.0
		2.1 Producer API We encourage all new development to use the new Java producer. This client is produ ... 
- Kafka学习-Producer和Customer
		在上一篇kafka入门的基础之上,本篇主要介绍Kafka的生产者和消费者. Kafka 生产者 kafka Producer发布消息记录到Kakfa集群.生产者是线程安全的,可以在多个线程之间共享生产 ... 
- Kafka的CommitFailedException异常
		一.含义 CommitFailedException异常:位移提交失败时候抛出的异常.通常该异常被抛出时还会携带这样的一段话: Commit cannot be completed since the ... 
随机推荐
- cookie、session、csrf
			cookie的设置和获取 import time from tornado.web import RequestHandler class IndexHandle(RequestHandler): d ... 
- ThinkPHP3.1.2 使用cli命令行模式运行
			ThinkPHP3.1.2 使用cli命令行模式运行 标签(空格分隔): php 前言 thinkphp3.1.2 需要使用cli方法运行脚本 折腾了一天才搞定 3.1.2的版本真的很古老 解决 增加 ... 
- gcc/g++堆栈保护技术
			最近学习内存分布,通过gdb调试发现一些问题,栈空间变量地址应该是从高往低分布的,但是调试发现地址虽然是从高往低分布,但是变量地址的顺序是乱的,请教同事他说可能是gcc/g++默认启用了堆栈保护, ... 
- jvm入门及理解(三)——运行时数据区(程序计数器+本地方法栈)
			一.内存与线程 内存: 内存是非常重要的系统资源,是硬盘和cpu的中间仓库及桥梁,承载着操作系统和应用程序的实时运行.JVM内存布局规定了JAVA在运行过程中内存申请.分配.管理的策略,保证了JVM的 ... 
- 教你爬取腾讯课堂、网易云课堂、mooc等所有课程信息
			本文的所有代码都在GitHub上托管,想要代码的同学请点击这里 
- Pytest系列(20)- allure结合pytest,allure.step()、allure.attach的详细使用
			如果你还想从头学起Pytest,可以看看这个系列的文章哦! https://www.cnblogs.com/poloyy/category/1690628.html 前言 allure除了支持pyte ... 
- d3.js v4曲线图的拖拽功能实现Zoom
			zoom缩放案例 源码:https://github.com/HK-Kevin/d...:demo:https://hk-kevin.github.io/d3...: 原理:通过zoom事件来重新绘制 ... 
- 猜数字和飞机大战(Python零基础入门)
			前言 最近有很多零基础初学者问我,有没有适合零基础学习案例,毕竟零基础入门的知识点是非常的枯燥乏味的,如果没有实现效果展示出来,感觉学习起来特别的累,今天就给大家介绍两个零基础入门的基础案例:猜数字游 ... 
- win10下cuda安装以及利用anaconda安装pytorch-gpu过程
			安装环境:win10+2070super 1.Cuda的下载安装及配置 (1)测试本机独立显卡是否支持CUDA的安装,点击此处查询显卡是否在列表中. (2)查看自己是否能右键找到NVIDA控制面板,如 ... 
- linq  高集成化数据访问技术
			一: 新建名为linq的项目 创建 linq 1 在项目里添加文件夹 App_Code; 2 在文件夹(App_Code) 添加 名为db的 Linq To Sql 类 :一个Linq T ... 
