KafkaConsumer assign VS subscribe

Background

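In Kafka, consumers that share the same group.id normally never consume the same partition. At any moment, each partition is consumed by at most one of the consumers with that group.id.
This mechanism underpins two important Kafka properties:

1. Throughput can be raised by adding partitions and consumers (consumers beyond the number of partitions sit idle).
2. Within a group, each message is delivered to only one consumer, so the group as a whole does not consume the same message several times.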
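The one-owner-per-partition rule can be illustrated with a simplified, hypothetical model of range-style assignment. It is similar in spirit to Kafka's RangeAssignor, but it is not that class's actual code; `GroupAssignmentSketch` and `assign` are illustrative names. The model shows that every partition gets exactly one owner and that consumers beyond the partition count receive nothing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model (not Kafka's assignor code) of range-style assignment of one topic's
 * partitions within a single consumer group.
 */
public class GroupAssignmentSketch {

    // Split partitions 0..numPartitions-1 into contiguous ranges, one range per consumer
    public static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int n = consumers.size();
        if (n == 0) {
            return result; // no consumers, nothing to assign
        }
        int perConsumer = numPartitions / n; // base share for every consumer
        int extra = numPartitions % n;       // the first `extra` consumers get one more
        int next = 0;
        for (int i = 0; i < n; i++) {
            int count = perConsumer + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                owned.add(next++); // each partition index is handed out exactly once
            }
            result.put(consumers.get(i), owned);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(List.of("c1", "c2"), 3));             // {c1=[0, 1], c2=[2]}
        System.out.println(assign(List.of("c1", "c2", "c3", "c4"), 3)); // {c1=[0], c2=[1], c3=[2], c4=[]}
    }
}
```

Kafka's real assignors also handle multiple topics and, in the sticky variants, try to keep previous assignments stable. Treat the sketch as a model of the ownership rule only.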

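In the KafkaConsumer class (the official client API), a consumer can specify the topic-partitions it consumes in two ways: assign and subscribe. The relevant source code is quoted below.

The two methods seem to do the same thing, but they differ in subtle ways that can confuse newcomers. The differences are explained in detail below.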

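Comparison

KafkaConsumer.subscribe(): partitions are assigned to the consumer automatically. Through group management, the configured partition assignment strategy spreads the topic's partitions as evenly as possible across the consumers in the same group. The assignment is rebalanced when group membership or topic metadata changes.
KafkaConsumer.assign(): the consumer is given an explicit, manually chosen list of topic-partitions. The assignment is not coordinated through group.id, so other consumers in the same group can be assigned the same partitions ("this method does not use the consumer's group management"). group.id is still used when committing offsets, but it no longer decides which consumer reads which partition.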
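Because assign() bypasses group management, group.id no longer prevents overlap. It still names the group under which offsets are committed, and committed offsets are stored per group and topic-partition.

The hypothetical in-memory model below is not Kafka code; `CommittedOffsetModel` and its methods are illustrative names. It shows what happens when two consumers assign() the same partition under one group.id, as in the test code: their commits overwrite each other and the last one wins.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model (not Kafka code) of a committed-offset store keyed by
 * (group.id, topic, partition).
 */
public class CommittedOffsetModel {

    // Stand-in for the broker-side committed-offset storage
    private final Map<String, Long> committed = new HashMap<>();

    private static String key(String groupId, String topic, int partition) {
        return groupId + "|" + topic + "|" + partition;
    }

    // A commit simply overwrites whatever was stored for the same group and partition
    public void commit(String groupId, String topic, int partition, long nextOffset) {
        committed.put(key(groupId, topic, partition), nextOffset);
    }

    // Returns the committed offset, or -1 if the group never committed for this partition
    public long committedOffset(String groupId, String topic, int partition) {
        return committed.getOrDefault(key(groupId, topic, partition), -1L);
    }

    public static void main(String[] args) {
        CommittedOffsetModel store = new CommittedOffsetModel();
        // c1 and c2 both assign() topic-0 under group "assignTest" and progress at different speeds
        store.commit("assignTest", "topic", 0, 100L); // c1 has processed offsets 0..99
        store.commit("assignTest", "topic", 0, 40L);  // c2 has processed offsets 0..39
        // Only the last commit survives
        System.out.println(store.committedOffset("assignTest", "topic", 0)); // prints 40
    }
}
```

In this model, restarting either consumer resumes from whichever position was committed last, which is not necessarily its own.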

Test code

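import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class KafkaManualAssignTest {

    private static final Logger logger = LoggerFactory.getLogger(KafkaManualAssignTest.class);

    private static final Properties props = new Properties();

    private static final KafkaConsumer<String, String> c1, c2;

    private static final String brokerList = "localhost:9092";

    static {
        props.put("bootstrap.servers", brokerList);
        // Both consumers share the same group.id
        props.put("group.id", "assignTest");
        props.put("auto.offset.reset", "earliest");
        props.put("enable.auto.commit", "true");
        props.put("session.timeout.ms", "30000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c1 = new KafkaConsumer<>(props);
        c2 = new KafkaConsumer<>(props);
    }

    public static void main(String[] args) {
        TopicPartition tp = new TopicPartition("topic", 0);

        // assign(): each consumer is explicitly given partition 0 of "topic". Although both
        // consumers have the same group.id, each reads its own full copy of the data,
        // so the messages are consumed twice.
        c1.assign(Arrays.asList(tp));
        c2.assign(Arrays.asList(tp));

        // subscribe(): group management assigns the topic-partitions automatically.
        // Assuming "topic" has two partitions, each consumer gets one of them; the two
        // consumers see complementary data with no overlap.
        // c1.subscribe(Arrays.asList("topic"));
        // c2.subscribe(Arrays.asList("topic"));

        try {
            // Poll a few rounds; with subscribe(), the first polls may return nothing
            // while the group is still joining and rebalancing.
            for (int round = 0; round < 5; round++) {
                // poll(Duration) requires kafka-clients 2.0+; it never returns null,
                // so an empty batch simply produces no loop iterations.
                ConsumerRecords<String, String> msg1 = c1.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> m1 : msg1) {
                    logger.info("m1 offset : {} , value : {}", m1.offset(), m1.value());
                }

                logger.info("=====================");

                ConsumerRecords<String, String> msg2 = c2.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> m2 : msg2) {
                    logger.info("m2 offset : {} , value : {}", m2.offset(), m2.value());
                }
            }
        } finally {
            // close() releases network resources and, with auto-commit enabled,
            // commits the offsets consumed so far
            c1.close();
            c2.close();
        }
    }
}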


Official API

The official Javadoc for subscribe:

/**
 * Subscribe to the given list of topics to get dynamically assigned partitions.
 * <b>Topic subscriptions are not incremental. This list will replace the current
 * assignment (if there is one).</b> It is not possible to combine topic subscription with group management
 * with manual partition assignment through {@link #assign(Collection)}.
 *
 * If the given list of topics is empty, it is treated the same as {@link #unsubscribe()}.
 *
 * <p>
 * This is a short-hand for {@link #subscribe(Collection, ConsumerRebalanceListener)}, which
 * uses a no-op listener. If you need the ability to seek to particular offsets, you should prefer
 * {@link #subscribe(Collection, ConsumerRebalanceListener)}, since group rebalances will cause partition offsets
 * to be reset. You should also provide your own listener if you are doing your own offset
 * management since the listener gives you an opportunity to commit offsets before a rebalance finishes.
 *
 * @param topics The list of topics to subscribe to
 * @throws IllegalArgumentException If topics is null or contains null or empty elements
 * @throws IllegalStateException If {@code subscribe()} is called previously with pattern, or assign is called
 *                               previously (without a subsequent call to {@link #unsubscribe()}), or if not
 *                               configured at-least one partition assignment strategy
 */
@Override
public void subscribe(Collection<String> topics) {
    subscribe(topics, new NoOpConsumerRebalanceListener());
}


The official Javadoc for assign:

/**
 * Manually assign a list of partitions to this consumer. This interface does not allow for incremental assignment
 * and will replace the previous assignment (if there is one).
 * <p>
 * If the given list of topic partitions is empty, it is treated the same as {@link #unsubscribe()}.
 * <p>
 * Manual topic assignment through this method does not use the consumer's group management
 * functionality. As such, there will be no rebalance operation triggered when group membership or cluster and topic
 * metadata change. Note that it is not possible to use both manual partition assignment with {@link #assign(Collection)}
 * and group assignment with {@link #subscribe(Collection, ConsumerRebalanceListener)}.
 * <p>
 * If auto-commit is enabled, an async commit (based on the old assignment) will be triggered before the new
 * assignment replaces the old one.
 *
 * @param partitions The list of partitions to assign this consumer
 * @throws IllegalArgumentException If partitions is null or contains null or empty topics
 * @throws IllegalStateException If {@code subscribe()} is called previously with topics or pattern
 *                               (without a subsequent call to {@link #unsubscribe()})
 */
@Override
public void assign(Collection<TopicPartition> partitions) {
    acquireAndEnsureOpen();
    try {
        if (partitions == null) {
            throw new IllegalArgumentException("Topic partition collection to assign to cannot be null");
        } else if (partitions.isEmpty()) {
            this.unsubscribe();
        } else {
            Set<String> topics = new HashSet<>();
            for (TopicPartition tp : partitions) {
                String topic = (tp != null) ? tp.topic() : null;
                if (topic == null || topic.trim().isEmpty())
                    throw new IllegalArgumentException("Topic partitions to assign to cannot have null or empty topic");
                topics.add(topic);
            }

            // make sure the offsets of topic partitions the consumer is unsubscribing from
            // are committed since there will be no following rebalance
            this.coordinator.maybeAutoCommitOffsetsAsync(time.milliseconds());

            log.debug("Subscribed to partition(s): {}", Utils.join(partitions, ", "));
            this.subscriptions.assignFromUser(new HashSet<>(partitions));
            metadata.setTopics(topics);
        }
    } finally {
        release();
    }
}
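Both Javadoc blocks describe the same contract:

- Neither call is incremental.
- An empty list behaves like unsubscribe().
- Mixing subscribe() with assign() raises IllegalStateException until unsubscribe() is called.

Below is a hypothetical model of that contract. It is not Kafka's internal SubscriptionState, and pattern subscriptions are left out. For brevity, partitions are represented as plain "topic-partition" strings; `SubscriptionModeModel` and its members are illustrative names.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model (not Kafka code) of the subscribe()/assign() mutual-exclusion rules. */
public class SubscriptionModeModel {

    // How the consumer currently obtains its partitions
    public enum Mode { NONE, SUBSCRIBED, ASSIGNED }

    private Mode mode = Mode.NONE;
    // Subscribed topic names, or manually assigned "topic-partition" strings
    private final Set<String> targets = new HashSet<>();

    public void subscribe(Collection<String> topics) {
        if (topics == null) {
            throw new IllegalArgumentException("Topic collection to subscribe to cannot be null");
        }
        if (topics.isEmpty()) { // an empty list is treated like unsubscribe()
            unsubscribe();
            return;
        }
        if (mode == Mode.ASSIGNED) {
            throw new IllegalStateException("assign() was called previously; call unsubscribe() first");
        }
        replaceWith(Mode.SUBSCRIBED, topics);
    }

    public void assign(Collection<String> partitions) {
        if (partitions == null) {
            throw new IllegalArgumentException("Topic partition collection to assign to cannot be null");
        }
        if (partitions.isEmpty()) { // an empty list is treated like unsubscribe()
            unsubscribe();
            return;
        }
        if (mode == Mode.SUBSCRIBED) {
            throw new IllegalStateException("subscribe() was called previously; call unsubscribe() first");
        }
        replaceWith(Mode.ASSIGNED, partitions);
    }

    public void unsubscribe() {
        mode = Mode.NONE;
        targets.clear();
    }

    public Mode mode() {
        return mode;
    }

    public Set<String> targets() {
        return Collections.unmodifiableSet(targets);
    }

    // Neither call is incremental: the new list replaces the old one
    private void replaceWith(Mode newMode, Collection<String> newTargets) {
        mode = newMode;
        targets.clear();
        targets.addAll(newTargets);
    }
}
```

For example, calling subscribe(["a"]) and then subscribe(["b"]) leaves only "b" in the model. Calling assign(["a-0"]) after either one throws until unsubscribe() is called.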


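Recommendations

Prefer subscribe() for partition assignment.

Use assign only if you know exactly which topic-partitions (not merely which topics) you need to consume and are certain that all of your messages are in those partitions.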
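Knowing which partitions hold your messages is easiest when records are keyed. The sketch below is modeled on the keyed path of Kafka's default partitioner: a murmur2 hash of the serialized key, made non-negative, modulo the partition count.

It rests on several assumptions:

- String keys are serialized as UTF-8, as StringSerializer does.
- No custom partitioner is configured.
- The partition count is fixed.

`KeyPartitionSketch` and `partitionFor` are illustrative names, and the result should be checked against your client version before you rely on it.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of keyed partition selection, modeled on Kafka's default partitioner
 * (murmur2 of the key bytes, masked to non-negative, modulo the partition count).
 */
public class KeyPartitionSketch {

    // 32-bit murmur2 with the seed and mixing constants used by Kafka's Utils.murmur2
    static int murmur2(byte[] data) {
        final int length = data.length;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = 0x9747b28c ^ length;
        final int length4 = length / 4;
        for (int i = 0; i < length4; i++) {
            final int i4 = i * 4;
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // Mix in the 1-3 trailing bytes (the fall-through is intentional)
        switch (length % 4) {
            case 3:
                h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2:
                h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1:
                h ^= data[length & ~3] & 0xff;
                h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Partition index for a String key under the assumed default keyed-partitioning rule
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return (murmur2(keyBytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // While "topic" keeps 6 partitions, records keyed "order-1001" always land in this partition,
        // so a consumer interested only in that key could assign() just that partition.
        System.out.println("order-1001 -> partition " + partitionFor("order-1001", 6));
    }
}
```

If the topic's partition count changes, keys map to different partitions, and a hard-coded assign() list silently goes stale. That is one more reason subscribe() is the safer default.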
