kafka主题offset各种需求修改方法
简要:开发中,常常因为需要我们要认为修改消费者实例对kafka某个主题消费的偏移量。具体如何修改?为什么可行?其实很容易,有时候只要我们换一种方式思考,如果我自己实现kafka消费者,我该如何让我们的消费者代码如何控制对某一个主题消费,以及我们该如何实现不同消费者组可以消费同一个主题的同一条消息,一个消费组下不同消费者消费同一个主题的不同消息。如果让你实现该框架该如何实现?
原文和作者一起讨论:http://www.cnblogs.com/intsmaze/p/6212913.html
可接网站开发,java开发。
新浪微博:intsmaze刘洋洋哥
微信:intsmaze

创建一个kafka主题名为intsmazX,指定分区数为3.
使用kafkaspout创建该主题的消费者实例(指定元数据存放zookeeper中的路径为/kafka-offset,指定实例id为onetest),启动storm可以观察到如下信息:
INFO storm.kafka.ZkCoordinator - Task [1/1] Refreshing partition manager connections
INFO storm.kafka.DynamicBrokersReader - Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=hadoop002.icccuat.com:6667, 1=hadoop003.icccuat.com:6667, 2=hadoop001.icccuat.com:6667}}
INFO storm.kafka.KafkaUtils - Task [1/1] assigned [Partition{host=hadoop002.icccuat.com:6667, partition=0}, Partition{host=hadoop003.icccuat.com:6667, partition=1}, Partition{host=hadoop001.icccuat.com:6667, partition=2}]
INFO storm.kafka.ZkCoordinator - Task [1/1] Deleted partition managers: []
INFO storm.kafka.ZkCoordinator - Task [1/1] New partition managers: [Partition{host=hadoop002.icccuat.com:6667, partition=0}, Partition{host=hadoop003.icccuat.com:6667, partition=1}, Partition{host=hadoop001.icccuat.com:6667, partition=2}]
INFO storm.kafka.PartitionManager - Read partition information from: /kafka-offset/onetest/partition_0 --> null //这个地方会到zookeeper中该目录下读取,看是否存储有对该分区的消费信息
INFO storm.kafka.PartitionManager - No partition information found, using configuration to determine offset//没有分区信息,这个时候就会直接到kafka的broker中得到该分区的最大偏移量
INFO storm.kafka.PartitionManager - Last commit offset from zookeeper: 0
INFO storm.kafka.PartitionManager - Commit offset 0 is more than 9223372036854775807 behind, resetting to startOffsetTime=-2
INFO storm.kafka.PartitionManager - Starting Kafka hadoop002.icccuat.com:0 from offset 0
INFO storm.kafka.PartitionManager - Read partition information from: /kafka-offset/onetest/partition_1 --> null
INFO storm.kafka.PartitionManager - No partition information found, using configuration to determine offset
INFO storm.kafka.PartitionManager - Last commit offset from zookeeper: 0
INFO storm.kafka.PartitionManager - Commit offset 0 is more than 9223372036854775807 behind, resetting to startOffsetTime=-2
INFO storm.kafka.PartitionManager - Starting Kafka hadoop003.icccuat.com:1 from offset 0
INFO storm.kafka.PartitionManager - Read partition information from: /kafka-offset/onetest/partition_2 --> null
INFO storm.kafka.PartitionManager - No partition information found, using configuration to determine offset
INFO storm.kafka.PartitionManager - Last commit offset from zookeeper: 0
INFO storm.kafka.PartitionManager - Commit offset 0 is more than 9223372036854775807 behind, resetting to startOffsetTime=-2
INFO storm.kafka.PartitionManager - Starting Kafka hadoop001.icccuat.com:2 from offset 0
这个时候在zookeeper的/kafka-offset下没有生成名为onetest的目录,这是因为对应的intsmazeX还没有数据产生。
我们使用kafka消费者生产3条数据,然后去查看zookeeper中对应目录下的信息:

{"topology":{"id":"34e94ae4-a0a0-41e9-a360-d0ab648fe196","name":"intsmaze-20161222-143121"},"offset":1,"partition":1,"broker":{"host":"hadoop003.icccuat.com","port":6667},"topic":"intsmazeX"}
{"topology":{"id":"34e94ae4-a0a0-41e9-a360-d0ab648fe196","name":"intsmaze-20161222-143121"},"offset":1,"partition":2,"broker":{"host":"hadoop001.icccuat.com","port":6667},"topic":"intsmazeX"}
{"topology":{"id":"34e94ae4-a0a0-41e9-a360-d0ab648fe196","name":"intsmaze-20161222-143121"},"offset":1,"partition":0,"broker":{"host":"hadoop002.icccuat.com","port":6667},"topic":"intsmazeX"}
30秒(kafkaspout中设置提交zookeeper消费偏移量时间为30秒)之后,可以看到,会记录该实例对每一个分区消费的偏移量为1.
杀掉该拓扑,这个时候我们再向intsmazeX主题生产6条数据,这个时候,broker中该主题每个分区的最大偏移量为3了。
然后我们修改/kafka-offset/onttest/下每一个分区的offset为3.
这个时候,我们再次部署该拓扑,可以发现拓扑没有消费刚刚产生的6条消息。再发送3条消息,拓扑就会立马消费这三条消息。
杀掉该拓扑,这个时候该拓扑消费者实例对每个分区的消费偏移量就是4了,然后我们把offset修改为6,然后启动拓扑,这个时候broker中该主题每个分区的最大偏移量为4并不是6,让我们看看,消费分区的偏移量大于主题分区当前偏移量会有什么样的情况出现。
WARN storm.kafka.KafkaUtils - Got fetch request with offset out of range: [6]; retrying with default start offset time from configuration. configured start offset time: [-2]
WARN storm.kafka.PartitionManager - Using new offset: 4
WARN storm.kafka.KafkaUtils - Got fetch request with offset out of range: [6]; retrying with default start offset time from configuration. configured start offset time: [-2]
WARN storm.kafka.PartitionManager - Using new offset: 4
WARN storm.kafka.KafkaUtils - Got fetch request with offset out of range: [6]; retrying with default start offset time from configuration. configured start offset time: [-2]
WARN storm.kafka.PartitionManager - Using new offset: 4
这个时候我们看到,消费者的分区偏移量的记录将会自动同步为每一个分区当前最大的偏移量了,kafkaspout会先用偏移量6去拉去,发现拉去不到,就到broker中获取该主题对应分区的最大偏移量。。
{"topology":{"id":"818ab9cc-d56f-454f-88b2-06dd830d54c1","name":"intsmaze-20161222-150006"},"offset":4,"partition":0,"broker":{"host":"hadoop002.icccuat.com","port":6667},"topic":"intsmazeX"}
....
把offset的偏移量设置为7000,一样在拓扑启动后,会更新为每个分区的最大偏移量。
重新部署一个拓扑消费该主题,设置该拓扑的id为twotest,这个时候启动拓扑,我们发现,并没有启动拓扑前的消息数据,这是因为,拓扑启动后,要获得偏移量,而这个偏移量只能是当前主题每个分区的最大偏移量(因为分区的偏移量是递增,且
分区的数据会定时删除的,所以无法知道当前分区当前最开始的偏移量。)
Refreshing partition manager connections
Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=hadoop002.icccuat.com:6667, 1=hadoop003.icccuat.com:6667, 2=hadoop001.icccuat.com:6667}}
assigned [Partition{host=hadoop002.icccuat.com:6667, partition=0}, Partition{host=hadoop003.icccuat.com:6667, partition=1}, Partition{host=hadoop001.icccuat.com:6667, partition=2}]
Deleted partition managers: []
New partition managers: [Partition{host=hadoop002.icccuat.com:6667, partition=0}, Partition{host=hadoop003.icccuat.com:6667, partition=1}, Partition{host=hadoop001.icccuat.com:6667, partition=2}]
Read partition information from: /kafka-offset/twotest/partition_0 --> null
No partition information found, using configuration to determine offset
Starting Kafka hadoop002.icccuat.com:0 from offset 7
Read partition information from: /kafka-offset/twotest/partition_1 --> null
No partition information found, using configuration to determine offset
Starting Kafka hadoop003.icccuat.com:1 from offset 7
Read partition information from: /kafka-offset/twotest/partition_2 --> null
No partition information found, using configuration to determine offset
Starting Kafka hadoop001.icccuat.com:2 from offset 7
Finished refreshing
Refreshing partition manager connections
Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=hadoop002.icccuat.com:6667, 1=hadoop003.icccuat.com:6667, 2=hadoop001.icccuat.com:6667}}
assigned [Partition{host=hadoop002.icccuat.com:6667, partition=0}, Partition{host=hadoop003.icccuat.com:6667, partition=1}, Partition{host=hadoop001.icccuat.com:6667, partition=2}]
Deleted partition managers: []
New partition managers: []
Finished refreshing
Refreshing partition manager connections
Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=hadoop002.icccuat.com:6667, 1=hadoop003.icccuat.com:6667, 2=hadoop001.icccuat.com:6667}}
assigned [Partition{host=hadoop002.icccuat.com:6667, partition=0}, Partition{host=hadoop003.icccuat.com:6667, partition=1}, Partition{host=hadoop001.icccuat.com:6667, partition=2}]
Deleted partition managers: []
New partition managers: []
Finished refreshing
Refreshing partition manager connections
Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=hadoop002.icccuat.com:6667, 1=hadoop003.icccuat.com:6667, 2=hadoop001.icccuat.com:6667}}
assigned [Partition{host=hadoop002.icccuat.com:6667, partition=0}, Partition{host=hadoop003.icccuat.com:6667, partition=1}, Partition{host=hadoop001.icccuat.com:6667, partition=2}]
Deleted partition managers: []
New partition managers: []
Finished refreshing
Refreshing partition manager connections
Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=hadoop002.icccuat.com:6667, 1=hadoop003.icccuat.com:6667, 2=hadoop001.icccuat.com:6667}}
assigned [Partition{host=hadoop002.icccuat.com:6667, partition=0}, Partition{host=hadoop003.icccuat.com:6667, partition=1}, Partition{host=hadoop001.icccuat.com:6667, partition=2}]
Deleted partition managers: []
New partition managers: []
发送三条信息,查看该实例目录如下。
{"topology":{"id":"3d6a5f80-357f-4591-8e5c-b3d4d2403dfe","name":"demo-20161222-152236"},"offset":8,"partition":0,"broker":{"host":"hadoop002.icccuat.com","port":6667},"topic":"intsmazeX"}
再启动一个拓扑,实例为twotest不变:
[INFO] Task [1/2] Refreshing partition manager connections
[INFO] Task [2/2] Refreshing partition manager connections
[INFO] Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=hadoop002.icccuat.com:6667, 1=hadoop003.icccuat.com:6667, 2=hadoop001.icccuat.com:6667}}
[INFO] Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=hadoop002.icccuat.com:6667, 1=hadoop003.icccuat.com:6667, 2=hadoop001.icccuat.com:6667}}
[INFO] Task [1/2] assigned [Partition{host=hadoop002.icccuat.com:6667, partition=0}, Partition{host=hadoop001.icccuat.com:6667, partition=2}]
[INFO] Task [2/2] assigned [Partition{host=hadoop003.icccuat.com:6667, partition=1}]
[INFO] Task [1/2] Deleted partition managers: []
[INFO] Task [2/2] Deleted partition managers: []
[INFO] Task [1/2] New partition managers: [Partition{host=hadoop002.icccuat.com:6667, partition=0}, Partition{host=hadoop001.icccuat.com:6667, partition=2}]
[INFO] Task [2/2] New partition managers: [Partition{host=hadoop003.icccuat.com:6667, partition=1}]
[INFO] Read partition information from: /kafka-offset/twotest/partition_0 --> {"topic":"intsmazeX","partition":0,"topology":{"id":"3d6a5f80-357f-4591-8e5c-b3d4d2403dfe","name":"demo-20161222-152236"},"broker":{"port":6667,"host":"hadoop002.icccuat.com"},"offset":8}
[INFO] Read partition information from: /kafka-offset/twotest/partition_1 --> {"topic":"intsmazeX","partition":1,"topology":{"id":"3d6a5f80-357f-4591-8e5c-b3d4d2403dfe","name":"demo-20161222-152236"},"broker":{"port":6667,"host":"hadoop003.icccuat.com"},"offset":8}
[INFO] Read last commit offset from zookeeper: 8; old topology_id: 3d6a5f80-357f-4591-8e5c-b3d4d2403dfe - new topology_id: 348af8da-994a-4cdb-a629-e4bf107348af
[INFO] Read last commit offset from zookeeper: 8; old topology_id: 3d6a5f80-357f-4591-8e5c-b3d4d2403dfe - new topology_id: 348af8da-994a-4cdb-a629-e4bf107348af
[INFO] Starting Kafka hadoop002.icccuat.com:0 from offset 8
[INFO] Starting Kafka hadoop003.icccuat.com:1 from offset 8
[INFO] Task [2/2] Finished refreshing
[INFO] Read partition information from: /kafka-offset/twotest/partition_2 --> {"topic":"intsmazeX","partition":2,"topology":{"id":"3d6a5f80-357f-4591-8e5c-b3d4d2403dfe","name":"demo-20161222-152236"},"broker":{"port":6667,"host":"hadoop001.icccuat.com"},"offset":8}
[INFO] Read last commit offset from zookeeper: 8; old topology_id: 3d6a5f80-357f-4591-8e5c-b3d4d2403dfe - new topology_id: 348af8da-994a-4cdb-a629-e4bf107348af
[INFO] Starting Kafka hadoop001.icccuat.com:2 from offset 8
[INFO] Task [1/2] Finished refreshing
[INFO] Task [2/2] Refreshing partition manager connections
[INFO] Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=hadoop002.icccuat.com:6667, 1=hadoop003.icccuat.com:6667, 2=hadoop001.icccuat.com:6667}}
[INFO] Task [2/2] assigned [Partition{host=hadoop003.icccuat.com:6667, partition=1}]
[INFO] Task [1/2] Refreshing partition manager connections
[INFO] Task [2/2] Deleted partition managers: []
[INFO] Task [2/2] New partition managers: []
{"topology":{"id":"3d6a5f80-357f-4591-8e5c-b3d4d2403dfe","name":"demo-20161222-152236"},"offset":8,"partition":1,"broker":{"host":"hadoop003.icccuat.com","port":6667},"topic":"intsmazeX"}
然后发送消息,我们可以看到两个拓扑都会运行的,因为两个拓扑共用一个元数据信息。
这个过程有些坑要注意:
1:在使用kafka-spout的时候,我们要指定该kafka消费者在zookeeper中存储偏移量的地址,这里是/kafka-offset。同时指定该kafka对应的实例id这里是onetest.kafkapout和kafka客户端代码不一样,它没有消费组的概念,也不能这样说吧,只能说数据的存放不一样,不同的实例代表
不同的消费组。
2:修改某一个kafkaspout实例的时候,我们一定要把该id的拓扑关闭掉,我们在项目中遇到一个大坑,就是不熟一样的kafkaspout它的id是相同的,也就是共用同一个目录,那么如果我们没有下线这些拓扑任务,而只是把
这些拓扑任务设置为不活跃状态,那么我们修改zookeeper中偏移量后,再把拓扑设置为活跃状态后,会发现修改无效,offset还是变为以前的offset了,这是因为拓扑没有杀掉,它的运行程序中也会保存当前消费的偏移量,会定时更新的。
3:我们在杀拓扑时,要设置时间,因为拓扑默认30秒向zookeeper提交一下偏移量信息。
修改偏移量有两种,一种就是在部署拓扑前,先修改zookeeper中的偏移量,或者直接删除zookeeper中的对应实例的目录。这样从新部署都会从最新的偏移量开始运行。
下面的是我当初自己学习kafka时,思考自己写kafka时,该如何解决kafka的消费者和消费组之间对数据消费时的判断。虽然框架极大简化了我们的生产力,但是作为一个有
思想的程序员,我们应该换一个角度去思考一个框架,而不应该再是这个框架有什么功能,我们用这个框架的这个功能,这样下去,我们就会一直认为这个框架好厉害,却不明白其内部实现方式。 如果自己要实现kafka功能:
kafka主题offset各种需求修改方法的更多相关文章
- 使用Java API创建(create),查看(describe),列举(list),删除(delete)Kafka主题(Topic)
使用Kafka的同学都知道,我们每次创建Kafka主题(Topic)的时候可以指定分区数和副本数等信息,如果将这些属性配置到server.properties文件中,以后调用Java API生成的主题 ...
- IDE-IntelliJ IDEA 主题、字体、编辑区主题、文件编码修改、乱码问题
主题修改 上图标注 1 所示为 IntelliJ IDEA 修改主题的地方,可以通过打开左上角的File -> Setting.在 Windows 系统上 IntelliJ IDEA 默认提供的 ...
- Kafka提交offset机制
在kafka的消费者中,有一个非常关键的机制,那就是offset机制.它使得Kafka在消费的过程中即使挂了或者引发再均衡问题重新分配Partation,当下次重新恢复消费时仍然可以知道从哪里开始消费 ...
- IntelliJ IDEA 主题、字体、编辑区主题、文件编码修改
主题修改 上图标注 1 所示为 IntelliJ IDEA 修改主题的地方,在 Windows 系统上 IntelliJ IDEA 默认提供的主题有四套:Darcula.IntelliJ.Window ...
- kafka 主题管理
对于 kafka 主题(topic)的管理(增删改查),使用最多的便是kafka自带的脚本. 创建主题 kafka提供了自带的 kafka-topics 脚本,用来帮助用户创建主题(topic). b ...
- BluetoothChat用于蓝牙串口通信的修改方法
本人最近在研究嵌入式的串口通信,任务是要写一个手机端的遥控器用来遥控双轮平衡小车.界面只用了一个小时就写好了,重要的问题是如何与板子所带的SPP-CA蓝牙模块进行通信. SPP-CA模块自带代码,在这 ...
- linux 的终端字体色和背景色的修改方法(一)
更改Linux系统终端的颜色主题 随着Linux系统在服务器端的崛起,Linux也在慢慢进军个人桌面系统领域.如果在使用Linux系统的终端时,对其颜色主题不是很满意,该怎么修改颜色的主题呢?今天笔者 ...
- 迅为4412开发板QtE系统源码-屏幕横竖屏切换修改方法
迅为4412开发板QtE系统源码-屏幕横竖屏切换修改方法 详情了解:http://topeetboard.com 更多了解:https://arm-board.taobao.com/ 用户在开发板上运 ...
- kafka主题管理
若代理设置了 auto.create.topics.enable=true,这样还未创建topic就往kafka发送消息时, 会自动创建一个 ${num.partitions}个分区和{default ...
随机推荐
- Java虚拟机 JVM
finalize();(不建议使用,代价高,不确定性大) 如果你在一个类中覆写了finalize()方法, 那么你可以在第一次被GC的时候,挽救一个你想挽救的对象,让其不被回收,但只能挽救一次. GC ...
- HA 高可用软件系统保养指南
又过了一年 618,六月是公司一年一度的大促月,一般提前一个月各系统就会减少需求和功能的开发,转而更多去关注系统可用性.稳定性和管控性等方面的非功能需求.大促前的准备工作一般叫作「备战」,可以把线上运 ...
- MySQL 优化之 ICP (index condition pushdown:索引条件下推)
ICP技术是在MySQL5.6中引入的一种索引优化技术.它能减少在使用 二级索引 过滤where条件时的回表次数 和 减少MySQL server层和引擎层的交互次数.在索引组织表中,使用二级索引进行 ...
- 您真的理解了SQLSERVER的日志链了吗?
您真的理解了SQLSERVER的日志链了吗? 先感谢宋沄剑给本人指点迷津,还有郭忠辉童鞋今天在QQ群里抛出的问题 这个问题跟宋沄剑讨论了三天,再次感谢宋沄剑 一直以来,SQLSERVER提供了一个非常 ...
- Ubuntu下利用Mono,Jexus搭建Asp.Net(MVC) Web服务器
最近在Ubuntu上搭建了Asp.Net的Web服务器,其中遇到很多问题,整理一下思路,以备后用. 搭建环境以及配套软件 Ubuntu: 11.10 Mono:3.0.6 下载地址(http://do ...
- [DeviceOne开发]-土地销售项目源码分享
一.简介 这个是一个真实项目开源,虽然不是很花哨,但是中规中矩,小细节处理的也很好,非常值得参考和借鉴.里面的数据都缓存到本地,可以离线运行,但是调整一下代码,马上就可以和服务端完全对接.后续会有详细 ...
- H5嵌入原生开发小结----兼容安卓与ios的填坑之路
一开始听说开发H5,以为就是做适配现代浏览器的移动网页,心想不用管IE了,欧也.到今天,发现当初too young too simple,兼容IE和兼容安卓与IOS,后者让你更抓狂.接下来数一下踩过的 ...
- SQL Server 索引和表体系结构(非聚集索引)
非聚集索引 概述 对于非聚集索引,涉及的信息要比聚集索引更多一些,由于整个篇幅比较大涉及接下来的要写的“包含列的索引”,“索引碎片”等一些知识点,可能要结合起来阅读理解起来要更容易一些.非聚集索引和聚 ...
- Java——搭建自己的RESTful API服务器(SpringBoot、Groovy)
这又是一篇JavaWeb相关的博客,内容涉及: SpringBoot:微框架,提供快速构建服务的功能 SpringMVC:Struts的替代者 MyBatis:数据库操作库 Groovy:能与Java ...
- ★Kali信息收集★8.Nmap :端口扫描
★Kali信息收集~ 0.Httrack 网站复制机 http://www.cnblogs.com/dunitian/p/5061954.html ★Kali信息收集~ 1.Google Hackin ...