Flink生产数据到Kafka频繁出现事务失效导致任务重启
在生产中需要将一些数据发到kafka,而且需要做到EXACTLY_ONCE,kafka使用的版本为1.1.0,flink的版本为1.8.0,但是会很经常因为提交事务引起错误,甚至导致任务重启
kafka producer的配置如下
def getKafkaProducer(kafkaAddr: String, targetTopicName: String, kafkaProducersPoolSize: Int): FlinkKafkaProducer[String] = {
val properties = new Properties()
properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaAddr)
properties.setProperty(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, 6000 * 6 + "")
// 设置了retries参数,可以在Kafka的Partition发生leader切换时,Flink不重启,而是做5次尝试:
properties.setProperty(ProducerConfig.RETRIES_CONFIG, "5")
properties.setProperty(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, String.valueOf(1048576 * 5))
val serial = new KeyedSerializationSchemaWrapper(new SimpleStringSchema())
//val producer = new FlinkKafkaProducer011[String](targetTopicName, serial, properties, Optional.of(new KafkaProducerPartitioner[String]()), Semantic.EXACTLY_ONCE, kafkaProducersPoolSize)
val producer = new FlinkKafkaProducer[String](targetTopicName, serial, properties, Optional.of(new KafkaProducerPartitioner[String]()), FlinkKafkaProducer.Semantic.EXACTLY_ONCE, kafkaProducersPoolSize)
producer.setWriteTimestampToKafka(true)
producer
}
Flink env如下
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.enableCheckpointing(60 * 1000 * 1, CheckpointingMode.EXACTLY_ONCE)
val config = env.getCheckpointConfig
//RETAIN_ON_CANCELLATION在job canceled的时候会保留externalized checkpoint state
config.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)
//用于指定checkpoint coordinator上一个checkpoint完成之后最小等多久可以出发另一个checkpoint,当指定这个参数时,maxConcurrentCheckpoints的值为1
config.setMinPauseBetweenCheckpoints(3000)
//用于指定运行中的checkpoint最多可以有多少个,如果有设置了minPauseBetweenCheckpoints,则maxConcurrentCheckpoints这个参数就不起作用了(大于1的值不起作用)
config.setMaxConcurrentCheckpoints(1)
//指定checkpoint执行的超时时间(单位milliseconds),超时没完成就会被abort掉
config.setCheckpointTimeout(30000)
//用于指定在checkpoint发生异常的时候,是否应该fail该task,默认为true,如果设置为false,则task会拒绝checkpoint然后继续运行
//https://issues.apache.org/jira/browse/FLINK-11662
config.setFailOnCheckpointingErrors(false)
然后经常会出现事务失效的问题,报错有很多种,大概为以下
java.lang.RuntimeException: Error while confirming checkpoint
at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1218)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flink.util.FlinkRuntimeException: Committing one of transactions failed, logging first encountered failure
at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.notifyCheckpointComplete(TwoPhaseCommitSinkFunction.java:296)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyCheckpointComplete(AbstractUdfStreamOperator.java:130)
at org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:684)
at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1213)
... 5 more
Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.
org.apache.flink.streaming.connectors.kafka.FlinkKafkaException: Failed to send data to Kafka: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.checkErroneous(FlinkKafkaProducer.java:1002)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.invoke(FlinkKafkaProducer.java:619)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.invoke(FlinkKafkaProducer.java:97)
at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.invoke(TwoPhaseCommitSinkFunction.java:228)
at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.
org.apache.kafka.common.KafkaException: Cannot perform send because at least one previous transactional or idempotent request has failed with errors.
at org.apache.kafka.clients.producer.internals.TransactionManager.failIfNotReadyForSend(TransactionManager.java:278)
at org.apache.kafka.clients.producer.internals.TransactionManager.maybeAddPartitionToTransaction(TransactionManager.java:263)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:804)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:760)
at org.apache.flink.streaming.connectors.kafka.internal.FlinkKafkaInternalProducer.send(FlinkKafkaInternalProducer.java:105)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.invoke(FlinkKafkaProducer.java:650)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.invoke(FlinkKafkaProducer.java:97)
at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.invoke(TwoPhaseCommitSinkFunction.java:228)
at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.
Checkpoint failed: Failed to send data to Kafka: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker. Checkpoint failed: Could not complete snapshot 11 for operator Sink: data_Sink (2/2).
这些错误基本涉及到两阶段提交、事务、checkpoint。
查看kafka documentation和研究ProducerConfig这个类后发现 kafka producer 在使用EXACTLY_ONCE的时候需要增加一些配置
the transaction timeout must be larger than the checkpoint interval, but smaller than the broker transaction.max.timeout.ms.
在getKafkaProducer增加以下配置后,出现原来的错误减少
//checkpoint 间隔时间<TRANSACTION_TIMEOUT_CONFIG<kafka transaction.max.timeout.ms (默认900秒)
properties.setProperty(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, 1000 * 60 * 3 + "")
properties.setProperty(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1")
properties.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true")
问题得到缓解。
参考:
https://www.cnblogs.com/wangzhuxing/p/10111831.html
http://www.heartthinkdo.com/?p=2040
http://romanmarkunas.com/web/blog/kafka-transactions-in-practice-1-producer/
Flink生产数据到Kafka频繁出现事务失效导致任务重启的更多相关文章
- kafka没配置好,导致服务器重启之后,topic丢失,topic里面的消息也丢失
转,原文:https://blog.csdn.net/zfszhangyuan/article/details/53389916 ----------------------------------- ...
- Kafka科普系列 | Kafka中的事务是什么样子的?
事务,对于大家来说可能并不陌生,比如数据库事务.分布式事务,那么Kafka中的事务是什么样子的呢? 在说Kafka的事务之前,先要说一下Kafka中幂等的实现.幂等和事务是Kafka 0.11.0.0 ...
- flink⼿手动维护kafka偏移量量
flink对接kafka,官方模式方式是自动维护偏移量 但并没有考虑到flink消费kafka过程中,如果出现进程中断后的事情! 如果此时,进程中段: 1:数据可能丢失 从获取了了数据,但是在执⾏行行 ...
- java面试记录二:spring加载流程、springmvc请求流程、spring事务失效、synchronized和volatile、JMM和JVM模型、二分查找的实现、垃圾收集器、控制台顺序打印ABC的三种线程实现
注:部分答案引用网络文章 简答题 1.Spring项目启动后的加载流程 (1)使用spring框架的web项目,在tomcat下,是根据web.xml来启动的.web.xml中负责配置启动spring ...
- Mysql引起的spring事务失效
老项目加新功能,导致出现service调用service的情况..一共2张表有数据的添加删除.然后测试了一下事务,表A和表B,我在表B中抛了异常,但结果发现,表B回滚正常,但是表A并没有回滚.显示事务 ...
- spring声明式事务 同一类内方法调用事务失效
只要避开Spring目前的AOP实现上的限制,要么都声明要事务,要么分开成两个类,要么直接在方法里使用编程式事务 [问题] Spring的声明式事务,我想就不用多介绍了吧,一句话“自从用了Spring ...
- SpringMvc配置 导致实事务失效
SpringMVC回归MVC本质,简简单单的Restful式函数,没有任何基类之后,应该是传统Request-Response框架中最好用的了. Tips 1.事务失效的惨案 Spring MVC最打 ...
- Spring component-scan 的逻辑 、单例模式下多实例问题、事务失效
原创内容,转发请保留:http://www.cnblogs.com/iceJava/p/6930118.html,谢谢 之前遇到该问题,今天查看了下 spring 4.x 的代码 一,先理解下 con ...
- spring事务失效情况分析
详见:http://blog.yemou.net/article/query/info/tytfjhfascvhzxcyt113 <!--[if !supportLists]-->一.&l ...
- kafka producer生产数据到kafka异常:Got error produce response with correlation id 16 on topic-partition...Error: NETWORK_EXCEPTION
kafka producer生产数据到kafka异常:Got error produce response with correlation id 16 on topic-partition... ...
随机推荐
- 02.java基础(一)java的基础、方法和数组
目录 Java基础 Java特性 Java程序运行机制 Java基础语法 1.数据类型 基本类型 引用类型 数据类型扩展 String类型内存分配过程 转义字符 类型转换 变量 常量 2.运算符 逻辑 ...
- 使用Libusb和hidapi测试HID设备
一.测试中断或者Bulk传输: 首先要使用Libusb打印出HID设备的Endpoint查看是否支持中断或者Bulk传输模式:如果支持的话才可以进一步测试: 因为HID设备在插入的时候无需安装,并且一 ...
- 关于同时使用Vue.js 和 Jquery时dom事件失效问题
先加载vue.js,让页面渲染完成后加载jq,给jq绑定ready事件 $(document).ready(function(){ $(function(){ (Jq) }); });
- Linux 第五节(特殊权限,隐藏权限,SU,SUDO,FHS文件系统层次化标准)
特殊权限 SUID 执行者临时获取命令的所有权限(对程序进行设置) SGID 目录内新文件所有组,继承原有目录所有组的名称 SBID 粘滞位,保护位 chmod +权限 文件 chmod ...
- 何同学新视频火了!找到减少沉迷手机的最佳方法:附免费APP
以优质原创视频吸引百万粉丝的 Up 主"何同学"昨晚(1 月 6 日)上线了最新作品,探讨了如何有效地减少现代人使用或者说沉迷手机的时间. 在视频开头,何同学提到,整理了 5000 ...
- 小白之Python-基础中的基础01
Python-基础中的基础01 第一次写博客笔记,尝试并监督下自己. 每一天都值得期待! 20170803 -----------------华丽的分界线------------- Python之-- ...
- python_lib_0001_decorator_print_log
def decorator_log_funcname( func ): def wrapper(*arg, **kw): print("") ...
- Win11右键默认显示更多选项的设置
怎么让Win11右键默认显示更多选项?有很多朋友不喜欢win11系统的右键菜单显示,经常需要多点一次"显示更多选项"才能看到想要的内容,大家想知道如何让win11右键菜单默认显示更 ...
- uniapp 配置钉钉小程序package.json文件
{ "uni-app": { "scripts": { "mp-dingtalk": { "title": " ...
- 关于邮箱怎么验证是不是真实的企业邮箱(java汉字和英文呼唤)
企业邮箱的域名一般都是zhangsan@公司域名,或者zhang_san@公司域名这种形式.这里我只列举zhangsan@公司域名这种形式. 公司要我做一个企业邮箱的模糊匹配和验证,刚接到以为很难.结 ...