Problem Description

Azure Event Hubs: messages sent by a Kafka producer are received late or lost. The following exception appears in the client log:

2023-06-05 02:00:20.467 [kafka-producer-thread | producer-1] ERROR com.deloitte.common.kafka.CommonKafkaProducer - messageId:9235f334-e39f-b429-227e-45cd30dd6486, topic:notify_topic
发送消息失败 org.springframework.kafka.core.KafkaProducerException: Failed to send; nested exception is org.apache.kafka.common.errors.TimeoutException: The request timed out.
at org.springframework.kafka.core.KafkaTemplate.lambda$buildCallback$6(KafkaTemplate.java:690)
at org.apache.skywalking.apm.plugin.kafka.CallbackAdapter.onCompletion(CallbackAdapter.java:45)
at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer$1.onCompletion$original$dElInXX8(DefaultKafkaProducerFactory.java:1001)
at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer$1.onCompletion$original$dElInXX8$accessor$6jLL1TNr(DefaultKafkaProducerFactory.java)
at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer$1$auxiliary$ldSQQGBZ.call(Unknown Source)
at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:86)
at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer$1.onCompletion(DefaultKafkaProducerFactory.java)
at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion$original$PwZecSoL(KafkaProducer.java:1350)
at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion$original$PwZecSoL$accessor$5Ux1udg0(KafkaProducer.java)
at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback$auxiliary$a5oVYNi3.call(Unknown Source)
at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:86)
at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java)
at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:273)
at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:234)
at org.apache.kafka.clients.producer.internals.ProducerBatch.completeExceptionally(ProducerBatch.java:198)
at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:758)
at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:743)
at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:695)
at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:634)
at org.apache.kafka.clients.producer.internals.Sender.lambda$null$1(Sender.java:575)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
at org.apache.kafka.clients.producer.internals.Sender.lambda$handleProduceResponse$2(Sender.java:562)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:562)
at org.apache.kafka.clients.producer.internals.Sender.lambda$sendProduceRequest$5(Sender.java:836)
at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:574)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:566)
at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.common.errors.TimeoutException: The request timed out.

When the sender logs "Failed to send; nested exception is org.apache.kafka.common.errors.TimeoutException: The request timed out.", how should the problem be analyzed?

Answer

Bot answer 1:

This error message indicates that the Kafka producer failed to send a message due to a timeout.

One possible reason for this error is that the transaction timeout is larger than the maximum value allowed by the broker.

You can check the broker configuration for `transaction.max.timeout.ms` and make sure that it is set to a value that is larger than the transaction timeout used by the producer.

You can also check the Kafka logs for more information about the error.

If you are using Flink, you can refer to the Hilo documentation on [Flink to HDInsightKafkaTransactionTimeoutIssue](https://supportability.visualstudio.com/870b0c89-5235-4a33-a0c6-44be26e284e7/_wiki/wikis/e9f84cb2-cd1b-4143-ac21-5bc63f3bdba0?pagePath=%2FHilo%2F[Hilo] TSGs%2F[Hilo] Flink TSGs%2F[Hilo] Flink to HDInsightKafkaTransactionTimeoutIssue) for more information on this issue.
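The comparison suggested in this answer can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, the broker value has to be supplied by hand because the client cannot read it from its own configuration, and the 60000 ms and 900000 ms figures are Kafka's documented defaults for `transaction.timeout.ms` and `transaction.max.timeout.ms`. The check only matters for a transactional producer; the stack trace above passes through `Sender.handleProduceResponse`, which is an ordinary produce request.

```java
import java.util.Properties;

final class TransactionTimeoutCheck {
    // Kafka's documented producer default for transaction.timeout.ms (1 minute).
    static final long DEFAULT_PRODUCER_TX_TIMEOUT_MS = 60_000L;

    /** Returns true when the producer's transaction timeout fits within the broker's limit. */
    static boolean fitsBrokerLimit(Properties producerProps, long brokerMaxTimeoutMs) {
        String raw = producerProps.getProperty("transaction.timeout.ms");
        long producerTimeoutMs =
                (raw == null) ? DEFAULT_PRODUCER_TX_TIMEOUT_MS : Long.parseLong(raw.trim());
        return producerTimeoutMs <= brokerMaxTimeoutMs;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical producer setting: a one-hour transaction timeout.
        props.setProperty("transaction.timeout.ms", "3600000");
        // Assumed broker value: Kafka's default transaction.max.timeout.ms (15 minutes).
        long brokerMaxTimeoutMs = 900_000L;
        System.out.println("fits broker limit: " + fitsBrokerLimit(props, brokerMaxTimeoutMs));
    }
}
```

With these sample values the producer timeout exceeds the assumed broker limit, so the check reports that it does not fit.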

Bot answer 2:

If you are not using Flink and are sending messages to Event Hubs with a plain Kafka producer, you can check the Kafka client logs for more information about the error.

The error message you provided indicates that the request timed out. This could be due to a network issue or a configuration issue. You can check the following:

1. Check the network connection between the producer and the Event Hub service. Make sure that the network is stable and that there are no issues with the firewall or network security groups.

2. Check the Kafka producer configuration. Make sure that the `request.timeout.ms` property is set to a value that is appropriate for your use case. You can also try increasing the value of this property to see if it resolves the issue.

3. Check the size of the message being sent. If the message size is greater than the maximum allowed size, the request may time out. You can check the maximum message size allowed by the Event Hub service and make sure that your message size is within the limit.

4. Check the Kafka logs for more information about the error. The logs may provide more details about the cause of the error.
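For step 1, a rough reachability probe can rule out the most basic network problems. The sketch below is an assumption-laden illustration, not a diagnostic tool: the class name `EndpointProbe` is hypothetical, the host must be supplied by the caller, and 9093 is the port on which Event Hubs exposes its Kafka endpoint. A successful TCP connect only shows that a firewall or network security group is not blocking the port; it says nothing about TLS, SASL authentication, or throttling.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class EndpointProbe {
    /** Tries a plain TCP connect; returns true if the port accepted the connection in time. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, unresolved host names, and connect timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java EndpointProbe <event-hubs-namespace-host>");
            return;
        }
        // 9093 is the Kafka endpoint port of an Event Hubs namespace.
        System.out.println(args[0] + ":9093 reachable = " + canConnect(args[0], 9093, 5000));
    }
}
```

Running it from the same machine, subnet, or container as the producer matters, because that is the network path the producer actually uses.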

If you are still unable to resolve the issue, you can contact Azure support for further assistance.
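The configuration check in step 2 could look roughly like the following sketch. It builds the timeout-related producer settings as plain key/value pairs so that it runs without the Kafka client on the classpath; the resulting `Properties` would then be merged into the real producer configuration. The class name `ProducerTimeoutSettings` is hypothetical. The idle-connection, metadata-age, and request-size values are assumptions modeled on Microsoft's published recommendations for Kafka clients on Event Hubs and should be verified against the current documentation. The one rule enforced here is Kafka's own: `delivery.timeout.ms` must be at least `linger.ms + request.timeout.ms`.

```java
import java.util.Properties;

final class ProducerTimeoutSettings {
    /** Builds timeout-related producer settings as plain key/value pairs. */
    static Properties build(int requestTimeoutMs, int lingerMs, int deliveryTimeoutMs) {
        // Kafka rejects a producer whose delivery.timeout.ms is smaller than
        // linger.ms + request.timeout.ms, so fail early with a clear message.
        if (deliveryTimeoutMs < (long) lingerMs + requestTimeoutMs) {
            throw new IllegalArgumentException(
                    "delivery.timeout.ms must be >= linger.ms + request.timeout.ms");
        }
        Properties props = new Properties();
        props.setProperty("request.timeout.ms", Integer.toString(requestTimeoutMs));
        props.setProperty("linger.ms", Integer.toString(lingerMs));
        props.setProperty("delivery.timeout.ms", Integer.toString(deliveryTimeoutMs));
        // Assumed values; verify them against the Event Hubs documentation.
        // Keep idle connections shorter-lived than the service-side idle cutoff.
        props.setProperty("connections.max.idle.ms", "180000");
        props.setProperty("metadata.max.age.ms", "180000");
        // Keep a single request below the service's message size limit.
        props.setProperty("max.request.size", "1000000");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical starting point: 60 s per request, 120 s overall per record.
        Properties props = build(60_000, 0, 120_000);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Raising `request.timeout.ms` on its own can hide a dead connection rather than fix it, which is why the sketch sets the idle-connection and delivery timeouts together with it.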
