问题描述

Azure Event Hubs -- Kafka 生产者发送消息存在延迟接收和丢失问题, 在客户端的日志中发现如下异常:

2023-06-05 02:00:20.467 [kafka-producer-thread | producer-1] ERROR com.deloitte.common.kafka.CommonKafkaProducer - messageId:9235f334-e39f-b429-227e-45cd30dd6486, topic:notify_topic
发送消息失败 org.springframework.kafka.core.KafkaProducerException: Failed to send; nested exception is org.apache.kafka.common.errors.TimeoutException: The request timed out.
at org.springframework.kafka.core.KafkaTemplate.lambda$buildCallback$6(KafkaTemplate.java:690)
at org.apache.skywalking.apm.plugin.kafka.CallbackAdapter.onCompletion(CallbackAdapter.java:45)
at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer$1.onCompletion$original$dElInXX8(DefaultKafkaProducerFactory.java:1001)
at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer$1.onCompletion$original$dElInXX8$accessor$6jLL1TNr(DefaultKafkaProducerFactory.java)
at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer$1$auxiliary$ldSQQGBZ.call(Unknown Source)
at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:86)
at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer$1.onCompletion(DefaultKafkaProducerFactory.java)
at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion$original$PwZecSoL(KafkaProducer.java:1350)
at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion$original$PwZecSoL$accessor$5Ux1udg0(KafkaProducer.java)
at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback$auxiliary$a5oVYNi3.call(Unknown Source)
at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:86)
at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java)
at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:273)
at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:234)
at org.apache.kafka.clients.producer.internals.ProducerBatch.completeExceptionally(ProducerBatch.java:198)
at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:758)
at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:743)
at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:695)
at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:634)
at org.apache.kafka.clients.producer.internals.Sender.lambda$null$1(Sender.java:575)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
at org.apache.kafka.clients.producer.internals.Sender.lambda$handleProduceResponse$2(Sender.java:562)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:562)
at org.apache.kafka.clients.producer.internals.Sender.lambda$sendProduceRequest$5(Sender.java:836)
at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:574)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:566)
at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.common.errors.TimeoutException: The request timed out.

当发送端收到 Failed to send; nested exception is org.apache.kafka.common.errors.TimeoutException: The request timed out. 日志后, 如何来分析这个问题呢?

问题回答

机器人回答一:

This error message indicates that the Kafka producer failed to send a message due to a timeout.

One possible reason for this error is that the transaction timeout is larger than the maximum value allowed by the broker.

You can check the broker configuration for `transaction.max.timeout.ms` and make sure that it is set to a value that is larger than the transaction timeout used by the producer.

You can also check the Kafka logs for more information about the error.

If you are using Flink, you can refer to the Hilo documentation on [Flink to HDInsightKafkaTransactionTimeoutIssue](https://supportability.visualstudio.com/870b0c89-5235-4a33-a0c6-44be26e284e7/_wiki/wikis/e9f84cb2-cd1b-4143-ac21-5bc63f3bdba0?pagePath=%2FHilo%2F[Hilo] TSGs%2F[Hilo] Flink TSGs%2F[Hilo] Flink to HDInsightKafkaTransactionTimeoutIssue) for more information on this issue. One or more links in the message.

机器人回答二:

The error message you provided indicates that the request timed out. This could be due to a network issue or a configuration issue. You can check the following: 1. Check the network connection between the producer and the Event Hub service. Make sure that the network is stable and that there are no issues with the firewall or network security groups. 2. Check the Kafka producer configuration. Make sure that the `request.timeout.ms` property is set to a value that is appropriate for your use case. You can also try increasing the value of this property to see if it resolves the issue. 3. Check the size of the message being sent. If the message size is greater than the maximum allowed size, the request may time out. You can check the maximum message size allowed by the Event Hub service and make sure that your message size is within the limit. 4. Check the Kafka logs for more information about the error. The logs may provide more details about the cause of the error. If you are still unable to resolve the issue, you can contact Azure support for further assistance. Message is interactive. Click to interact.

If you are not using Flink and are using the Kafka producer method to send messages to Event Hub, you can check the Kafka logs for more information about the error.

The error message you provided indicates that the request timed out. This could be due to a network issue or a configuration issue. You can check the following:

1. Check the network connection between the producer and the Event Hub service. Make sure that the network is stable and that there are no issues with the firewall or network security groups.

2. Check the Kafka producer configuration. Make sure that the `request.timeout.ms` property is set to a value that is appropriate for your use case. You can also try increasing the value of this property to see if it resolves the issue.

3. Check the size of the message being sent. If the message size is greater than the maximum allowed size, the request may time out. You can check the maximum message size allowed by the Event Hub service and make sure that your message size is within the limit.

4. Check the Kafka logs for more information about the error. The logs may provide more details about the cause of the error.

If you are still unable to resolve the issue, you can contact Azure support for further assistance.

【Azure 事件中心】Kafka 生产者发送消息失败,根据失败消息询问机器人得到的分析步骤的更多相关文章

  1. 【Azure 事件中心】在微软云中国区 (Mooncake) 上实验以Apache Kafka协议方式发送/接受Event Hubs消息 (Java版)

    问题描述 事件中心提供 Kafka 终结点,现有的基于 Kafka 的应用程序可将该终结点用作运行你自己的 Kafka 群集的替代方案. 事件中心可与许多现有 Kafka 应用程序配合使用.在Azur ...

  2. 【Azure 事件中心】为应用程序网关(Application Gateway with WAF) 配置诊断日志,发送到事件中心

    问题描述 在Application Gateway中,开启WAF(Web application firewall)后,现在需要把访问的日志输出到第三方分析代码中进行分析,如何来获取WAF的诊断日志呢 ...

  3. 【Azure 事件中心】Event Hub 无法连接,出现 Did not observe any item or terminal signal within 60000ms in 'flatMapMany' 的错误消息

    问题描述 使用Java SDK连接Azure Event Hub,一直出现 java.util.concurrent.TimeoutException 异常, 消息为:java.util.concur ...

  4. 【Azure 事件中心】EPH (EventProcessorHost) 消费端观察到多次Shutdown,LeaseLost的error信息,这是什么情况呢?

    问题详情 使用EPH获取Event Hub数据时,多次出现连接shutdown和LeaseLost的error  ,截取某一次的error log如: Time:2021-03-10 08:43:48 ...

  5. 【Azure 事件中心】Azure Event Hub 新功能尝试 -- 异地灾难恢复 (Geo-Disaster Recovery)

    问题描述 关于Event Hub(事件中心)的灾备方案,大多数就是新建另外一个备用的Event Hub,当主Event Hub出现不可用的情况时,就需要切换到备Event Hub上. 而在切换的过程中 ...

  6. 【Azure 事件中心】使用Azure AD认证方式创建Event Hub Consume Client + 自定义Event Position

    问题描述 当使用SDK连接到Azure Event Hub时,最常规的方式为使用连接字符串.这种做法参考官网文档就可成功完成代码:https://docs.azure.cn/zh-cn/event-h ...

  7. 【Azure 事件中心】 org.slf4j.Logger 收集 Event Hub SDK(Java) 输出日志并以文件形式保存

    问题描述 在使用Azure Event Hub的SDK时候,常规情况下,发现示例代码中并没有SDK内部的日志输出.因为在Java项目中,没有添加 SLF4J 依赖,已致于在启动时候有如下提示: SLF ...

  8. Kafka生产者发送消息的三种方式

    Kafka是一种分布式的基于发布/订阅的消息系统,它的高吞吐量.灵活的offset是其它消息系统所没有的. Kafka发送消息主要有三种方式: 1.发送并忘记 2.同步发送 3.异步发送+回调函数 下 ...

  9. kafka 生产者发送消息

    KafkaProducer 创建一个 KafkaThread 来运行 Sender.run 方法. 1. 发送消息的入口在 KafkaProducer#doSend 中,但其实是把消息加入到 batc ...

  10. 【Azure 事件中心】azure-spring-cloud-stream-binder-eventhubs客户端组件问题, 实践消息非顺序可达

    问题描述 查阅了Azure的官方文档( 将事件发送到特定分区: https://docs.azure.cn/zh-cn/event-hubs/event-hubs-availability-and-c ...

随机推荐

  1. Harbor简单搭建以及异常排查的过程与思路

    Harbor简单搭建以及异常排查的过程与思路 前言 我发现我总是能够遇到别人遇不到的问题. 本来搭建十分钟就可以搭建完成 结果我硬生生的搭建了四十分钟. 为了保证下次不再浪费时间. 这里加单总结一下遇 ...

  2. kubernetes中不可见的OOM

    最近看了一篇文章:Tracking Down "Invisible" OOM Kills in Kubernetes,其讲述的是由于内存不足导致Pod中的进程被killed,但Po ...

  3. docker 镜像导出和导入(适用于内网无法拉镜像的问题)

    1.在外网将镜像从指定的仓库拉下来 docker pull consul 现在已将consul镜像拉到了可连外网的服务器  2.将镜像把包到指定的tar文件中 docker save consul:l ...

  4. Windows10磁盘占用100%和内存占用高

    前言 公司配备了两台电脑,两台电脑都是安装的win10系统,一台是磁盘占用高,另一台是内存可用低. 具体情况如下: 一台外网机 8g内存,安装win10 专业版,开机一天后经常出现内存不够用,但其实都 ...

  5. 5.4 Windows驱动开发:内核通过PEB取进程参数

    PEB结构(Process Envirorment Block Structure)其中文名是进程环境块信息,进程环境块内部包含了进程运行的详细参数信息,每一个进程在运行后都会存在一个特有的PEB结构 ...

  6. C/C++ 类与构造析构等知识

    简单定义类 #include <iostream> #include <string> using namespace std; class Student { public: ...

  7. Intel自曝未来三代酷睿!AI性能涨2倍、再涨2倍

    根据最新财报数据,Intel 2023年第四季度154.1亿美元,同比增长10%,全年收入542亿美元,同比下跌14%,预计2024年第一季度收入122-132亿美元. 其中,酷睿处理器业务为主的CC ...

  8. 【C语言深度解剖】一篇解决程序的环境【编译+链接详解】让面试官给我们竖起大拇指

    文章目录 程序的翻译环境 翻译环境详解 编译 预编译 编译 汇编 关于形成符号表 链接 运行环境 尾声 [C语言深度解剖][Linux操作系统]程序的环境[编译+链接详解] 那么这里博主先安利一下一些 ...

  9. Proteus仿真出现“Internal Exception: access violation in module ‘LOADERS.DLL‘ [00020627].”错误

    Proteus仿真问题 在使用 Proteus 8.4 进行仿真时, 出现错误提示 Internal Exception: access violation in module 'LOADERS.DL ...

  10. 小知识:后台执行Oracle创建索引免受会话中断影响

    因为客户环境的堡垒机经常会莫名的断开连接,也不是简单的超时,因为有时候即使你一直在操作,也可能会断. 这样对于操作一些耗时长且中途中断可能会导致异常的操作就很危险,而最简单的避免方法就是将其写到脚本中 ...