
Losing the connection to Kafka. What happens?

Stack Overflow user
Asked 2022-11-27 09:07:15
Answers: 1, Views: 54, Followers: 0, Votes: 0

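The JobManager and the TaskManager run on a single VM. Kafka also runs on the same server.

I have 10 tasks, each reading from a different Kafka topic, processing the messages, and writing back to Kafka. Sometimes I find that my TaskManager is down and nothing is working. I tried to find the problem by looking at the logs, and I believe it is a problem with the Kafka connection. (Or maybe a network problem? But everything is on one server.)

What I want to ask is what happens if I lose the connection to Kafka for a short time. Why do the tasks fail, and most importantly, why does the TaskManager crash?

Some logs: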

2022-11-26 23:35:15,626 INFO  org.apache.kafka.clients.NetworkClient                       [] - [Producer clientId=producer-15] Disconnecting from node 0 due to request timeout.
2022-11-26 23:35:15,626 INFO  org.apache.kafka.clients.NetworkClient                       [] - [Producer clientId=producer-8] Disconnecting from node 0 due to request timeout.
2022-11-26 23:35:15,626 INFO  org.apache.kafka.clients.NetworkClient                       [] - [Consumer clientId=cpualgosgroup1-1, groupId=cpualgosgroup1] Disconnecting from node 0 due to request timeout.
2022-11-26 23:35:15,692 INFO  org.apache.kafka.clients.NetworkClient                       [] - [Consumer clientId=telefilter1-0, groupId=telefilter1] Cancelled in-flight FETCH request with correlation id 3630156 due to node 0 being disconnected (elapsed time since creation: 61648ms, elapsed time since send: 61648ms, request timeout: 30000ms)
2022-11-26 23:35:15,702 INFO  org.apache.kafka.clients.NetworkClient                       [] - [Producer clientId=producer-15] Cancelled in-flight PRODUCE request with correlation id 2159429 due to node 0 being disconnected (elapsed time since creation: 51069ms, elapsed time since send: 51069ms, request timeout: 30000ms)
2022-11-26 23:35:15,702 INFO  org.apache.kafka.clients.NetworkClient                       [] - [Consumer clientId=cpualgosgroup1-1, groupId=cpualgosgroup1] Cancelled in-flight FETCH request with correlation id 2344708 due to node 0 being disconnected (elapsed time since creation: 51184ms, elapsed time since send: 51184ms, request timeout: 30000ms)
2022-11-26 23:35:15,702 INFO  org.apache.kafka.clients.NetworkClient                       [] - [Producer clientId=producer-15] Cancelled in-flight PRODUCE request with correlation id 2159430 due to node 0 being disconnected (elapsed time since creation: 51069ms, elapsed time since send: 51069ms, request timeout: 30000ms)
2022-11-26 23:35:15,842 WARN  org.apache.kafka.clients.producer.internals.Sender           [] - [Producer clientId=producer-15] Received invalid metadata error in produce request on partition tele.alerts.cpu-4 due to org.apache.kafka.common.errors.NetworkException: Disconnected from node 0. Going to request metadata update now
2022-11-26 23:35:15,842 WARN  org.apache.kafka.clients.producer.internals.Sender           [] - [Producer clientId=producer-8] Received invalid metadata error in produce request on partition tele.alerts.cpu-6 due to org.apache.kafka.common.errors.NetworkException: Disconnected from node 0. Going to request metadata update now

And then:

2022-11-26 23:35:56,673 WARN  org.apache.flink.runtime.taskmanager.Task                    [] - CPUTemperatureAnalysisAlgorithm -> Sink: Writer -> Sink: Committer (1/1)#0 (619139347a459b6de22089ff34edff39_d0ae1ab03e621ff140fb6b0b0a2932f9_0_0) switched from RUNNING to FAILED with failure cause: org.apache.flink.util.FlinkException: Disconnect from JobManager responsible for 8d57994a59ab86ea9ee48076e80a7c7f.
        at org.apache.flink.runtime.taskexecutor.TaskExecutor.disconnectJobManagerConnection(TaskExecutor.java:1702)
        ...
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
        Caused by: java.util.concurrent.TimeoutException: The heartbeat of JobManager with id 99d52303d7e24496ae661ddea2b6a372 timed out.

2022-11-26 23:35:56,682 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Triggering cancellation of task code CPUTemperatureAnalysisAlgorithm -> Sink: Writer -> Sink: Committer (1/1)#0 (619139347a459b6de22089ff34edff39_d0ae1ab03e621ff140fb6b0b0a2932f9_0_0).
2022-11-26 23:35:57,199 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Attempting to fail task externally TemperatureAnalysis -> Sink: Writer -> Sink: Committer (1/1)#0 (619139347a459b6de22089ff34edff39_15071110d0eea9f1c7f3d75503ff58eb_0_0).
2022-11-26 23:35:57,202 WARN  org.apache.flink.runtime.taskmanager.Task                    [] - TemperatureAnalysis -> Sink: Writer -> Sink: Committer (1/1)#0 (619139347a459b6de22089ff34edff39_15071110d0eea9f1c7f3d75503ff58eb_0_0) switched from RUNNING to FAILED with failure cause: org.apache.flink.util.FlinkException: Disconnect from JobManager responsible for 8d57994a59ab86ea9ee48076e80a7c7f.
        at org.apache.flink.runtime.taskexecutor.TaskExecutor.disconnectJobManagerConnection(TaskExecutor.java:1702)

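Why does the TaskExecutor lose its connection to the JobManager?

If I don't care about any data loss, how should I configure the Kafka clients and Flink recovery? I just want the Kafka clients not to die. In particular, I don't want my tasks or the TaskManager to crash. If I lose the connection, can Flink be configured to simply wait? That is, if we can't read, wait, and if we can't write back to Kafka, also just wait?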


1 Answer

Stack Overflow user

Accepted answer

Answered 2022-11-27 11:08:47

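The heartbeat of JobManager with id 99d52303d7e24496ae661ddea2b6a372 timed out.

It sounds like the server is somewhat overloaded. But you can try increasing the heartbeat timeout.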
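As a rough illustration of that suggestion, here is a minimal sketch that generates the two heartbeat entries for `flink-conf.yaml`. The keys `heartbeat.interval` and `heartbeat.timeout` are Flink's own options (documented defaults: 10000 ms and 50000 ms). The class name, the helper methods, and the 120000 ms value are assumptions chosen for illustration, not a recommendation from the answer. Note that the Kafka requests in the question's logs were stalled for roughly 51 to 62 seconds, which is longer than the default 50 second timeout, so a larger value may let the TaskManager ride out a stall of that length. It does not fix whatever is overloading the server.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HeartbeatConfigSketch {

    // Flink's documented defaults, in milliseconds.
    public static final long DEFAULT_HEARTBEAT_INTERVAL_MS = 10_000L;
    public static final long DEFAULT_HEARTBEAT_TIMEOUT_MS = 50_000L;

    /**
     * Builds the heartbeat entries for flink-conf.yaml.
     * The interval/timeout check is a sanity check of this sketch (hypothetical helper).
     */
    public static Map<String, String> heartbeatSettings(long intervalMs, long timeoutMs) {
        if (timeoutMs <= intervalMs) {
            throw new IllegalArgumentException(
                    "heartbeat.timeout should be larger than heartbeat.interval");
        }
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("heartbeat.interval", Long.toString(intervalMs));
        conf.put("heartbeat.timeout", Long.toString(timeoutMs));
        return conf;
    }

    /** Renders the settings as "key: value" lines, one per entry, in insertion order. */
    public static String toYaml(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 120000 ms is an assumed example value, well above the ~62 s stall seen in the logs.
        System.out.print(toYaml(heartbeatSettings(DEFAULT_HEARTBEAT_INTERVAL_MS, 120_000L)));
        // prints:
        // heartbeat.interval: 10000
        // heartbeat.timeout: 120000
    }
}
```

The printed lines would go into `flink-conf.yaml` on the VM, followed by a restart of the JobManager and TaskManager so that both sides pick up the new timeout.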

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/74588760
