Flink on minikube cannot resolve the ResourceManager address

Stack Overflow user
Asked on 2022-02-28 04:17:56
Answers: 1, Views: 521, Followers: 0, Votes: 0

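I followed the instructions at https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/standalone/kubernetes/ to set up a session-mode Flink cluster, using the following commands from the "Starting a Kubernetes Cluster (Session Mode)" section: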

# Configuration and service definition
$ kubectl create -f flink-configuration-configmap.yaml
$ kubectl create -f jobmanager-service.yaml
# Create the deployments for the cluster
$ kubectl create -f jobmanager-session-deployment.yaml
$ kubectl create -f taskmanager-session-deployment.yaml

The TaskManager pod keeps crashing. The TaskManager log shows:

2022-02-28 03:41:40,543 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.UnknownHostException: flink-jobmanager: Temporary failure in name resolution]
2022-02-28 03:41:40,555 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2022-02-28 03:42:00,584 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2022-02-28 03:42:10,594 INFO  akka.remote.transport.ProtocolStateActor                     [] - No response from remote for outbound association. Associate timed out after [20000 ms].
2022-02-28 03:42:10,596 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [No response from remote for outbound association. Associate timed out after [20000 ms].]
2022-02-28 03:42:10,605 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2022-02-28 03:42:30,644 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2022-02-28 03:42:40,052 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Fatal error occurred in TaskExecutor akka.tcp://flink@10.244.205.198:6122/user/rpc/taskmanager_0.
org.apache.flink.runtime.taskexecutor.exceptions.RegistrationTimeoutException: Could not register at the ResourceManager within the specified maximum registration duration 300000 ms. This indicates a problem with this instance. Terminating now.
    at org.apache.flink.runtime.taskexecutor.TaskExecutor.registrationTimeout(TaskExecutor.java:1449) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$startRegistrationTimeout$17(TaskExecutor.java:1434) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRunAsync$4(AkkaRpcActor.java:455) ~[flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:455) ~[flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:213) ~[flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163) ~[flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24) [flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20) [flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) [flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20) [flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at akka.actor.Actor.aroundReceive(Actor.scala:537) [flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at akka.actor.Actor.aroundReceive$(Actor.scala:535) [flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220) [flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580) [flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at akka.actor.ActorCell.invoke(ActorCell.scala:548) [flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270) [flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at akka.dispatch.Mailbox.run(Mailbox.scala:231) [flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:243) [flink-rpc-akka_dcfe8153-9945-448e-897a-6dec4f3d2704.jar:1.14.3]
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_322]
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:1.8.0_322]
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:1.8.0_322]
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) [?:1.8.0_322]
2022-02-28 03:42:40,066 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - Fatal error occurred while executing the TaskManager. Shutting it down...

Does this error mean that the TaskManager cannot resolve flink-jobmanager to the cluster IP of the flink-jobmanager service?

The flink-jobmanager service is up:

(base) ~/cloudmap3/cloudmap3-k8s/flink $ kubectl get services
NAME               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
flink-jobmanager   ClusterIP   10.111.160.112   <none>        6123/TCP,6124/TCP,8081/TCP   83m
kubernetes         ClusterIP   10.96.0.1        <none>        443/TCP                      32d

How can I debug this issue?

Some more information: the CoreDNS pod does not show anything meaningful. The logs on the pod show:

[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration MD5 = 08e2b174e0f0a30a2e82df9c995f4a34
CoreDNS-1.8.4
linux/amd64, go1.16.4, 053c4d5
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[WARNING] plugin/health: Local health request to "http://:8080/health" failed: Get "http://:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
[WARNING] plugin/health: Local health request to "http://:8080/health" took more than 1s: 2.607635169s
[WARNING] plugin/health: Local health request to "http://:8080/health" failed: Get "http://:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
[WARNING] plugin/health: Local health request to "http://:8080/health" failed: Get "http://:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
[WARNING] plugin/health: Local health request to "http://:8080/health" failed: Get "http://:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
[WARNING] plugin/health: Local health request to "http://:8080/health" took more than 1s: 1.885799651s

1 Answer

Stack Overflow user

Answered on 2022-03-02 09:51:26

After I restarted the coredns pod, the message [INFO] plugin/ready: Still waiting on: "kubernetes" from the coredns pod went away, and the Flink TaskManager was able to register with the JobManager.
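
For reference, the restart and a follow-up check could look roughly like the sketch below. It is not taken from the original post and rests on a few assumptions: that CoreDNS runs as the coredns Deployment in the kube-system namespace (the usual minikube/kubeadm layout), that the busybox:1.28 image can be pulled, and that the helper names restart_coredns and check_service_dns as well as the pod name dns-test are purely illustrative.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions (not from the original post):
#   - CoreDNS is the "coredns" Deployment in the "kube-system" namespace.
#   - The busybox:1.28 image is pullable for a throwaway test pod.
#   - Function names and the pod name "dns-test" are hypothetical.

restart_coredns() {
  # Recreate the CoreDNS pods, then wait for the rollout to finish.
  kubectl -n kube-system rollout restart deployment/coredns &&
    kubectl -n kube-system rollout status deployment/coredns --timeout=120s
}

check_service_dns() {
  # Resolve a Service name from inside the cluster using a temporary pod
  # that is deleted again once nslookup exits.
  kubectl run dns-test --rm -i --restart=Never --image=busybox:1.28 \
    -- nslookup "${1:-flink-jobmanager}"
}

# Usage:
#   restart_coredns && check_service_dns flink-jobmanager
```

If the lookup returns the ClusterIP listed by kubectl get services, name resolution inside the cluster is working again, and a TaskManager pod that is restarted afterwards should be able to reach flink-jobmanager:6123.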

Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/71290534
