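I am trying to run a Flink job cluster (1.8.1) in a Kubernetes environment. I built the Docker image containing the job by following this doc.
I then followed the kubefiles to create the job, the job manager, and the task manager. The problem is that the task manager cannot connect to the job manager and keeps crashing.
While debugging the job manager logs, I found that jobmanager.rpc.address is bound to "localhost".
However, I have passed the args in the kube files as shown in this doc.
I also tried setting jobmanager.rpc.address through the FLINK_ENV_JAVA_OPTS environment variable.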
env:
- name: FLINK_ENV_JAVA_OPTS
  value: "-Djobmanager.rpc.address=flink-job-cluster"

Job manager console log:
Starting the job-cluster
Starting standalonejob as a console application on host flink-job-cluster-bbxrn.
2019-07-16 17:31:10,759 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2019-07-16 17:31:10,760 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneJobClusterEntryPoint (Version: <unknown>, Rev:4caec0d, Date:03.04.2019 @ 13:25:54 PDT)
2019-07-16 17:31:10,760 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: flink
2019-07-16 17:31:10,761 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user: <no hadoop dependency found>
2019-07-16 17:31:10,761 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: OpenJDK 64-Bit Server VM - IcedTea - 1.8/25.212-b04
2019-07-16 17:31:10,761 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 989 MiBytes
2019-07-16 17:31:10,761 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /usr/lib/jvm/java-1.8-openjdk/jre
2019-07-16 17:31:10,761 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - No Hadoop Dependency available
2019-07-16 17:31:10,761 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2019-07-16 17:31:10,761 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xms1024m
2019-07-16 17:31:10,761 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx1024m
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Djobmanager.rpc.address=flink-job-cluster
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:/opt/flink-1.8.1/conf/log4j-console.properties
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlogback.configurationFile=file:/opt/flink-1.8.1/conf/logback-console.xml
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments:
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --configDir
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - /opt/flink-1.8.1/conf
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --job-classname
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - wikiedits.WikipediaAnalysis
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --host
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - flink-job-cluster
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Djobmanager.rpc.address=flink-job-cluster
2019-07-16 17:31:10,763 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dparallelism.default=2
2019-07-16 17:31:10,763 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dblob.server.port=6124
2019-07-16 17:31:10,763 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dqueryable-state.server.ports=6125
2019-07-16 17:31:10,763 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Classpath: /opt/flink-1.8.1/lib/log4j-1.2.17.jar:/opt/flink-1.8.1/lib/slf4j-log4j12-1.7.15.jar:/opt/flink-1.8.1/lib/wiki-edits-0.1.jar:/opt/flink-1.8.1/lib/flink-dist_2.11-1.8.1.jar:::
2019-07-16 17:31:10,763 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2019-07-16 17:31:10,764 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered UNIX signal handlers for [TERM, HUP, INT]
2019-07-16 17:31:10,850 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2019-07-16 17:31:10,851 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2019-07-16 17:31:10,851 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
2019-07-16 17:31:10,851 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m
2019-07-16 17:31:10,851 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
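2019-07-16 17:31:10,851 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1

The log above shows jobmanager.rpc.address being loaded as localhost rather than flink-job-cluster.
I assume the task manager's messages are being dropped by Akka RPC because it is bound to localhost:6123.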
2019-07-16 17:31:12,546 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Request slot with profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} for job 38190f2570cd5f0a0a47f65ddf7aae1f with allocation id 97af00eae7e3dfb31a79232077ea7ee6.
2019-07-16 17:31:14,043 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@flink-job-cluster:6123/]] arriving at [akka.tcp://flink@flink-job-cluster:6123] inbound addresses are [akka.tcp://flink@localhost:6123]
2019-07-16 17:31:26,564 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@flink-job-cluster:6123/]] arriving at [akka.tcp://flink@flink-job-cluster:6123] inbound addresses are [akka.tcp://flink@localhost:6123]

I am not sure why the job manager binds to localhost.
The task manager pod can resolve the flink-job-cluster host. The hostname resolves to the service IP address.
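
To make the mismatch in the Akka errors easier to spot, here is a minimal Python sketch that pulls the two addresses out of a "dropping message" line. The regex and the helper name find_address_mismatch are my own assumptions based only on the log format pasted above. This is not part of Flink or Akka.

```python
import re

# Matches the tail of the Akka "dropping message" error lines shown above.
# The pattern is an assumption derived from those pasted lines only.
DROP_RE = re.compile(
    r"arriving at \[akka\.tcp://(?P<target>[^\]]+)\] "
    r"inbound addresses are \[akka\.tcp://(?P<inbound>[^\]]+)\]"
)


def find_address_mismatch(line):
    """Return (target, inbound) when the two addresses differ, else None."""
    match = DROP_RE.search(line)
    if match is None:
        return None
    target, inbound = match.group("target"), match.group("inbound")
    return (target, inbound) if target != inbound else None


# One of the error lines from the job manager log, copied verbatim.
sample = (
    "2019-07-16 17:31:14,043 ERROR akka.remote.EndpointWriter - dropping message "
    "[class akka.actor.ActorSelectionMessage] for non-local recipient "
    "[Actor[akka.tcp://flink@flink-job-cluster:6123/]] arriving at "
    "[akka.tcp://flink@flink-job-cluster:6123] inbound addresses are "
    "[akka.tcp://flink@localhost:6123]"
)

print(find_address_mismatch(sample))
```

For the line above this reports flink@flink-job-cluster:6123 as the address the message was sent to and flink@localhost:6123 as the address the job manager accepts, which matches my reading of the log.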
https://stackoverflow.com/questions/57062965