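We have provisioned a Dataproc cluster with 4 workers. The cluster is up and running, but whenever we try to submit a Spark job we get the following error:
YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager, Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager
Some of the messages seen in the Stackdriver logs are:
Daemon YARN_NODE_MANAGER failed to restart
Update: The issue also occurs when we add new worker nodes to the existing Dataproc cluster.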
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager, Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from <MasterNode DNS> , Sending SHUTDOWN signal to the NodeManager.
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:374)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:252)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:845)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:912)
Answered on 2019-09-10 20:35:03
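This error looks like a YARN NodeManager decommissioning issue. Can you check the YARN include/exclude nodes config files on the Dataproc master GCE VM for errors?
After changing these config files, run the refresh nodes command:
yarn rmadmin -refreshNodes
Then you should see the NodeManagers rejoin YARN.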
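As a rough sketch of that check, the helper below looks up where the ResourceManager reads its include and exclude lists and prints their contents. The property names `yarn.resourcemanager.nodes.include-path` and `yarn.resourcemanager.nodes.exclude-path` are standard YARN settings; the function names `yarn_prop` and `show_node_lists`, and the default `yarn-site.xml` location, are assumptions made for illustration and may not match your image.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the default config path are assumptions.

# Print the <value> that follows a given <name> in a Hadoop-style XML file.
# Assumes <value> is on the line right after <name>, as in a typical yarn-site.xml.
yarn_prop() {
  local name="$1" file="$2"
  # "|| true": a missing property yields empty output, not an error.
  grep -A1 -F "<name>${name}</name>" "$file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' || true
}

# Show which include/exclude files the ResourceManager uses and what they list.
show_node_lists() {
  local site="${1:-/etc/hadoop/conf/yarn-site.xml}"  # assumed location
  local key path
  for key in yarn.resourcemanager.nodes.include-path \
             yarn.resourcemanager.nodes.exclude-path; do
    path="$(yarn_prop "$key" "$site")"
    echo "${key} = ${path:-<unset>}"
    if [ -n "$path" ] && [ -f "$path" ]; then
      cat "$path"
    fi
  done
}

# On the master node, after correcting the files:
#   show_node_lists
#   yarn rmadmin -refreshNodes
#   yarn node -list -all
```

A worker hostname that appears in the exclude list, or is missing from a non-empty include list, would be consistent with the "Disallowed NodeManager" message. `yarn node -list -all` can then be used to confirm that the workers are registered again.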
https://stackoverflow.com/questions/57779731