I am running the simplest driver-only long-running job to reproduce this error.
Hadoop version: 2.7.3.2.6.5.0-292
Spark-core version: 2_11.2.3.0.2.6.5.0-292

Code:
FileSystem fs = tmpPath.getFileSystem(sc.hadoopConfiguration());
log.info("Path {} is {}", path, fs.exists(tmpPath));

Behavior: the job runs for 17-18 hours. After new tokens are issued as part of HadoopFSDelegationTokenProvider, the job keeps running on the newly issued delegation token, but within about an hour of the delegation-token renewal it fails with a "token can't be found in cache" error. I have also programmatically generated my own delegation tokens (dfs.addDelegationTokens) for the namenodes involved, and I see exactly the same behavior.
Question:
Path /test/abc.parquet is true
Path /test/abc.parquet is true
INFO Successfully logged into KDC
INFO getting token for DFS[DFSClient][clientName=DFSClient_NONMAPREDUCE_2324234_29,ugi=qa_user@ABC.com(auth:KERBEROS)](org.apache.spark.deploy.security.HadoopFSDelegationTokenProvider)
INFO Created HDFS_DELEGATION_TOKEN token 31615466 for qa_user on ha:hdfs:hacluster
INFO getting token for DFS[DFSClient][clientName=DFSClient_NONMAPREDUCE_2324234_29,ugi=qa_user@ABC.com(auth:KERBEROS)](org.apache.spark.deploy.security.HadoopFSDelegationTokenProvider)
INFO Created HDFS_DELEGATION_TOKEN token 31615467 for qa_user on ha:hdfs:hacluster
INFO writing out delegation tokens to hdfs://abc/user/qa/.sparkstaging/application_121212.....tmp
INFO delegation tokens written out successfully, renaming file to hdfs://.....
INFO delegation token file rename complete(org.apache.spark.deploy.yarn.security.AMCredentialRenewer)
Scheduling login from keytab in 64799125 millis
Path /test/abc.parquet is true
Path /test/abc.parquet is true
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 31615466 for qa_user) can't be found in cache
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy13.getListing(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:620)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)

FYI, the job is submitted in yarn-cluster mode with: --keytab /path/to/keytab --principal principalNameAsPerTheKeytab --conf spark.hadoop.fs.hdfs.impl.disable.cache=true
I have noticed that the token renewer is issuing new tokens, and the new tokens also work, but then the token somehow gets cancelled on the server side, and the AM logs give no clue about it.
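Assembled as a full command, the submission described above might look like the following sketch (the application class and jar names are placeholders I have made up; only the three flags are quoted from the original text):

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --keytab /path/to/keytab \
  --principal principalNameAsPerTheKeytab \
  --conf spark.hadoop.fs.hdfs.impl.disable.cache=true \
  --class com.example.LongRunningDriver \
  long-running-driver.jar
```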
Posted on 2020-09-29 04:16:00
Answering my own question:
There are a couple of very important things to understand here.
- Tokens are stored in UserGroupInformation.getCredentials().getAllTokens().
- When given --keytab and --principal, Spark renews the tokens at (fraction * renewal time).
- fs.disable.cache: with this set, every call that obtains a FileSystem object is expensive, but you are guaranteed a fresh fsObject instead of one served from CACHE.get(fsName).
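The "(fraction * renewal time)" scheduling can be sketched as a tiny calculation. The 0.75 fraction and the 24-hour interval below are assumptions chosen for illustration (24h is the HDFS default dfs.namenode.delegation.token.renew-interval); note how the result lands close to the "Scheduling login from keytab in 64799125 millis" log line above:

```java
public class RenewalSchedule {
    // Next re-login is scheduled at a fraction of the token renewal interval.
    static long nextLoginDelayMillis(long renewalIntervalMillis, double fraction) {
        return (long) (renewalIntervalMillis * fraction);
    }

    public static void main(String[] args) {
        long dayMillis = 24L * 60 * 60 * 1000;   // assumed 24h renew interval
        // 0.75 is an assumed fraction, not read from any Spark config here
        System.out.println(nextLoginDelayMillis(dayMillis, 0.75)); // 64800000
    }
}
```

With these assumed values the scheduled delay is 64800000 ms, which matches the order of magnitude (and nearly the exact value) of the delay logged by the AM.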
If that does not work, you can fetch your own delegation tokens using new Credentials() and FileSystem#addDelegationTokens ( https://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/api/org/apache/hadoop/fs/FileSystem.html#addDelegationTokens(java.lang.String,%20org.apache.hadoop.security.Credentials) ), but this method must be invoked inside kerberosUGI.doAs({});
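A minimal sketch of that manual token fetch follows. It requires a live Kerberos-enabled cluster, so it is not runnable standalone; the principal, keytab path, cluster URI, and the "yarn" renewer string are illustrative placeholders, not values from the original post:

```java
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public class ManualTokenFetch {
    // Fetches fresh delegation tokens inside the Kerberos UGI,
    // as the answer describes.
    static Credentials fetchTokens() throws Exception {
        UserGroupInformation kerberosUGI =
            UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                "qa_user@ABC.com", "/path/to/keytab");   // placeholders

        return kerberosUGI.doAs((PrivilegedExceptionAction<Credentials>) () -> {
            Configuration conf = new Configuration();
            // Mirrors fs.hdfs.impl.disable.cache=true from the submit conf:
            // guarantees a fresh FileSystem object, not one from the FS cache.
            conf.setBoolean("fs.hdfs.impl.disable.cache", true);

            FileSystem fs = new Path("hdfs://hacluster/").getFileSystem(conf);
            Credentials creds = new Credentials();
            // The renewer is typically the YARN RM principal; "yarn" is a guess.
            fs.addDelegationTokens("yarn", creds);
            return creds;
        });
    }
}
```

The returned Credentials can then be merged into the current user with UserGroupInformation.getCurrentUser().addCredentials(creds), so that subsequent FileSystem calls pick up the new tokens.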
https://stackoverflow.com/questions/64089152