我正在尝试使用distcp命令将数据从一个cdh(CDH4.7.1)集群移动到另一个cdh(cdh5.4.1)集群,如下所示:
hadoop distcp -D mapred.task.timeout=60000000 -update hdfs://namenodeIp of source(CDH4):8020/user/admin/distcptest1 webhdfs://namenodeIp of target(CDH5):50070/user/admin/testdir使用此命令时,将目录和子目录从源群集cdh4复制到目标群集cdh5,但未将来自源群集的文件复制到目标群集,失败并显示以下错误:
无法将临时文件(=webhdfs://10.10.200.221:50070/user/admin/testdir/_distcp_tmp_g79i9w/distcptest1/account.xlsx)重命名为目标文件(=webhdfs://10.10.200.221:50070/user/admin/testdir/distcptest1/account.xlsx)
在该作业的日志中找到的堆栈跟踪如下:
2016-02-19 03:16:57,006 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2016-02-19 03:16:58,686 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
2016-02-19 03:16:58,693 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2016-02-19 03:16:59,736 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2016-02-19 03:16:59,752 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@715f1f9c
2016-02-19 03:17:00,248 INFO org.apache.hadoop.mapred.MapTask: Processing split: hdfs://n1.quadratics.com:8020/user/admin/.stagingdistcp_g79i9w/_distcp_src_files:0+2443
2016-02-19 03:17:00,345 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group name and BYTES_READ as counter name instead
2016-02-19 03:17:00,353 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2016-02-19 03:17:01,098 INFO org.apache.hadoop.tools.DistCp: FAIL distcptest1/account.xlsx : java.io.IOException: Fail to rename tmp file (=webhdfs://10.10.200.221:50070/user/admin/testdir/_distcp_tmp_g79i9w/distcptest1/account.xlsx) to destination file (=webhdfs://10.10.200.221:50070/user/admin/testdir/distcptest1/account.xlsx)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.rename(DistCp.java:494)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:463)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:549)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:316)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.io.IOException
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.rename(DistCp.java:490)
... 11 more
2016-02-19 03:17:10,457 INFO org.apache.hadoop.tools.DistCp: FAIL distcptest1/_distcp_logs_ww86cq/_logs/history/job_201602160057_0105_1455872921915_hdfs_distcp : java.io.IOException: Fail to rename tmp file (=webhdfs://10.10.200.221:50070/user/admin/testdir/_distcp_tmp_g79i9w/distcptest1/_distcp_logs_ww86cq/_logs/history/job_201602160057_0105_1455872921915_hdfs_distcp) to destination file (=webhdfs://10.10.200.221:50070/user/admin/testdir/distcptest1/_distcp_logs_ww86cq/_logs/history/job_201602160057_0105_1455872921915_hdfs_distcp)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.rename(DistCp.java:494)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:463)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:549)
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:316)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.io.IOException
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.rename(DistCp.java:490)
... 11 more即使在使用此命令之后也得到了上面的错误:
hadoop distcp -D mapred.task.timeout=60000000 -update webhdfs://namenodeIp of source(CDH4):50070/user/admin/distcptest1 webhdfs://namenodeIp of target(CDH5):50070/user/admin/testdir在两个群集中都启用了WebHDFS。
关于distcp命令的执行,我是从我的源群集中执行的,即cdh4,用户是'admin‘,根据下面给出的cloudera链接,这是可能的:
http://www.cloudera.com/documentation/enterprise/5-4-x/topics/cdh_admin_distcp_data_cluster_migrate.html
当我监视目标集群时,源集群中的文件没有被写入到由file.Can在目标集群中创建的临时文件夹中。这就是为什么在目标集群中重命名失败的原因,因为目标路径不包含该distcp,有人告诉我为什么文件写入失败了吗?
我已经搜索了stackoverflow上的相关文章,并尝试了这些解决方案,但没有一个不能解决这个problem.Any问题,解决这个问题的想法将会非常有帮助。
发布于 2016-02-19 20:31:21
HDFS是一个不能运行yarn作业的用户,它很可能是您的YARN配置中被禁止的用户。
如果这是一个安全集群,那么您还需要两个Kerberos域之间的信任。
https://stackoverflow.com/questions/35501932
复制相似问题