I am trying to run the hadoop-streaming example command:
hadoop-streaming -files streamingCode/wordSplitter.py \
-mapper wordSplitter.py \
-input s3://elasticmapreduce/samples/wordcount/input \
-output streamingCode/wordCountOut \
-reducer aggregate

But I keep getting this error:

Exception in thread "main" com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Moved Permanently (Service: Amazon S3; Status Code: 301; Error Code: 301 Moved Permanently; Request ID: 98038E504E150CEC), S3 Extended Request ID: IW1x5otBSepAnPgW/RKELCUI9dhADQvrXqU2Ase1CLIa0SWDFnBbTscXihrvHvNm2ZRxjjSJZ1Q=

I think this is because my cluster is in us-west-2, but I don't know how to format the S3 URL correctly (or this may not be the problem at all).

Edit: after changing the URL to the following:

s3://s3-us-west-2.amazonaws.com/elasticmapreduce/samples/wordcount/input

I now get the following error:

Exception in thread "main" com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: BC8DB415C780DF84), S3 Extended Request ID: sx8W/+gvND2ssqQce9ZQsZTiqxmSJYZs8OiXgrjwL3dm0JRPaC7ceHor+yrHsPuKTjM2LUwkRAw=

Edit: I have confirmed that the error really is caused by my cluster being in us-west-2. I created a cluster in us-east-1 and it works fine there. So the question is: how do I access an S3 bucket from a different region? Is that even possible?

Answered 2016-11-03 21:47:00
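Amazon changed the default behavior starting with emr-4.7.0, which caused this error for us when we upgraded our EMR release.

The fix is simple: add this setting to core-site: fs.s3n.endpoint=s3.amazonaws.com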
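
One possible way to apply that setting is through an EMR configuration file using the core-site classification. The following is a minimal sketch, not a verified fix for every release: the file name configurations.json is hypothetical, and the create-cluster invocation in the comment is an assumed usage with the other required options left out.

```shell
#!/bin/sh
# Sketch: write an EMR configuration file that places the property from the
# answer into the core-site classification.
# "configurations.json" is an arbitrary, hypothetical file name.
cat > configurations.json <<'EOF'
[
  {
    "Classification": "core-site",
    "Properties": {
      "fs.s3n.endpoint": "s3.amazonaws.com"
    }
  }
]
EOF

# Assumed usage when launching a cluster (not executed here; the release
# label is only an example and other required options are omitted):
#   aws emr create-cluster --release-label emr-4.7.0 \
#     --configurations file://configurations.json
```

Supplying the property at cluster creation should put it in core-site for every job on the cluster, so the hadoop-streaming command itself may not need to change.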
https://stackoverflow.com/questions/38710637