I use the following code to set up s3a to switch roles on AWS emr-6.2.0:
spark.sparkContext._jsc.hadoopConfiguration().set(
    "fs.s3a.aws.credentials.provider",
    "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
)
spark.sparkContext._jsc.hadoopConfiguration().set(
    "fs.s3a.access.key", new_credentials["Credentials"]["AccessKeyId"]
)
spark.sparkContext._jsc.hadoopConfiguration().set(
    "fs.s3a.secret.key", new_credentials["Credentials"]["SecretAccessKey"]
)
spark.sparkContext._jsc.hadoopConfiguration().set(
    "fs.s3a.session.token", new_credentials["Credentials"]["SessionToken"]
)
)

The question is: how do I switch back to accessing S3 with my current (instance) role? The simple solution would seem to be spark.sparkContext._jsc.hadoopConfiguration().clear(), but that clears everything, and I then get the following error:
>>> df_disp_prod = spark.read.csv(
... "s3://sandboxes-analysis/demo_inventory/distinct_disp_prod_id.tsv",
... sep=r"\t",
... header=True,
... )
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 535, in csv
    return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 128, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o515.csv.
: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3336)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3356)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)

I use the following to list the Spark configuration:

>>> _ = [print(conf) for conf in spark.sparkContext.getConf().getAll()]
I can look up a configuration value if I know its name, but is there a getAll for _jsc.hadoopConfiguration()? In other words, can the Hadoop configuration be saved and re-populated later?
configs = [
"fs.s3a.aws.credentials.provider",
"fs.s3a.access.key",
"fs.s3a.secret.key",
"fs.s3a.session.token",
]
_ = [print(c, spark.sparkContext._jsc.hadoopConfiguration().get(c)) for c in configs]

Posted on 2021-05-13 01:57:51
You should be able to use the EMR connector for plain s3:// URLs and the S3A connector for s3a:// URLs at the same time: just set the s3a authentication details and the two should coexist.
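The scheme-to-connector split behind that advice can be sketched as follows. The class names are the usual EMR defaults, stated here as assumptions, and connector_for is a hypothetical helper, not a Hadoop API:

```python
from urllib.parse import urlparse

# Filesystem implementations that typically serve each scheme on EMR (assumed defaults):
# EMRFS authenticates via the instance profile role; S3A uses the fs.s3a.* settings.
SCHEME_TO_FS = {
    "s3": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
    "s3a": "org.apache.hadoop.fs.s3a.S3AFileSystem",
}

def connector_for(url):
    """Return the filesystem class expected to handle this URL, or None if unmapped."""
    return SCHEME_TO_FS.get(urlparse(url).scheme)
```

This also explains the traceback above: wiping the configuration with clear() drops the mapping for the "s3" scheme along with everything else, hence "No FileSystem for scheme \"s3\"".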
Clearing the whole configuration loses far too much. If you really need to remove a specific option, use Configuration.unset(key).
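One way to apply that: snapshot only the keys you intend to change before assuming the role, then put them back (or unset them) afterwards. A minimal sketch; snapshot and restore are my helper names, and conf stands for the object returned by spark.sparkContext._jsc.hadoopConfiguration():

```python
S3A_KEYS = [
    "fs.s3a.aws.credentials.provider",
    "fs.s3a.access.key",
    "fs.s3a.secret.key",
    "fs.s3a.session.token",
]

def snapshot(conf, keys=S3A_KEYS):
    """Record the current value of each key (None when the key is unset)."""
    return {k: conf.get(k) for k in keys}

def restore(conf, saved):
    """Put saved values back; keys that had no value are unset again."""
    for key, value in saved.items():
        if value is None:
            conf.unset(key)
        else:
            conf.set(key, value)
```

Because only the four fs.s3a.* keys are touched, the fs.s3.impl mapping and everything else survives, avoiding the "No FileSystem for scheme" failure entirely.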
Posted on 2021-05-06 22:31:34
spark.sparkContext._jsc.hadoopConfiguration().clear()
spark.sparkContext._jsc.hadoopConfiguration().reloadConfiguration()

seems to bring "s3a://my_home_home/prodigal_son_returns" back.
To get a dictionary of all Hadoop configuration entries:

def hadoop_config_dict(spark):
    hadoop_config_d = {
        e.getKey(): e.getValue()
        for e in spark.sparkContext._jsc.hadoopConfiguration().iterator()
    }
    # Sort by key and return
    return {k: hadoop_config_d[k] for k in sorted(hadoop_config_d)}
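Given such a dict, you can also diff two snapshots taken before and after switching roles to see exactly which entries changed. A sketch on plain dicts (config_diff is my name, not part of any API):

```python
def config_diff(before, after):
    """Map each changed key to its (old, new) value pair; missing keys read as None."""
    changed = {}
    for key in set(before) | set(after):
        if before.get(key) != after.get(key):
            changed[key] = (before.get(key), after.get(key))
    return changed
```

Running this on hadoop_config_dict(spark) outputs from before and after clear() would show every entry that clear() destroys, which is usually far more than the four s3a keys.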
https://stackoverflow.com/questions/67416365