Q: Oozie file or archive tag referencing a file stored in Azure Data Lake

Stack Overflow user

Asked 2019-12-25 00:04:09

1 answer, 177 views, 0 followers, 0 votes

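We have a custom Apache Hadoop installation on Azure compute nodes and use Apache Oozie to schedule workflows.

All workflow and coordinator XML files are deployed to external storage on Microsoft Azure Data Lake.

There is currently one PySpark action, which we deploy to a different path on Azure Data Lake.

In the workflow action, I tried to reference it through the file tag, without success: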

<action name='start-job'>
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${sparkMaster}</master>
            <mode>${sparkMode}</mode>
            <name>PySparkJob</name>
            <jar>${executor}</jar>
            <spark-opts>
                --num-executors ${num_executors} --executor-cores ${executor_cores} --executor-memory ${executor_memory} --driver-memory ${driver_memory} --conf spark.executor.memoryOverhead=${executor_memory_overhead} --py-files ${egg_file_name} --conf spark.driver.maxResultSize=${driver_max_result_size}
            </spark-opts>
            <arg>...</arg>
            <file>${adl_pyfiles_absolute_path}/${egg_file_name}</file>
        </spark>
        <ok to="success-email"/>
        <error to="error-email"/>
</action>

This results in:

Error Message     : Missing py4j and/or pyspark zip files. Please add them to the lib folder or to the Spark sharelib.

Is there a way to make this work?

azure

apache-spark

pyspark

oozie

azure-data-lake

1 Answer

Stack Overflow user

Accepted answer

Posted 2019-12-28 00:59:35

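I found the root cause.

Simply listing the file like this:

<file>${adl_pyfiles_absolute_path}/${egg_file_name}</file>

is not enough for it to be referenced as --py-files ${egg_file_name} in spark-opts.

Giving it an explicit name, by appending # followed by that name, solves the problem:

<file>${adl_pyfiles_absolute_path}/${egg_file_name}#${egg_file_name}</file>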
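
For context, here is a minimal sketch of how the corrected action might look as a whole. It reuses the property names from the question and, for brevity, trims spark-opts down to the one option that matters here. The comment on the file element reflects Oozie's documented symlink syntax (a # followed by the link name). Whether this change alone is enough on a differently configured cluster is an assumption.

```xml
<action name='start-job'>
    <spark xmlns="uri:oozie:spark-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>${sparkMaster}</master>
        <mode>${sparkMode}</mode>
        <name>PySparkJob</name>
        <jar>${executor}</jar>
        <!-- The other options from the question are omitted here for brevity. -->
        <spark-opts>--py-files ${egg_file_name}</spark-opts>
        <arg>...</arg>
        <!-- The part after '#' is the symlink name created in the task's
             working directory. It matches the bare file name that
             spark-opts refers to. -->
        <file>${adl_pyfiles_absolute_path}/${egg_file_name}#${egg_file_name}</file>
    </spark>
    <ok to="success-email"/>
    <error to="error-email"/>
</action>
```

The link name after # does not have to equal the source file name, but keeping the two identical means the value passed to the py-files option needs no separate property.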

0 votes

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/59471165
