
Spark Structured Streaming - error when storing data to Azure Data Lake Gen1

Stack Overflow user
Asked on 2019-10-19 13:23:58
1 answer · 55 views · 0 followers · 0 votes

I am stuck on a problem and currently trying to find a solution. The issue relates to storing Structured Streaming output to Azure Data Lake. Below is the exception I get when storing the data:

    Exception in thread "main" org.apache.hadoop.fs.InvalidPathException: Invalid path name Wrong FS: adl://<azure-data-lake>.azuredatalakestore.net/eventstore/_spark_metadata, expected: adl://<azure-data-lake>.azuredatalakestore.net/
    at org.apache.hadoop.fs.AbstractFileSystem.checkPath(AbstractFileSystem.java:383)
    at org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:110)
    at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1120)
    at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1116)
    at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
    at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1116)
    at org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1581)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$FileContextManager.exists(HDFSMetadataLog.scala:390)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.<init>(HDFSMetadataLog.scala:65)
    at org.apache.spark.sql.execution.streaming.CompactibleFileStreamLog.<init>(CompactibleFileStreamLog.scala:46)
    at org.apache.spark.sql.execution.streaming.FileStreamSinkLog.<init>(FileStreamSinkLog.scala:85)
    at org.apache.spark.sql.execution.streaming.FileStreamSink.<init>(FileStreamSink.scala:95)
    at org.apache.spark.sql.execution.datasources.DataSource.createSink(DataSource.scala:316)
    at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:293)

Below are my pom dependencies:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>

    <dependency> <!-- Spark dependency -->
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/com.microsoft.azure/azure-eventhubs-spark -->
    <dependency>
        <groupId>com.microsoft.azure</groupId>
        <artifactId>azure-eventhubs-spark_2.11</artifactId>
        <version>2.3.12</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.4.3</version>
    </dependency>

    <dependency>
        <groupId>com.microsoft.azure</groupId>
        <artifactId>azure-eventhubs</artifactId>
        <version>2.2.0</version>
    </dependency>

    <dependency>
        <groupId>com.microsoft.azure</groupId>
        <artifactId>azure-data-lake-store-sdk</artifactId>
        <version>2.2.8</version>
    </dependency>

    <dependency>
        <groupId>com.microsoft.azure</groupId>
        <artifactId>azure-eventhubs-eph</artifactId>
        <version>2.4.0</version>
    </dependency>

Any help on this would be greatly appreciated.


1 Answer

Stack Overflow user

Answered on 2019-10-23 16:19:05

Finally, I was able to resolve this issue by adding the proper Maven dependencies.

The dependencies I used are:

  • hadoop common - v3.8.1

  • azure-data-lake-store-sdk - v2.3.7

  • hadoop-azure-datalake - v3.2.1
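For reference, a minimal sketch of how those three dependencies might look in the pom. The versions are the ones stated above; the `groupId`s shown are the standard Maven coordinates for these artifacts and are an assumption on my part, so verify them (and the version compatibility with your Spark/Hadoop build) on mvnrepository.com:

```xml
<!-- Assumed coordinates for the artifacts named in the answer above -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.8.1</version>
</dependency>

<dependency>
    <groupId>com.microsoft.azure</groupId>
    <artifactId>azure-data-lake-store-sdk</artifactId>
    <version>2.3.7</version>
</dependency>

<!-- Provides the AdlFileSystem implementation for adl:// paths -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-azure-datalake</artifactId>
    <version>3.2.1</version>
</dependency>
```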

Hope this helps someone else facing this kind of issue.

Thanks, Avinash

Votes: 0
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/58460761