
Apache Hudi throws a Dataset Not Found exception when storing to S3

Stack Overflow user
Asked on 2019-09-16 14:16:19
1 answer · 506 views · 0 followers · 0 votes

I am trying to load a simple DataFrame into S3 as a Hudi dataset, but I'm having trouble doing so. I'm new to Apache Hudi, and I'm loading the data by running the code locally on my Windows machine. All of the Maven dependencies I'm using, along with the code and the exception, are listed below.

inputDF.write.format("com.uber.hoodie")
  .option(HoodieWriteConfig.TABLE_NAME, tablename)
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "GameId")
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "operatorShortName")
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "HandledTimestamp")
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
  .mode(SaveMode.Append)
  .save("s3a://s3_buket/Games2")

<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk -->
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk</artifactId>
  <version>1.11.623</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>3.2.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>3.2.0</version>
</dependency>
<dependency>
  <groupId>com.uber.hoodie</groupId>
  <artifactId>hoodie</artifactId>
  <version>0.4.7</version>
  <type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/com.uber.hoodie/hoodie-spark -->
<dependency>
  <groupId>com.uber.hoodie</groupId>
  <artifactId>hoodie-spark</artifactId>
  <version>0.4.7</version>
</dependency>
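
For anyone reproducing this locally: the hadoop-aws artifact alone does not wire Spark up to S3. Below is a minimal sketch (not from the original post; the app name and credential placeholders are assumptions) of the SparkSession configuration typically needed before an s3a:// write can succeed:

import org.apache.spark.sql.SparkSession

// Minimal local SparkSession configured for s3a:// access.
// The property names are the standard Hadoop S3A keys; the credential
// values are placeholders and must be replaced with real ones.
val spark = SparkSession.builder()
  .appName("HudiS3Write") // hypothetical app name
  .master("local[*]")
  .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .config("spark.hadoop.fs.s3a.access.key", "<AWS_ACCESS_KEY>")
  .config("spark.hadoop.fs.s3a.secret.key", "<AWS_SECRET_KEY>")
  .getOrCreate()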

Exception in thread "main" com.uber.hoodie.exception.DatasetNotFoundException: Hoodie dataset not found in path s3a://gat-datalake-raw-dev/Games2\.hoodie
at com.uber.hoodie.exception.DatasetNotFoundException.checkValidDataset(DatasetNotFoundException.java:45)
at com.uber.hoodie.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:91)
at com.uber.hoodie.HoodieWriteClient.rollbackInflightCommits(HoodieWriteClient.java:1172)
at com.uber.hoodie.HoodieWriteClient.startCommitWithTime(HoodieWriteClient.java:1044)
at com.uber.hoodie.HoodieWriteClient.startCommit(HoodieWriteClient.java:1037)
at com.uber.hoodie.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:144)
at com.uber.hoodie.DefaultSource.createRelation(DefaultSource.scala:91)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
at com.playngoplatform.scala.dao.DataAccessS3.writeDataToRefinedS3(DataAccessS3.scala:26)
at com.playngoplatform.scala.controller.GameAndProviderDataTransform.processData(GameAndProviderDataTransform.scala:29)
at com.playngoplatform.scala.action.GameAndProviderData$.main(GameAndProviderData.scala:10)
at com.playngoplatform.scala.action.GameAndProviderData.main(GameAndProviderData.scala)

I'm not doing anything else apart from this; I'm simply creating a Hudi dataset directly from my Spark data source code. I can see the folder being created at the S3 path, but nothing further apart from the .hoodie.properties file, whose contents are shown below:

hoodie.compaction.payload.class=com.uber.hoodie.common.model.HoodieAvroPayload
hoodie.table.name=hoodie.games
hoodie.archivelog.folder=archived
hoodie.table.type=MERGE_ON_READ

1 Answer

Stack Overflow user

Answered on 2020-03-04 15:14:59

Hudi is not yet mature enough to fully support your Windows operating system.

The issue was fixed by changing the file-separator character used when running this program on a Windows machine.
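
The telltale sign is the backslash in the failing path (Games2\.hoodie): the metadata folder name is being joined to the S3 base path with the platform-dependent java.io.File.separator, which is "\" on Windows, producing a key that does not exist in the bucket. A small diagnostic sketch (assuming that is how this Hudi version builds the path):

import java.io.File
import org.apache.hadoop.fs.Path

val basePath = "s3a://gat-datalake-raw-dev/Games2"
// On Windows, File.separator is "\", so this produces ".../Games2\.hoodie",
// the exact invalid key reported in the exception above.
val brokenMetaPath = basePath + File.separator + ".hoodie"
// Hadoop's Path.SEPARATOR is always "/", yielding the key Hudi expects.
val workingMetaPath = basePath + Path.SEPARATOR + ".hoodie"

Running the job on a non-Windows machine (or under WSL), or moving to a Hudi release that builds this path with the portable "/" separator, avoids the problem.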

Votes: 0
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/57951280
