首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >使用hudi创建外部表配置单元的问题

使用hudi创建外部表配置单元的问题
EN

Stack Overflow用户
提问于 2021-03-19 20:18:05
回答 1查看 655关注 0票数 0

我正在尝试使用apache hudi框架在hive metastore中创建一个外部文件。它能够与hive metastore连接,但在尝试创建表时,在连接后抛出异常。

代码语言:javascript
复制
dataFrame.writeStream
      .format("org.apache.hudi")
      .option(HoodieWriteConfig.TABLE_NAME, tableName)
      .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY,tableName)
      .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
      .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
      .option(DataSourceWriteOptions.HIVE_AUTO_CREATE_DATABASE_OPT_KEY, "true")
      .option(DataSourceWriteOptions.DEFAULT_HIVE_ASSUME_DATE_PARTITION_OPT_VAL, "false")
           .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partition_id")
            .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, "partition_id")
            .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY, "jdbc:hive2://localhost:10000")
      .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, key)
      .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, combineKey)
      .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
      .option("checkpointLocation", "/tmp/test/checkpoint")
      .option("spark.kryo.registrationRequired", "true")
      .option("hoodie.upsert.shuffle.parallelism", "1")
      .outputMode("append")
      .start("s3a://testbucket/test")

依赖关系:

代码语言:javascript
复制
scalaVersion := "2.12.1"

libraryDependencies += "org.apache.spark" %% "spark-core" % "3.1.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.1.1"
libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.1.1"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "3.1.1" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "3.1.1"
libraryDependencies += "org.apache.hudi" %% "hudi-spark-bundle" % "0.7.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "3.1.4"
libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "3.1.1"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "3.1.1"
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "3.1.1"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "3.1.1"
libraryDependencies += "org.apache.hive" % "hive-jdbc" % "3.1.1"
libraryDependencies += "org.apache.hive" % "hive-metastore" % "3.1.1"
libraryDependencies += "org.apache.hive" % "hive-exec" % "3.1.1"
dependencyOverrides += "org.apache.hadoop" % "hadoop-common" % "3.1.1"
dependencyOverrides += "org.apache.commons" % "commons-lang3" % "3.9"

获得以下异常:

代码语言:javascript
复制
org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL CREATE EXTERNAL TABLE  IF NOT EXISTS `default`.......' org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://testbucket/test'

Caused by: org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities
    at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:300) ~[hive-jdbc-3.1.1.jar:3.1.1]
    at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:286) ~[hive-jdbc-3.1.1.jar:3.1.1]
    at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:324) ~[hive-jdbc-3.1.1.jar:3.1.1]
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:265) ~[hive-jdbc-3.1.1.jar:3.1.1]
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:367) ~[hudi-spark-bundle_2.12-0.7.0.jar:0.7.0]
    ... 37 more
EN

回答 1

Stack Overflow用户

发布于 2021-03-21 19:12:27

似乎有一个jar版本不匹配?您可以打开一个hudi github问题,以获得社区的及时响应。

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/66708029

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档