
Error from Spark when changing the partition field in Iceberg

Stack Overflow user
Asked 2022-04-19 11:16:51
Answers: 1 · Views: 679 · Followers: 0 · Votes: 0

We use Spark to write to Iceberg. When we rename the partition field, we get a validation error:

org.apache.iceberg.exceptions.ValidationException: Cannot find source column for partition field: 1000: some_date: void(1)

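It seems that Iceberg is referring to the existing table's partition field name, which should no longer be relevant, since there is a new partition field and the write mode is "overwrite".

Any suggestions? Thanks!

Here is a minimal reproducible example.

Create the original table with the partition field "some_date":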

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType ,StructField, StringType
dataDF = [('1991-04-01',)]
schema = StructType([
        StructField('some_date',StringType(), True)])

spark = SparkSession.builder.master('local[1]').appName('example') \
    .getOrCreate()

df = spark.createDataFrame(data = dataDF, schema = schema)
spark.sql(f"use iprod")  # catalog
spark.sql(f"CREATE SCHEMA IF NOT EXISTS iprod.test_schema")

df.write.mode("overwrite").format("parquet").partitionBy('some_date').saveAsTable("iprod.test_schema.example")

Try to overwrite the table with the same code, but with the partition field renamed to some_date_2:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType ,StructField, StringType
dataDF = [('1991-04-01',)]
schema = StructType([
        StructField('some_date_2',StringType(), True)])

spark = SparkSession.builder.master('local[1]').appName('example') \
    .getOrCreate()

df = spark.createDataFrame(data = dataDF, schema = schema)
spark.sql(f"use iprod")  # catalog
spark.sql(f"CREATE SCHEMA IF NOT EXISTS iprod.test_schema")

df.write.mode("overwrite").format("parquet").partitionBy('some_date_2').saveAsTable("iprod.test_schema.example")

Full trace:

org.apache.iceberg.exceptions.ValidationException: Cannot find source column for partition field: 1000: some_date: void(1)
    at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:46)
    at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:511)
    at org.apache.iceberg.PartitionSpec$Builder.build(PartitionSpec.java:503)
    at org.apache.iceberg.TableMetadata.reassignPartitionIds(TableMetadata.java:768)
    at org.apache.iceberg.TableMetadata.buildReplacement(TableMetadata.java:790)
    at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.newReplaceTableTransaction(BaseMetastoreCatalog.java:256)
    at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.createOrReplaceTransaction(BaseMetastoreCatalog.java:244)
    at org.apache.iceberg.CachingCatalog$CachingTableBuilder.createOrReplaceTransaction(CachingCatalog.java:244)
    at org.apache.iceberg.spark.SparkCatalog.stageCreateOrReplace(SparkCatalog.java:190)
    at org.apache.spark.sql.execution.datasources.v2.AtomicReplaceTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:197)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:40)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:40)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.doExecute(V2CommandExec.scala:55)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:194)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:190)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:686)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:619)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)


1 Answer

Stack Overflow user

Answered 2022-07-05 23:32:09

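This error occurs because your table uses version 1 of the Iceberg table format.

You should upgrade the table to version 2 (the format-version table property). AFAIK, this can be done via SQL: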

ALTER TABLE catalog.ns.table
SET TBLPROPERTIES (
  'format-version' = '2'
)

It can also be done with the DataFrame API v2. Something like:

df.writeTo('catalog.ns.table').using("iceberg").tableProperty("format-version", "2").createOrReplace()

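You can read more about the Iceberg table format in the spec (a summary of the changes between versions 1 and 2 is also available).

If you want to keep using version 1, you should DROP and then re-ADD the partition field (via ALTER TABLE).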
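
As a rough illustration of the DROP / re-ADD route, the sketch below builds the two `ALTER TABLE` statements and runs them through a Spark session. It assumes the Iceberg Spark SQL extensions are enabled (they provide `ADD PARTITION FIELD` and `DROP PARTITION FIELD`) and that the new column already exists in the table schema. The helper names `partition_swap_statements` and `apply_statements` are hypothetical, and this has not been verified against the table from the question.

```python
from typing import List


def partition_swap_statements(table: str, old_field: str, new_field: str) -> List[str]:
    """Build ALTER TABLE statements that swap one identity partition field for another.

    Assumption: `new_field` is already a column of `table`.
    """
    return [
        # Remove the old field from the table's partition spec.
        f"ALTER TABLE {table} DROP PARTITION FIELD {old_field}",
        # Add the new column as an identity partition field.
        f"ALTER TABLE {table} ADD PARTITION FIELD {new_field}",
    ]


def apply_statements(spark, statements: List[str]) -> None:
    """Run each statement in order via spark.sql (spark is a SparkSession)."""
    for statement in statements:
        spark.sql(statement)


statements = partition_swap_statements(
    "iprod.test_schema.example", "some_date", "some_date_2"
)
for statement in statements:
    print(statement)
```

With a live session, `apply_statements(spark, statements)` would execute them in order. Building the statements separately keeps the sketch easy to inspect before anything touches the table.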

Votes: 0

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/71924060
