Q: A schema mismatch detected when writing to the Delta table - Azure Databricks

Stack Overflow user

Asked 2020-03-29 14:00:21

2 answers · 20K views · 0 followers · 9 votes

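I am trying to load "small_radio_json.json" into a Delta Lake table. After this code I would create the table.

When I try to create the Delta table I get the error "A schema mismatch detected when writing to the Delta table." It may be related to the partitioning in events.write.format("delta").mode("overwrite").partitionBy("artist").save("/delta/events/").

How can I fix or modify the code?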

    //https://learn.microsoft.com/en-us/azure/azure-databricks/databricks-extract-load-sql-data-warehouse
    //https://learn.microsoft.com/en-us/azure/databricks/_static/notebooks/delta/quickstart-scala.html
    
    //Session configuration
    val appID = "123558b9-3525-4c62-8c48-d3d7e2c16a6a"
    val secret = "123[xEPjpOIBJtBS-W9B9Zsv7h9IF:qw"
    val tenantID = "12344839-0afa-4fae-a34a-326c42112bca"

    spark.conf.set("fs.azure.account.auth.type", "OAuth")
    spark.conf.set("fs.azure.account.oauth.provider.type", 
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set("fs.azure.account.oauth2.client.id", "<appID>")
    spark.conf.set("fs.azure.account.oauth2.client.secret", "<secret>")
    spark.conf.set("fs.azure.account.oauth2.client.endpoint", 
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token")
    spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")

    //Account Information
    val storageAccountName = "mydatalake"
    val fileSystemName = "fileshare1"

    spark.conf.set("fs.azure.account.auth.type." + storageAccountName + ".dfs.core.windows.net", "OAuth")
    spark.conf.set("fs.azure.account.oauth.provider.type." + storageAccountName + 
    ".dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set("fs.azure.account.oauth2.client.id." + storageAccountName + ".dfs.core.windows.net", 
    "" + appID + "")
    spark.conf.set("fs.azure.account.oauth2.client.secret." + storageAccountName + 
    ".dfs.core.windows.net", "" + secret + "")
    spark.conf.set("fs.azure.account.oauth2.client.endpoint." + storageAccountName + 
    ".dfs.core.windows.net", "https://login.microsoftonline.com/" + tenantID + "/oauth2/token")
    spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
    dbutils.fs.ls("abfss://" + fileSystemName  + "@" + storageAccountName + ".dfs.core.windows.net/")
    spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")

    dbutils.fs.cp("file:///tmp/small_radio_json.json", "abfss://" + fileSystemName + "@" + 
    storageAccountName + ".dfs.core.windows.net/")

    val df = spark.read.json("abfss://" + fileSystemName + "@" + storageAccountName + 
    ".dfs.core.windows.net/small_radio_json.json")

    //df.show()

    import org.apache.spark.sql._
    import org.apache.spark.sql.functions._

    val events = df
  
    display(events)

    import org.apache.spark.sql.SaveMode

    events.write.format("delta").mode("overwrite").partitionBy("artist").save("/delta/events/")

    val events_delta = spark.read.format("delta").load("/delta/events/")
    display(events_delta)

Exception:

    org.apache.spark.sql.AnalysisException: A schema mismatch detected when writing to the Delta table.
    To enable schema migration, please set:
    '.option("mergeSchema", "true")'.

    Table schema:
    root
    -- action: string (nullable = true)
    -- date: string (nullable = true)


    Data schema:
    root
    -- artist: string (nullable = true)
    -- auth: string (nullable = true)
    -- firstName: string (nullable = true)
    -- gender: string (nullable = true)

scala

azure-databricks

delta-lake

2 Answers

Stack Overflow user

Accepted answer

Answered 2020-03-29 17:37:11

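Most probably the /delta/events/ directory contains data from a previous run, and that data has a different schema than the current one. Loading new data into the same directory then raises this type of exception.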
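One way to check this is to look at what is already stored at the path and, if the old data is disposable, clear it before writing again. The snippet below is only a sketch: it assumes a Databricks notebook (where `spark` and `dbutils` are predefined), reuses the `events` DataFrame from the question, and introduces `tablePath` as an illustrative name.

```scala
// Assumption: runs in a Databricks notebook, so `spark` and `dbutils` exist,
// and `events` is the DataFrame defined in the question.
val tablePath = "/delta/events/" // illustrative name for the path from the question

// Print the schema that an earlier run left behind at this path.
spark.read.format("delta").load(tablePath).printSchema()

// If the old data is not needed, delete the directory recursively.
// This is destructive: it removes the table's data and its transaction log.
dbutils.fs.rm(tablePath, true)

// With the directory gone, the write creates a fresh table with the new schema.
events.write.format("delta").mode("overwrite").partitionBy("artist").save(tablePath)
```

Deleting the directory is the bluntest fix and only makes sense when nothing else depends on the old table.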

Votes: 5

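Stack Overflow user

Answered 2021-04-01 11:45:31

You are getting the schema mismatch error because the columns in your table are different from the columns in your dataframe.

According to the error snapshot you pasted in the question, your table schema has only two columns, while your dataframe schema has four: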

    Table schema:
    root
    -- action: string (nullable = true)
    -- date: string (nullable = true)

    Data schema:
    root
    -- artist: string (nullable = true)
    -- auth: string (nullable = true)
    -- firstName: string (nullable = true)
    -- gender: string (nullable = true)
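The difference between the two schemas can be spelled out with plain Scala collections. This is only a model of the comparison using the column names copied from the error message, not how Delta Lake checks schemas internally; the value names are illustrative.

```scala
// Column names copied from the error message in the question.
val tableColumns = Seq("action", "date")
val dataColumns  = Seq("artist", "auth", "firstName", "gender")

// Columns present on one side but not on the other.
val onlyInData  = dataColumns.diff(tableColumns)
val onlyInTable = tableColumns.diff(dataColumns)

// Union of both column lists, existing table columns first.
val allColumns = (tableColumns ++ dataColumns).distinct

println(s"Only in the dataframe: ${onlyInData.mkString(", ")}")
println(s"Only in the table: ${onlyInTable.mkString(", ")}")
println(s"Union of both: ${allColumns.size} columns")
```

Here the two schemas share no column at all, which suggests the table was written from a different dataset rather than from an older version of the same one.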

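Now you have two options.

If you want to keep the schema that is present in the dataframe, set the overwriteSchema option to true. If you want to keep all the columns, set the mergeSchema option to true. In that case the schemas are merged and the table will have six columns: the two existing columns plus the four new columns from the dataframe.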
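The two options can be sketched as follows. `overwriteSchema` and `mergeSchema` are Delta Lake writer options; the helper names `overwriteWithNewSchema` and `appendWithMergedSchema` are hypothetical, and the choice of `append` mode for the merge case is an assumption, since the answer does not name a save mode. The merge variant leaves out `partitionBy` on the assumption that the existing table is not partitioned by `artist` (it has no such column), so requesting that partitioning would likely conflict with the table.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: replace both the data and the schema of the table at `path`.
// Changing the partitioning also requires overwriteSchema together with overwrite mode.
def overwriteWithNewSchema(df: DataFrame, path: String): Unit =
  df.write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .partitionBy("artist")
    .save(path)

// Hypothetical helper: keep the existing columns and add the dataframe's new ones.
// Existing rows get null in the added columns; new rows get null in the old ones.
def appendWithMergedSchema(df: DataFrame, path: String): Unit =
  df.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save(path)

// Assumption: `events` is the DataFrame from the question. Call one helper, not both.
overwriteWithNewSchema(events, "/delta/events/")
```

Overwriting fits the question's case, where the old two-column table looks unrelated to the new data; merging is more useful when the table evolves over time and the old rows must stay readable.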

Votes: 10

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/60915267
