A schema mismatch detected when writing to the Delta table from Scala - Azure Databricks


I am trying to load "small_radio_json.json" into a Delta Lake table. After this code runs, I will create a table from it.

I try to create the Delta table but get the error "A schema mismatch detected when writing to the Delta table." It may be related to the partitioning in
events.write.format("delta").mode("overwrite").partitionBy("artist").save("/delta/events/")

How can I fix or modify the code?

//https://docs.microsoft.com/en-us/azure/azure-databricks/databricks-extract-load-sql-data-warehouse
//https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/delta/quickstart-scala.html
// Session configuration
val appID = "123558b9-3525-4c62-8c48-D3D7E2C16A"
val secret = "123[xEPjpOIBJtBS-W9B9Zsv7h9IF:qw"
val tenantID = "12344839-0afa-4fae-a34a-326c42112bca"
spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", appID)
spark.conf.set("fs.azure.account.oauth2.client.secret", secret)
spark.conf.set("fs.azure.account.oauth2.client.endpoint",
  "https://login.microsoftonline.com/" + tenantID + "/oauth2/token")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
// Account information
val storageAccountName = "mydatalake"
val fileSystemName = "fileshare1"
spark.conf.set("fs.azure.account.auth.type." + storageAccountName + ".dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type." + storageAccountName +
  ".dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id." + storageAccountName +
  ".dfs.core.windows.net", appID)
spark.conf.set("fs.azure.account.oauth2.client.secret." + storageAccountName +
  ".dfs.core.windows.net", secret)
spark.conf.set("fs.azure.account.oauth2.client.endpoint." + storageAccountName +
  ".dfs.core.windows.net", "https://login.microsoftonline.com/" + tenantID + "/oauth2/token")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
dbutils.fs.ls("abfss://" + fileSystemName + "@" + storageAccountName + ".dfs.core.windows.net/")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")
dbutils.fs.cp("file:///tmp/small_radio_json.json", "abfss://" + fileSystemName + "@" +
  storageAccountName + ".dfs.core.windows.net/")
val df = spark.read.json("abfss://" + fileSystemName + "@" + storageAccountName +
  ".dfs.core.windows.net/small_radio_json.json")
// df.show()
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
val events = df
display(events)
import org.apache.spark.sql.SaveMode
events.write.format("delta").mode("overwrite").partitionBy("artist").save("/delta/events/")
val events_delta = spark.read.format("delta").load("/delta/events/")
display(events_delta)
Exception:

    org.apache.spark.sql.AnalysisException: A schema mismatch detected when writing to the Delta table.
    To enable schema migration, please set:
    '.option("mergeSchema", "true")'.

    Table schema:
    root
    -- action: string (nullable = true)
    -- date: string (nullable = true)


    Data schema:
    root
    -- artist: string (nullable = true)
    -- auth: string (nullable = true)
    -- firstName: string (nullable = true)
    -- gender: string (nullable = true)

Most likely /delta/events/ does not contain any data, so loading data from that same directory can raise this kind of exception.
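One way to verify what is actually at that path is to list it and, if a Delta table exists there, print its current schema so it can be compared against the DataFrame you are about to write. This is a minimal sketch, assuming the notebook session from the question (with spark and dbutils in scope) and the /delta/events/ path:

```scala
// List whatever already exists at the target path.
dbutils.fs.ls("/delta/events/").foreach(f => println(f.path))

// If a Delta table is present, print its stored schema so it can be
// compared column-by-column with the DataFrame being written.
val existing = spark.read.format("delta").load("/delta/events/")
existing.printSchema()
```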

You get the schema mismatch error because the columns in the table differ from the columns in the DataFrame.

According to the error snapshot pasted in the question, the table schema has only two columns, while the DataFrame schema has four:

Table schema:
root
-- action: string (nullable = true)
-- date: string (nullable = true)


Data schema:
root
-- artist: string (nullable = true)
-- auth: string (nullable = true)
-- firstName: string (nullable = true)
-- gender: string (nullable = true)
Now you have two options:

  • If you want to keep only the schema present in the DataFrame, you can
    set the overwriteSchema option to true.
  • If you want to keep all the columns, you can set the mergeSchema option
    to true. In that case the schemas are merged, and the table will end up
    with six columns: the two existing table columns plus the four new
    columns from the DataFrame.
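A minimal sketch of the two options, assuming the events DataFrame and the /delta/events/ path from the question:

```scala
// Option 1: replace the table's schema with the DataFrame's schema.
// overwriteSchema only takes effect together with mode("overwrite").
events.write
  .format("delta")
  .mode("overwrite")
  .option("overwriteSchema", "true")
  .partitionBy("artist")
  .save("/delta/events/")

// Option 2: keep the existing columns and add the new ones from the
// DataFrame. mergeSchema adds any columns the table does not yet have.
// partitionBy is omitted here: on an existing table the partitioning
// must match what the table was created with.
events.write
  .format("delta")
  .mode("append")
  .option("mergeSchema", "true")
  .save("/delta/events/")
```

Whether append or overwrite is right for option 2 depends on whether the rows already in the table should be kept alongside the new ones.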