Update only the changed rows in a PySpark Delta table (Databricks)


I need to update only the rows of an existing table that have changed compared to a newly created DataFrame. Right now I subtract the two and get the changed rows, but I'm not sure how to merge them into the existing table:

old_df = spark.sql("select * from existing table")
diff = new_df.subtract(old_df)
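As a quick sanity check of what subtract returns, here is a toy model in plain Python (not Spark, and the sample values are made up): DataFrame.subtract keeps the rows of new_df that do not appear in old_df, comparing entire rows, so both changed rows and brand-new rows end up in the diff.

```python
# Toy model of new_df.subtract(old_df): rows compared as whole tuples.
old_rows = {(1, "a"), (2, "b"), (3, "c")}   # existing table contents
new_rows = {(1, "a"), (2, "B"), (4, "d")}   # row 2 changed, row 4 is new

diff = new_rows - old_rows  # what subtract would keep
print(sorted(diff))  # [(2, 'B'), (4, 'd')]
```

Note that the diff contains no information about *why* a row is present (changed vs. new), which is why the merge below needs both a matched and a not-matched clause.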
Now the diff DataFrame has to be inserted (for new rows) or used to update the existing records:

(deltaTable.alias("full_df").merge(
    merge_df.alias("append_df"),
    "full_df.col1 = append_df.col1 OR full_df.col2 = append_df.col2")
  .whenNotMatchedInsertAll() 
  .execute()
)

This does not update existing records (case: the col2 value changed while col1 did not).

.whenMatchedUpdateAll() accepts a condition that can be used to leave unchanged rows untouched:

(deltaTable.alias("full_df").merge(
    merge_df.alias("append_df"),
    "full_df.col1 = append_df.col1 OR full_df.col2 = append_df.col2") 
  .whenNotMatchedInsertAll()
  .whenMatchedUpdateAll("full_df.col1 != append_df.col1 OR full_df.col2 != append_df.col2")
  .execute()
)
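To illustrate the semantics of that merge, here is a toy model in plain Python (not Delta). It assumes, for simplicity, that rows are keyed by col1 alone: matched rows are rewritten only when the update condition holds, unmatched rows are inserted.

```python
# Toy model of the merge: full is the target table, append is the diff.
full = {1: "a", 2: "b", 3: "c"}   # existing table: col1 -> col2
append = {2: "B", 4: "d"}         # diff: row 2 changed, row 4 is new

for col1, col2 in append.items():
    if col1 in full:                  # merge condition matched
        if full[col1] != col2:        # whenMatchedUpdateAll condition
            full[col1] = col2
    else:                             # whenNotMatchedInsertAll
        full[col1] = col2

print(full)  # {1: 'a', 2: 'B', 3: 'c', 4: 'd'}
```

One caveat about the real merge above: an OR-based match condition can let a single source row match multiple target rows, which Delta rejects at runtime; matching on the actual key columns with AND is usually safer.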

Alternatively, write the data to a temp table and use JDBC to perform the inserts/updates.