Apache Spark: MERGE in Spark SQL - WHEN NOT MATCHED BY SOURCE THEN


I am writing Python and Spark SQL in Databricks, using Spark 2.4.5.

I have two tables:

Create table IF NOT EXISTS db_xsi_ed_faits_shahgholi_ardalan.Destination
(
  id Int,
  Name string,
  Deleted int
) USING Delta;

Create table IF NOT EXISTS db_xsi_ed_faits_shahgholi_ardalan.Source
(
  id Int,
  Name string,
  Deleted int
) USING Delta;
I need to run a MERGE command between Source and Destination. I wrote the following command:

%sql
MERGE INTO db_xsi_ed_faits_shahgholi_ardalan.Destination AS D
USING db_xsi_ed_faits_shahgholi_ardalan.Source AS S
ON (S.id = D.id)
-- UPDATE
WHEN MATCHED AND S.Name <> D.Name THEN 
  UPDATE SET 
    D.Name = S.Name
-- INSERT    
WHEN NOT MATCHED THEN 
  INSERT (id, Name, Deleted)
  VALUES (S.id, S.Name, S.Deleted)
 -- DELETE
WHEN NOT MATCHED BY SOURCE THEN 
  UPDATE SET 
     D.Deleted = 1
When I run this command, I get the following error:


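It looks like Spark SQL has no WHEN NOT MATCHED BY SOURCE clause! I need a way to achieve the same result.

I wrote the following code as a workaround, but I am still looking for a better approach: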

%sql
MERGE INTO db_xsi_ed_faits_shahgholi_ardalan.Destination AS D
USING db_xsi_ed_faits_shahgholi_ardalan.Source AS S
ON (S.id = D.id)
-- UPDATE
WHEN MATCHED AND S.Name <> D.Name THEN 
  UPDATE SET 
    D.Name = S.Name
-- INSERT    
WHEN NOT MATCHED THEN 
  INSERT (id, Name, Deleted)
  VALUES (S.id, S.Name, S.Deleted)
;

%sql
-- Logical delete
UPDATE db_xsi_ed_faits_shahgholi_ardalan.Destination
  SET Deleted = 1
WHERE db_xsi_ed_faits_shahgholi_ardalan.Destination.id in
(
  SELECT
    D.id
  FROM db_xsi_ed_faits_shahgholi_ardalan.Destination AS D
  LEFT JOIN db_xsi_ed_faits_shahgholi_ardalan.Source AS S ON (S.id = D.id)
  WHERE S.id is null
) 
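
The logical-delete step above is an anti-join: flag every Destination row whose id is absent from Source. As a rough way to check that logic without a cluster, the sketch below runs the same condition, phrased with NOT EXISTS instead of IN plus LEFT JOIN, against Python's built-in sqlite3. SQLite is only a stand-in here, not Spark or Delta; the shortened table names, the sample rows, and the helper `soft_delete_missing` are all hypothetical. Whether Delta accepts the NOT EXISTS form on a given runtime should be verified separately.

```python
import sqlite3


def soft_delete_missing(conn):
    """Set Deleted = 1 on Destination rows whose id is not present in Source."""
    conn.execute(
        """
        UPDATE Destination
           SET Deleted = 1
         WHERE NOT EXISTS (
               SELECT 1 FROM Source AS S WHERE S.id = Destination.id
         )
        """
    )


# Hypothetical sample data: id 2 exists only in Destination, id 4 only in Source.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Destination (id INTEGER, Name TEXT, Deleted INTEGER);
    CREATE TABLE Source      (id INTEGER, Name TEXT, Deleted INTEGER);
    INSERT INTO Destination VALUES (1, 'a', 0), (2, 'b', 0), (3, 'c', 0);
    INSERT INTO Source      VALUES (1, 'a', 0), (3, 'c', 0), (4, 'd', 0);
    """
)

soft_delete_missing(conn)
rows = conn.execute(
    "SELECT id, Name, Deleted FROM Destination ORDER BY id"
).fetchall()
print(rows)  # [(1, 'a', 0), (2, 'b', 1), (3, 'c', 0)]
```

Only the row with id 2 is flagged. The row with id 4 is not inserted here because this sketch covers the logical delete alone; the insert and update branches remain the job of the MERGE statement.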
