Apache Spark: updating a table in Apache Spark/Databricks using multiple columns


I am trying to update one table based on whether multiple columns match another table. I tried what is shown below, but I get the error shown. How can this be done?

update my_table set flag = '1' where (patient_id, org) in (
  select distinct (patient_id, org) from enc where lower(enc_type) like '%visit%'
)
Error:

Error in SQL statement: AnalysisException: IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: 'DeltaUpdateTable [post_partum#1911], [1], named_struct
---EDIT------------------------------

Below is a complete, working example of updating a table based on a more complex query, built on the documentation given in the accepted answer:

MERGE INTO events eve
USING (
  select 
    enc.org as org,
    enc.person_id as person_id,
    min(encounter_date) as visit_day
  from 
    enc
    join events eve on enc.org = eve.org and enc.person_id = eve.person_id and eve.is_post_partum = 1
  where 
    lower(enc.enc_type) like '%visit%'
  group by 1, 2
) visits
ON eve.org = visits.org and eve.person_id = visits.person_id
WHEN MATCHED THEN
  UPDATE SET eve.delivery_date = visits.visit_day
;
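
For reference, the simple flag update from the top of the question maps onto the same pattern; this is a minimal sketch assuming the same my_table and enc tables and their patient_id, org, flag and enc_type columns, with the WHEN NOT MATCHED clause omitted because only existing rows need to change:

MERGE INTO my_table t
USING (
  -- distinct (patient_id, org) pairs from encounters that look like visits
  select distinct patient_id, org
  from enc
  where lower(enc_type) like '%visit%'
) v
ON t.patient_id = v.patient_id and t.org = v.org
WHEN MATCHED THEN
  UPDATE SET t.flag = '1'
;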

First, make sure you are using Delta Lake as the table format. Second, I think what you are looking for is an upsert, which is

an operation that inserts rows into a database table if they do not already exist, or updates them if they do.

For that, you need to combine MERGE with UPDATE. Here is an example with a match expression:

MERGE INTO events
USING updates
ON events.eventId = updates.eventId
WHEN MATCHED THEN
  UPDATE SET events.data = updates.data
WHEN NOT MATCHED
  THEN INSERT (date, eventId, data) VALUES (date, eventId, data)

See the Databricks documentation for more details.
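
On the first point, whether the table is actually stored as Delta Lake: one way to check is DESCRIBE DETAIL, whose output includes the table's format, and a plain Parquet table can be converted in place with CONVERT TO DELTA. A minimal sketch, reusing the question's my_table name:

-- the 'format' column of the result should say 'delta'
DESCRIBE DETAIL my_table;

-- only needed if the table is still a plain Parquet table
CONVERT TO DELTA my_table;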

Thanks, I was able to accomplish what I needed based on the documentation link you provided.