Snowflake MERGE INTO inserts data even when the condition is met and the fields already exist in both the target and source tables


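I have the following Snowflake statement. It checks whether a hashed field from a staged file already exists in the target table, and inserts the row when there is no match: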

MERGE INTO LINK_DIMENSION_FIELD AS D 
USING (SELECT md5(concat(T.$2, T.$4)) DIM_FIELD, T.$2 DIMENSION_NAME, T.$4 FIELD_NAME
       FROM @ingest_stage_temp/dimension_field.csv (FILE_FORMAT=>"GENERIC_CSV_FORMAT") T) ST 
ON md5(concat(D.DIMENSION_NAME_HASH_KEY, D.FIELD_NAME_HASH_KEY)) = ST.DIM_FIELD
WHEN NOT MATCHED THEN 
INSERT (DIMENSION_NAME_FIELD_NAME_HASH_KEY, LOAD_DT, RECORD_SRC, DIMENSION_NAME_HASH_KEY, FIELD_NAME_HASH_KEY) 
VALUES(MD5(CONCAT(ST.DIMENSION_NAME, ST.FIELD_NAME)), current_timestamp(), 'TEST', md5(ST.DIMENSION_NAME), md5(ST.FIELD_NAME));
The problem is that the merge always inserts, even when the condition
md5(concat(D.DIMENSION_NAME_HASH_KEY, D.FIELD_NAME_HASH_KEY)) = ST.DIM_FIELD
should be satisfied.

For reference, this is the select query I run against the staged file:

SELECT md5(concat(T.$2, T.$4)) DIM_FIELD, T.$2 DIMENSION_NAME, T.$4 FIELD_NAME
FROM @ingest_stage_temp/dimension_field.csv (FILE_FORMAT=>"GENERIC_CSV_FORMAT") T
The result is:

DIM_FIELD                           DIMENSION_NAME                  FIELD_NAME
87d7dae13cf0326fd03a348ca6c518b5    cg_child_6mo_receiv_ind_iycf    cg_child_6mo_receiv_ind_iycf/nbr_1st_cons_6mc_iycfc
2b75306f968f11b45f066efb9871babb    cg_child_6mo_receiv_ind_iycf    cg_child_6mo_receiv_ind_iycf/nbr_followup_2nd_time_6mc_iycfc
53273e7133d7a0b513af8c9bcc934437    preg_women_rec_ind_counselling  preg_women_rec_ind_counselling/nbr_1st_cons_pregw_iycfc
When I run a select query on the existing data:

select * from LINK_DIMENSION_FIELD;
you can clearly see that all of the DIM_FIELD values are already in this table, so the insert should not be executed.


In your ON clause, you are comparing:

md5(concat(D.DIMENSION_NAME_HASH_KEY, D.FIELD_NAME_HASH_KEY)) = ST.DIM_FIELD

I think comparing ST.DIM_FIELD against DIMENSION_NAME_FIELD_NAME_HASH_KEY (the final computed column in the target table) would do what you want.
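
To spell out why the original condition never matches: according to the INSERT in the question, DIMENSION_NAME_HASH_KEY and FIELD_NAME_HASH_KEY already hold md5(name) and md5(field), so the left side of the ON clause is md5(concat(md5(name), md5(field))), while ST.DIM_FIELD is md5(concat(name, field)). Those are hashes of different strings. Below is a minimal sketch of the suggested change. It assumes the existing rows were loaded with the same INSERT, so that DIMENSION_NAME_FIELD_NAME_HASH_KEY contains md5(concat(name, field)). Everything except the ON clause is copied from the question.

```sql
-- Sketch only: the ON clause now compares the stored combined hash
-- with the hash computed from the staged file.
-- Assumption: D.DIMENSION_NAME_FIELD_NAME_HASH_KEY = md5(concat(name, field))
-- for rows already in the table.
MERGE INTO LINK_DIMENSION_FIELD AS D
USING (SELECT md5(concat(T.$2, T.$4)) DIM_FIELD, T.$2 DIMENSION_NAME, T.$4 FIELD_NAME
       FROM @ingest_stage_temp/dimension_field.csv (FILE_FORMAT=>"GENERIC_CSV_FORMAT") T) ST
ON D.DIMENSION_NAME_FIELD_NAME_HASH_KEY = ST.DIM_FIELD
WHEN NOT MATCHED THEN
INSERT (DIMENSION_NAME_FIELD_NAME_HASH_KEY, LOAD_DT, RECORD_SRC, DIMENSION_NAME_HASH_KEY, FIELD_NAME_HASH_KEY)
VALUES(MD5(CONCAT(ST.DIMENSION_NAME, ST.FIELD_NAME)), current_timestamp(), 'TEST', md5(ST.DIMENSION_NAME), md5(ST.FIELD_NAME));
```

Comparing against the stored column also avoids recomputing a hash for every target row on each merge.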

Yes, I missed an extra
md5()
so it should be
md5(concat(md5(D.DIMENSION_NAME_HASH_KEY), md5(D.FIELD_NAME_HASH_KEY)))