SQL join on a wildcard column: join on col1 and col2 if col1 is in the table, else join on col2

sql join google-bigquery

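Let's say I'm a company selling horoscopes based on customers' names. I have a table with surnames (sur), family names (fam), and a horoscope text. Because I can't cover every name combination, I often store the surname as NULL, as a catch-all value.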

Horoscope DB

sur | fam | horoscope
----------------------
John| Doe  | text1
Jane| Doe  | text2
NULL| Doe  | text3
Ike | Smith| text4
NULL| Smith| text5

And a list of customers:

customer DB

sur | fam
---------
John| Doe
Jack| Doe
Lisa| Smith
Carl| Smith

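Now we need to match each customer with a horoscope. If there is an exact match on both surname and family name, match on both; if there is no exact match, match on the family name only. The result would be: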

Customer horoscope DB

sur | fam | horoscope
----------------------
John| Doe  | text1
Jack| Doe  | text3
Lisa| Smith| text5
Carl| Smith| text5

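If I do a normal LEFT JOIN USING (sur, fam), I only get a match for John. If I do a LEFT JOIN USING (fam), I get a lot of duplicates. I need some kind of condition, but I'm not sure how to set it up.
If needed, I'm willing to change the catch-all value, or encode it as a separate column.
Specifically, I'm working with Google BigQuery.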
Here is one approach:
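select ch.* except (seqnum)
from (select c.*,
             h.* except (sur, fam), -- whatever columns you want
             -- rank the candidates per customer: exact surname match first, catch-all second
             row_number() over (partition by c.sur, c.fam
                                order by (case when c.sur = h.sur then 1 else 2 end)
                               ) as seqnum
      from horoscope h join
           customer c
           on c.fam = h.fam and
              (h.sur = c.sur or h.sur is null) -- keep only exact matches and the catch-all row
     ) ch
where seqnum = 1;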

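Basically, this joins on the family name and picks the "best match", that is, an exact match on the surname if there is one, and the catch-all row otherwise.
You should be careful, though, because different families can share the same family name.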
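To make that caution concrete: when two distinct customers share both names, a window partitioned by the name columns treats them as one. The following is a minimal sketch, not part of the original answer, that assumes a hypothetical `customer_id` key (the question's customer table has no such column) and partitions by it instead. The inline sample data is the question's, plus one invented duplicate customer.

```sql
#standardSQL
WITH horoscope AS (
  SELECT 'John' AS sur, 'Doe' AS fam, 'text1' AS horoscope UNION ALL
  SELECT 'Jane', 'Doe', 'text2' UNION ALL
  SELECT CAST(NULL AS STRING), 'Doe', 'text3' UNION ALL
  SELECT 'Ike', 'Smith', 'text4' UNION ALL
  SELECT CAST(NULL AS STRING), 'Smith', 'text5'
), customer AS (
  -- customer_id is a hypothetical key added for illustration
  SELECT 1 AS customer_id, 'John' AS sur, 'Doe' AS fam UNION ALL
  SELECT 2, 'Jack', 'Doe' UNION ALL
  SELECT 3, 'Lisa', 'Smith' UNION ALL
  SELECT 4, 'Carl', 'Smith' UNION ALL
  SELECT 5, 'Carl', 'Smith'  -- invented: a second, distinct Carl Smith
)
SELECT customer_id, sur, fam, horoscope
FROM (
  SELECT c.customer_id, c.sur, c.fam, h.horoscope,
    -- one ranking per customer row, exact surname match ranked first
    ROW_NUMBER() OVER (
      PARTITION BY c.customer_id
      ORDER BY CASE WHEN h.sur = c.sur THEN 1 ELSE 2 END
    ) AS seqnum
  FROM customer c
  JOIN horoscope h
    ON h.fam = c.fam
   AND (h.sur = c.sur OR h.sur IS NULL)  -- exact match or catch-all only
)
WHERE seqnum = 1
ORDER BY customer_id
```

With a real key in place, both Carl Smith rows keep their own result row instead of collapsing into one.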
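Another solution is to use conditional aggregation. You can join on the family name, then check whether a horoscope exists for the given surname; if not, fall back to the row with the NULL surname.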
SELECT
    c.sur,
    c.fam,
    COALESCE(
        MAX(CASE WHEN c.sur = h.sur THEN h.horoscope END),  -- exact match on surname
        MAX(CASE WHEN h.sur IS NULL THEN h.horoscope END)   -- catch-all fallback
    ) horoscope_text
FROM
    customer c
    INNER JOIN horoscope h ON c.fam = h.fam
GROUP BY 
    c.sur,
    c.fam

As I understand the question, here is one way:
select c.id customer_id, c.sur, c.fam, h.id horoscope_id, h.sur h_sur, 
h.fam h_fam, h.horoscope
FROM customer c join horoscope h
on (c.sur = h.sur and c.fam = h.fam)
or (h.sur is null and c.fam = h.fam and not exists 
      (select 1 from horoscope h1 where h1.sur = c.sur and h1.fam = c.fam)
   )



You can join on multiple conditions to cover each case:
select c.sur, c.fam, h.horoscope from customer c 
inner join horoscope h
on (c.fam = h.fam and c.sur = h.sur) or 
  (c.fam = h.fam and h.sur is null and not exists(
    select 1 from horoscope 
    where fam = c.fam and sur = c.sur
  )
)

Below is for BigQuery Standard SQL:
#standardSQL
SELECT c.sur, c.fam,
  ARRAY_AGG(horoscope ORDER BY h.sur DESC LIMIT 1)[OFFSET(0)] horoscope
FROM `project.dataset.customer` c
JOIN `project.dataset.horoscope` h
ON c.fam = h.fam
AND c.sur = IFNULL(h.sur, c.sur)
GROUP BY c.sur, c.fam

You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.horoscope` AS (
  SELECT 'John' sur,'Doe' fam, 'text1' horoscope UNION ALL
  SELECT 'Jane', 'Doe', 'text2' UNION ALL
  SELECT NULL, 'Doe', 'text3' UNION ALL
  SELECT 'Ike', 'Smith', 'text4' UNION ALL
  SELECT NULL, 'Smith', 'text5' 
), `project.dataset.customer` AS (
  SELECT 'John' sur, 'Doe' fam UNION ALL
  SELECT 'Jack', 'Doe' UNION ALL
  SELECT 'Lisa', 'Smith' UNION ALL
  SELECT 'Carl', 'Smith' 
)
SELECT c.sur, c.fam,
  ARRAY_AGG(horoscope ORDER BY h.sur DESC LIMIT 1)[OFFSET(0)] horoscope
FROM `project.dataset.customer` c
JOIN `project.dataset.horoscope` h
ON c.fam = h.fam
AND c.sur = IFNULL(h.sur, c.sur)
GROUP BY c.sur, c.fam  

Result
Row sur     fam     horoscope    
1   John    Doe     text1    
2   Jack    Doe     text3    
3   Lisa    Smith   text5    
4   Carl    Smith   text5    
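The query above works because BigQuery sorts NULLs last in descending order, so a row with a matching surname comes before the catch-all row. As a sketch of the same idea with the preference spelled out rather than implied by NULL ordering, the sort key can be an explicit flag. The short table names `horoscope` and `customer` are inline stand-ins for the real tables.

```sql
#standardSQL
WITH horoscope AS (
  SELECT 'John' AS sur, 'Doe' AS fam, 'text1' AS horoscope UNION ALL
  SELECT 'Jane', 'Doe', 'text2' UNION ALL
  SELECT CAST(NULL AS STRING), 'Doe', 'text3' UNION ALL
  SELECT 'Ike', 'Smith', 'text4' UNION ALL
  SELECT CAST(NULL AS STRING), 'Smith', 'text5'
), customer AS (
  SELECT 'John' AS sur, 'Doe' AS fam UNION ALL
  SELECT 'Jack', 'Doe' UNION ALL
  SELECT 'Lisa', 'Smith' UNION ALL
  SELECT 'Carl', 'Smith'
)
SELECT c.sur, c.fam,
  -- 0 for an exact surname match, 1 for the catch-all row; take the lowest
  ARRAY_AGG(h.horoscope ORDER BY IF(h.sur IS NULL, 1, 0) LIMIT 1)[OFFSET(0)] AS horoscope
FROM customer c
JOIN horoscope h
  ON c.fam = h.fam
 AND c.sur = IFNULL(h.sur, c.sur)  -- exact match, or any surname when h.sur is NULL
GROUP BY c.sur, c.fam
```

On the sample data this should give the same four rows as the result shown above; the only difference is that the ordering no longer depends on where NULLs sort.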

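I really like your solution, since it is easy to extend to larger joins. It works in the fiddle, but unfortunately BigQuery does not support it: "EXISTS subquery is not supported inside join predicate."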
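One way around the restriction on EXISTS inside a join predicate is to join the horoscope table twice, once for the exact match and once for the catch-all row, and take the first non-NULL text. This is a minimal sketch rather than any answerer's method. It assumes at most one exact row and at most one catch-all row per family name; the aliases `he` and `hn` are illustrative.

```sql
#standardSQL
WITH horoscope AS (
  SELECT 'John' AS sur, 'Doe' AS fam, 'text1' AS horoscope UNION ALL
  SELECT 'Jane', 'Doe', 'text2' UNION ALL
  SELECT CAST(NULL AS STRING), 'Doe', 'text3' UNION ALL
  SELECT 'Ike', 'Smith', 'text4' UNION ALL
  SELECT CAST(NULL AS STRING), 'Smith', 'text5'
), customer AS (
  SELECT 'John' AS sur, 'Doe' AS fam UNION ALL
  SELECT 'Jack', 'Doe' UNION ALL
  SELECT 'Lisa', 'Smith' UNION ALL
  SELECT 'Carl', 'Smith'
)
SELECT c.sur, c.fam,
  -- prefer the exact match, otherwise use the catch-all text
  COALESCE(he.horoscope, hn.horoscope) AS horoscope
FROM customer c
LEFT JOIN horoscope he          -- exact match on both names
  ON he.fam = c.fam AND he.sur = c.sur
LEFT JOIN horoscope hn          -- catch-all row for the family name
  ON hn.fam = c.fam AND hn.sur IS NULL
```

Because both joins are LEFT JOINs, a customer whose family name does not appear in the horoscope table at all is still returned, with a NULL horoscope. Adding further fallback levels means adding another LEFT JOIN and another COALESCE argument.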

BigQuery returns only 2 rows.