BigQuery: unable to use DISTINCT when using UNION ALL & STRUCT with different data types
I have 3 tables, highlighted in green, and want to UNION ALL two of them, leads and contacts. Each of the two has 1 column that is not present in the other, highlighted in red. In addition, contacts is joined to accounts to get 3 fields, so the red-highlighted field code_uk becomes another column that does not exist in leads after the join. For the remaining columns, I want to stack one column below the other as shown in the query.

Desired output:
- Arrange the columns as in the query and UNION ALL leads and contacts - done
- Keep only contact rows whose contact_email/contact_product does not exist in leads, i.e. rows 9, 11 and 14, highlighted in yellow, should not appear - done
- Use STRUCT, because the number of columns in leads and contacts does not match - done
- Keep only unique lead_email values from leads, i.e. rows 2 and 5, highlighted in blue, should not appear. When I use DISTINCT lead_email, the error is that a STRUCT type cannot be used in SELECT DISTINCT.

When I remove DISTINCT just to test whether STRUCT works, the query outputs 15 rows, but I want 13 rows, excluding rows 2 and 5. I tried GROUP BY, but that did not work either.

Can someone help me fix this?

Contacts:
contact_id  contact_email     contact_product  contact_date             contact_cancel_date
9                             msoffice         2010-01-23 07:30:01 UTC  2020-02-23 07:30:02 UTC
10          pqr@pqr.com       playstore        2010-01-23 07:30:01 UTC  2020-02-23 07:30:02 UTC
11          123@123.com                        2010-01-23 07:30:01 UTC  2020-02-23 07:30:02 UTC
12          abc1@abc1.com                      2010-01-23 07:30:01 UTC  2020-02-23 07:30:02 UTC
13          itunes@apple.com  ipod             2010-01-23 07:30:01 UTC  2020-02-23 07:30:02 UTC
14                            googlecloud      2010-01-23 07:30:01 UTC  2020-02-23 07:30:02 UTC
15          yahoo@yahoo.com                    2010-01-23 07:30:01 UTC  2020-02-23 07:30:02 UTC
16          456@gmail.com                      2010-01-23 07:30:01 UTC  2020-02-23 07:30:02 UTC
Accounts:
account_id employee code_us code_uk
9 100 001-A 450-a
10 200 002-B 451-a
11 300 003-A 452-a
12 400 004-B 453-a
13 500 005-C 454-a
14 600 006-B 455-a
15 700 007-A 456-a
16 800 008-B 457-a
Below is for BigQuery Standard SQL
#standardSQL
SELECT
  leadid,
  lead_employee,
  lead_product,
  lead_reason,
  lead_date,
  COALESCE(lead_employee_count_code_us, employee) AS lead_employee_count_code_us,
  contact_cancel_date,
  COALESCE(lead_code, code_us) AS lead_code,
  code_uk
FROM (
  SELECT DISTINCT
    lead_id AS leadid,
    lead_email AS lead_employee,
    lead_product,
    lead_reason,
    lead_date,
    lead_employee_count AS lead_employee_count_code_us,
    CAST(NULL AS TIMESTAMP) AS contact_cancel_date,
    lead_code
  FROM `project.dataset.leads`
  UNION ALL
  SELECT
    contact_id,
    contact_email,
    contact_product,
    '',
    contact_date,
    NULL,
    contact_cancel_date,
    ''
  FROM `project.dataset.contacts`
  WHERE NOT EXISTS (SELECT lead_email FROM `project.dataset.leads` WHERE lead_email = contact_email)
    AND NOT EXISTS (SELECT lead_product FROM `project.dataset.leads` WHERE lead_product = contact_product)
)
LEFT JOIN `project.dataset.accounts`
ON leadid = account_id
If applied to the sample data from your question, the result is:
Row  leadid  lead_employee      lead_product  lead_reason  lead_date                lead_employee_count_code_us  contact_cancel_date      lead_code  code_uk
1    1       abc@abc.com        msoffice      abc          2020-02-23 07:30:02 UTC  1000                         null                     1005-C     null
2    2       pqr1@pqr1.com      chrome        pqr1         2020-02-23 07:30:02 UTC  2000                         null                     2006-B     null
3    3       xyz@xyz.com        iphone        xyz          2020-02-23 07:30:02 UTC  3000                         null                     3007-A     null
4    4       zzz@zzz.com        macbook       zzz          2020-02-23 07:30:02 UTC  4000                         null                     4008-B     null
5    5       xyz1@xyz.com       itunes        xyz1         2020-02-23 07:30:02 UTC  5000                         null                     5001-A     null
6    6       google@google.com  googlecloud   xyz2         2020-02-23 07:30:02 UTC  6000                         null                     6002-B     null
7    7       123@123.com        yahoomail     junk         2020-02-23 07:30:02 UTC  7000                         null                     7003-A     null
8    8       abc1@gmail.com     null          null         2020-02-23 07:30:02 UTC  8000                         null                     8004-B     null
9    10      pqr@pqr.com        playstore                  2010-01-23 07:30:01 UTC  200                          2020-02-23 07:30:02 UTC             451-a
10   12      abc1@abc1.com      null                       2010-01-23 07:30:01 UTC  400                          2020-02-23 07:30:02 UTC             453-a
11   13      itunes@apple.com   ipod                       2010-01-23 07:30:01 UTC  500                          2020-02-23 07:30:02 UTC             454-a
12   15      yahoo@yahoo.com    null                       2010-01-23 07:30:01 UTC  700                          2020-02-23 07:30:02 UTC             456-a
13   16      456@gmail.com      null                       2010-01-23 07:30:01 UTC  800                          2020-02-23 07:30:02 UTC             457-a
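As a side note, SELECT DISTINCT in the query above removes only lead rows that are identical in every selected column. If duplicates should instead be decided by lead_email alone, one possible sketch is to number the rows per email and keep the first one. Nothing in this pattern compares whole rows, so it should also apply when a row carries a STRUCT column. The inline leads data and the choice to keep the lowest lead_id per email are assumptions made for illustration, not taken from the question.

```sql
#standardSQL
-- Hypothetical inline sample data; not the asker's real leads table.
WITH leads AS (
  SELECT 1 AS lead_id, 'abc@abc.com' AS lead_email, 'msoffice' AS lead_product UNION ALL
  SELECT 2, 'abc@abc.com', 'msoffice' UNION ALL
  SELECT 3, 'xyz@xyz.com', 'iphone'
)
SELECT
  lead_id,
  lead_email,
  lead_product
FROM (
  SELECT
    *,
    -- Number the rows within each email; the lowest lead_id gets rn = 1.
    -- Which row to keep is an assumption; change ORDER BY to pick another.
    ROW_NUMBER() OVER (PARTITION BY lead_email ORDER BY lead_id) AS rn
  FROM leads
)
WHERE rn = 1
ORDER BY lead_id
```

On this hypothetical input the query keeps lead_id 1 and 3 and drops lead_id 2. The inner SELECT could stand in for the SELECT DISTINCT branch of the UNION ALL, provided the helper column rn is left out of the final column list.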
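Hello @MikhailBerlyant, could you help?
Could you please not post the tables as images? That would help me reproduce your problem.
Yes, please reduce the question to its essentials, so that we can focus on the issue that stops the whole flow from running.
Hello @rmesteves and FelipeHoffa, I edited my question and added the 3 source tables lead, contact and account as text. I did a lot of manual formatting to make your work easier, since it is hard to copy-paste from excel/csv into stackoverflow in the right format. Hope this helps both of you.
Thank you, Mr. Mikhail Berlyant. This works.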