BigQuery: cannot use DISTINCT when using UNION ALL and STRUCT with different data types

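I have 3 tables, highlighted in green. I want to UNION ALL the Leads and Contacts tables; each of them lacks 1 column that the other has, highlighted in red. In addition, the Contacts table is joined with Accounts to get 3 fields, so the field code_uk (highlighted in red) is another column that will not exist for Leads after the join. The remaining columns should be stacked one below the other, as shown in the query.

Desired output:

1. Arrange the columns as in the query below and UNION ALL Leads and Contacts. Done.
2. Contact rows whose contact_email or contact_product already exists in Leads should not appear, i.e. rows 9, 11 and 14 (highlighted in yellow) should be excluded. Done.
3. Use STRUCT, because the number of columns in Leads and Contacts does not match. Done.
4. Only unique lead_email values from Leads should appear, i.e. rows 2 and 5 (highlighted in blue) should be excluded. When I use DISTINCT lead_email, I get an error saying that a STRUCT type cannot be used in SELECT DISTINCT.

When I remove DISTINCT, just to test whether STRUCT works, the query outputs 15 rows, but I want 13 rows, without rows 2 and 5. I also tried GROUP BY, which did not work either. Can someone help me fix this? See the reference image.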
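To make the DISTINCT problem concrete, here is a minimal sketch of the situation. The inline `leads` data is made up for illustration and is not the real table; the commented-out statement only shows the shape of the query that produced the error described above. Listing plain scalar columns instead of a STRUCT lets SELECT DISTINCT compare whole rows.

```sql
#standardSQL
-- Hypothetical inline data: two identical lead rows.
WITH leads AS (
  SELECT 1 AS lead_id, 'abc@abc.com' AS lead_email, 'msoffice' AS lead_product
  UNION ALL
  SELECT 1, 'abc@abc.com', 'msoffice'
)
-- Shape of the statement that raised the STRUCT error (left commented out):
--   SELECT DISTINCT STRUCT(lead_id, lead_email, lead_product) AS lead FROM leads
-- DISTINCT over scalar columns collapses the two identical rows into one.
SELECT DISTINCT
  lead_id,
  lead_email,
  lead_product
FROM leads
```

Note that DISTINCT here only removes rows that are identical in every selected column; two leads that share an email but differ elsewhere would both remain.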

Query used:

Contacts:

    contact_id  contact_email     contact_product  contact_date             contact_cancel_date
    9                             msoffice         2010-01-23 07:30:01 UTC  2020-02-23 07:30:02 UTC
    10          pqr@pqr.com       playstore        2010-01-23 07:30:01 UTC  2020-02-23 07:30:02 UTC
    11          123@123.com                        2010-01-23 07:30:01 UTC  2020-02-23 07:30:02 UTC
    12          abc1@abc1.com                      2010-01-23 07:30:01 UTC  2020-02-23 07:30:02 UTC
    13          itunes@apple.com  ipod             2010-01-23 07:30:01 UTC  2020-02-23 07:30:02 UTC
    14                            googlecloud      2010-01-23 07:30:01 UTC  2020-02-23 07:30:02 UTC
    15          yahoo@yahoo.com                    2010-01-23 07:30:01 UTC  2020-02-23 07:30:02 UTC
    16          456@gmail.com                      2010-01-23 07:30:01 UTC  2020-02-23 07:30:02 UTC
Accounts:

account_id  employee    code_us code_uk
9            100        001-A   450-a
10           200        002-B   451-a
11           300        003-A   452-a
12           400        004-B   453-a
13           500        005-C   454-a
14           600        006-B   455-a
15           700        007-A   456-a
16           800        008-B   457-a

Below is for BigQuery Standard SQL:

#standardSQL
SELECT
    leadid,
    lead_employee,
    lead_product,
    lead_reason, 
    lead_date,
    COALESCE(lead_employee_count_code_us, employee) AS lead_employee_count_code_us,
    contact_cancel_date,
    COALESCE(lead_code, code_us) AS lead_code,
    code_uk
FROM (
  SELECT DISTINCT
    lead_id AS leadid,
    lead_email AS lead_employee,
    lead_product,
    lead_reason, 
    lead_date,
    lead_employee_count AS lead_employee_count_code_us,
    CAST(NULL AS TIMESTAMP) AS contact_cancel_date,
    lead_code
  FROM `project.dataset.leads`
    UNION ALL
  SELECT 
    contact_id, 
    contact_email,
    contact_product,
    '',
    contact_date, 
    NULL,
    contact_cancel_date,
    ''
  FROM `project.dataset.contacts`
  WHERE NOT EXISTS (SELECT lead_email FROM `project.dataset.leads` WHERE lead_email = contact_email)
  AND NOT EXISTS (SELECT lead_product FROM `project.dataset.leads` WHERE lead_product = contact_product)
)
LEFT JOIN `project.dataset.accounts`
ON leadid = account_id  
If applied to the sample data from your question, the result is:

Row leadid  lead_employee       lead_product    lead_reason lead_date               lead_employee_count_code_us contact_cancel_date lead_code   code_uk  
1   1       abc@abc.com         msoffice        abc         2020-02-23 07:30:02 UTC 1000    null                    1005-C  null     
2   2       pqr1@pqr1.com       chrome          pqr1        2020-02-23 07:30:02 UTC 2000    null                    2006-B  null     
3   3       xyz@xyz.com         iphone          xyz         2020-02-23 07:30:02 UTC 3000    null                    3007-A  null     
4   4       zzz@zzz.com         macbook         zzz         2020-02-23 07:30:02 UTC 4000    null                    4008-B  null     
5   5       xyz1@xyz.com        itunes          xyz1        2020-02-23 07:30:02 UTC 5000    null                    5001-A  null     
6   6       google@google.com   googlecloud     xyz2        2020-02-23 07:30:02 UTC 6000    null                    6002-B  null     
7   7       123@123.com         yahoomail       junk        2020-02-23 07:30:02 UTC 7000    null                    7003-A  null     
8   8       abc1@gmail.com      null            null        2020-02-23 07:30:02 UTC 8000    null                    8004-B  null     
9   10      pqr@pqr.com         playstore                   2010-01-23 07:30:01 UTC 200     2020-02-23 07:30:02 UTC         451-a    
10  12      abc1@abc1.com       null                        2010-01-23 07:30:01 UTC 400     2020-02-23 07:30:02 UTC         453-a    
11  13      itunes@apple.com    ipod                        2010-01-23 07:30:01 UTC 500     2020-02-23 07:30:02 UTC         454-a    
12  15      yahoo@yahoo.com     null                        2010-01-23 07:30:01 UTC 700     2020-02-23 07:30:02 UTC         456-a    
13  16      456@gmail.com       null                        2010-01-23 07:30:01 UTC 800     2020-02-23 07:30:02 UTC         457-a    
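The query above removes leads that are duplicated across every selected column. If two leads shared a lead_email but differed in another column, DISTINCT would keep both. One possible alternative for that case, as a sketch only, is to number the rows per email and keep the first one. The inline `leads` data and the choice of ordering by lead_id are assumptions for illustration, not part of the original answer.

```sql
#standardSQL
-- Hypothetical inline data: leads 1 and 2 share an email but differ in product.
WITH leads AS (
  SELECT 1 AS lead_id, 'abc@abc.com' AS lead_email, 'msoffice' AS lead_product
  UNION ALL
  SELECT 2, 'abc@abc.com', 'chrome'
  UNION ALL
  SELECT 3, 'xyz@xyz.com', 'iphone'
)
SELECT
  lead_id,
  lead_email,
  lead_product
FROM (
  SELECT
    *,
    -- Number the rows within each email; the lowest lead_id gets rn = 1 (assumed tie-break).
    ROW_NUMBER() OVER (PARTITION BY lead_email ORDER BY lead_id) AS rn
  FROM leads
)
WHERE rn = 1  -- keep one row per lead_email
```

With this sample data the query keeps lead_id 1 and 3. The deduplicated subquery could then take the place of the `SELECT DISTINCT ... FROM project.dataset.leads` branch before the UNION ALL.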

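Comments:

Hi @MikhailBerlyant, could you help?

Could you please not post the tables as images? That would help me reproduce your problem.

Yes, please reduce the question to its essentials, so we can focus on the issue that is blocking the whole flow.

Hello @rmesteves and @FelipeHoffa, I edited my question and added the 3 source tables (lead, contact and account) as text. I formatted them by hand to make your work easier, because it is hard to copy and paste from Excel/CSV into Stack Overflow in the right format. I hope this helps both of you.

Thank you, Mr. Mikhail Berlyant. This works.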