SQL: Merge two tables with different dates based on column value?

sql google-bigquery


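We switched to a new platform on January 1, and I need to merge two tables to get a single data source that combines the old and new data. However, some accounts had to be taken off the old platform before January 1.

The new table contains December data for all accounts, but I only want to use the new December data where there is no old December data. How can I merge the tables so that most accounts take new data starting January 1, while the few exception accounts take new data starting from the appropriate December date?

Example: for Account1 I need new data starting January 1; for Account2 I need new data starting December 30; for Account3 I need new data starting December 31.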

Old Table  
------------------------------------   
Account         Date         Sales  
------------------------------------
Account1        12-29-18     10  
Account1        12-30-18     10  
Account1        12-31-18     5  
Account2        12-29-18     10    
Account3        12-29-18     20  
Account3        12-30-18     10

New Table
------------------------------------   
Account         Date         Sales  
------------------------------------
Account1        12-29-18     10  
Account1        12-30-18     10  
Account1        12-31-18     5  
Account1        01-01-19     20  
Account2        12-30-18     15  
Account2        12-31-18     20  
Account2        01-01-19     10  
Account3        12-30-18     10  
Account3        12-31-18     20  
Account3        01-01-19     5  

Output
------------------------------------   
Account         Date         Sales  
------------------------------------
Account1        12-29-18     10  
Account1        12-30-18     10  
Account1        12-31-18     5  
Account1        01-01-19     20  
Account2        12-29-18     10
Account2        12-30-18     15  
Account2        12-31-18     20  
Account2        01-01-19     10
Account3        12-29-18     20  
Account3        12-30-18     10
Account3        12-31-18     20  
Account3        01-01-19     5

Below is for BigQuery Standard SQL:

  #standardSQL
  SELECT account, date, 
    -- 'old' sorts after 'new', so DESC puts the old row first when both exist
    ARRAY_AGG(sales ORDER BY data DESC LIMIT 1)[OFFSET(0)] sales
  FROM (
    SELECT 'old' data, * FROM `project.dataset.old_table` UNION ALL 
    SELECT 'new' data, * FROM `project.dataset.new_table` 
  )
  GROUP BY account, date

You can test and play with the above using the sample data from your question:

  #standardSQL
  WITH `project.dataset.old_table` AS (
    SELECT 'Account1' account, '12-29-18' date, 10 sales UNION ALL  
    SELECT 'Account1', '12-30-18', 10 UNION ALL  
    SELECT 'Account1', '12-31-18', 5 UNION ALL  
    SELECT 'Account2', '12-29-18', 10 UNION ALL    
    SELECT 'Account3', '12-29-18', 20 UNION ALL  
    SELECT 'Account3', '12-30-18', 10 
  ),  `project.dataset.new_table` AS (
    SELECT 'Account1' account, '12-29-18' date, 10 sales UNION ALL
    SELECT 'Account1', '12-30-18', 10 UNION ALL
    SELECT 'Account1', '12-31-18', 5 UNION ALL
    SELECT 'Account1', '01-01-19', 20 UNION ALL
    SELECT 'Account2', '12-30-18', 15 UNION ALL
    SELECT 'Account2', '12-31-18', 20 UNION ALL
    SELECT 'Account2', '01-01-19', 10 UNION ALL
    SELECT 'Account3', '12-30-18', 10 UNION ALL
    SELECT 'Account3', '12-31-18', 20 UNION ALL
    SELECT 'Account3', '01-01-19', 5 
  )
  SELECT account, date, 
    ARRAY_AGG(sales ORDER BY data DESC LIMIT 1)[OFFSET(0)] sales
  FROM (
    SELECT 'old' data, * FROM `project.dataset.old_table` UNION ALL 
    SELECT 'new' data, * FROM `project.dataset.new_table` 
  )
  GROUP BY account, date
  ORDER BY account, PARSE_DATE('%m-%d-%y', date)

Result:

Row account     date        sales    
1   Account1    12-29-18    10   
2   Account1    12-30-18    10   
3   Account1    12-31-18    5    
4   Account1    01-01-19    20   
5   Account2    12-29-18    10   
6   Account2    12-30-18    15   
7   Account2    12-31-18    20   
8   Account2    01-01-19    10   
9   Account3    12-29-18    20   
10  Account3    12-30-18    10   
11  Account3    12-31-18    20   
12  Account3    01-01-19    5
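
The same rule can also be written as an anti-join. This is only a sketch of an alternative, not part of the answer above: keep every old row, and append only those new rows that have no old row for the same account and date. It assumes both tables have exactly the columns account, date and sales, and that each (account, date) pair appears at most once per table.

```sql
#standardSQL
-- All rows from the old platform are kept as-is.
SELECT account, date, sales
FROM `project.dataset.old_table`
UNION ALL
-- New rows are added only where the old table has no row
-- for the same account and date.
SELECT n.account, n.date, n.sales
FROM `project.dataset.new_table` n
WHERE NOT EXISTS (
  SELECT 1
  FROM `project.dataset.old_table` o
  WHERE o.account = n.account
    AND o.date = n.date
)
```

Unlike the ARRAY_AGG version, this does not collapse duplicates within a single table, so it only gives the same result if the uniqueness assumption holds.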

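This did the job, thank you so much! I have a few questions, if you can clarify them for me. What exactly does the ARRAY_AGG function do? I looked it up but am having trouble understanding it. Also, how does ORDER BY data restrict the result to the 'old' data? Alphabetically, 'new' sorts to the top, so wouldn't ordering by data and then applying LIMIT 1 use the 'new' data?

Yes, absolutely: in ascending order 'new' sorts before 'old', so the sort has to be descending (ORDER BY data DESC) for the old rows to win.

Commenting again to make sure you see the edited comment above.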

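ARRAY_AGG with GROUP BY aggregates all sales by account and date, then orders the array by data and keeps only the top element. Finally, OFFSET(0) takes the first (and only) element out of the array.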
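
To make those three steps visible, here is a small sketch that shows each intermediate value for a single group. The `combined` CTE and the new-platform value 99 are hypothetical, chosen only so that the old and new rows differ; the sort is assumed to be descending so that the 'old' row comes first.

```sql
#standardSQL
-- Hypothetical group: the same account and date present in both sources.
WITH combined AS (
  SELECT 'old' AS data, 'Account3' AS account, '12-30-18' AS date, 10 AS sales UNION ALL
  SELECT 'new', 'Account3', '12-30-18', 99
)
SELECT
  account,
  date,
  -- Step 1: every sales value in the group, old rows first.
  ARRAY_AGG(sales ORDER BY data DESC) AS all_sales,
  -- Step 2: the same ordered array, cut down to its first element.
  ARRAY_AGG(sales ORDER BY data DESC LIMIT 1) AS top_sales,
  -- Step 3: that single element taken out of the array as a scalar.
  ARRAY_AGG(sales ORDER BY data DESC LIMIT 1)[OFFSET(0)] AS sales
FROM combined
GROUP BY account, date
```

For this one group, all_sales holds both values with the old one first, top_sales is a one-element array, and sales is the plain number from the old row.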