Get the latest records or first transaction (sale) in BigQuery Standard SQL

I have a table in BigQuery with the following columns:

user_id visit_date  referral    transaction
1234    20180101    site2       0
1234    20180102    site3       1
1234    20180103    site2       1
4567    20180104    site4       0
4567    20180105    site5       0
5678    20180101    site2       0
5678    20180102    site3       1
My goal is to produce a table in the following format:

path                transactions
site2 > site3       2 
site2               1
site4 > site5       0
What I can't figure out is how to "reset" the path for users who convert more than once in the same period, as is the case for user_id=1234.

So far I'm using the following query, but it doesn't produce the desired output:

SELECT
  referral_path,
  SUM(transactions) AS transactions
FROM (
  SELECT
  user_id,
  STRING_AGG(DISTINCT(referral), ',') AS referral_path,
  MAX(transactions) AS transactions
  FROM (
     SELECT
     user_id,
     referral,
     transactions
  FROM
     table
  ORDER BY
     user_id )a
  GROUP BY
     user_id )b
  GROUP BY
     referral_path
  ORDER BY
      transactions DESC

Below is for BigQuery Standard SQL:

#standardSQL
SELECT 
  path, 
  SUM(transaction) transactions
FROM (
  SELECT
    STRING_AGG(referral, ' > ' ORDER BY visit_date) path,
    SUM(transaction) transaction
  FROM (
    SELECT 
      user_id, visit_date, referral, transaction, 
      IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp 
    FROM `project.dataset.table`
  )
  GROUP BY user_id, grp
)
GROUP BY path
You can test and play with the above using the dummy data from your question, as shown below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
  SELECT 1234, '20180102', 'site3', 1 UNION ALL
  SELECT 1234, '20180103', 'site2', 1 UNION ALL
  SELECT 4567, '20180104', 'site4', 0 UNION ALL
  SELECT 4567, '20180105', 'site5', 0 UNION ALL
  SELECT 5678, '20180101', 'site2', 0 UNION ALL
  SELECT 5678, '20180102', 'site3', 1 
)
SELECT 
  path, 
  SUM(transaction) transactions
FROM (
  SELECT
    STRING_AGG(referral, ' > ' ORDER BY visit_date) path,
    SUM(transaction) transaction
  FROM (
    SELECT 
      user_id, visit_date, referral, transaction, 
      IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp 
    FROM `project.dataset.table`
  )
  GROUP BY user_id, grp
)
GROUP BY path
ORDER BY transactions DESC  
Result:

Row path            transactions     
1   site2 > site3   2    
2   site2           1    
3   site4 > site5   0    
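It can help to run the answer's inner levels on their own to see how each path is built. The sketch below runs the middle level of the query on the same dummy data and stops before the final GROUP BY path. It keeps user_id and grp so that each output row is one path segment, ending either at a conversion or at the user's last visit. It also adds ORDER BY visit_date inside STRING_AGG. That is an assumption on my part, made because BigQuery does not otherwise guarantee the order in which referrals are concatenated.

```sql
#standardSQL
-- Sketch: middle level of the answer's query, one row per (user_id, grp) segment
WITH `project.dataset.table` AS (
  SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
  SELECT 1234, '20180102', 'site3', 1 UNION ALL
  SELECT 1234, '20180103', 'site2', 1 UNION ALL
  SELECT 4567, '20180104', 'site4', 0 UNION ALL
  SELECT 4567, '20180105', 'site5', 0 UNION ALL
  SELECT 5678, '20180101', 'site2', 0 UNION ALL
  SELECT 5678, '20180102', 'site3', 1
)
SELECT
  user_id,
  grp,
  -- ORDER BY makes the referral order within a path deterministic
  STRING_AGG(referral, ' > ' ORDER BY visit_date) path,
  SUM(transaction) transaction
FROM (
  SELECT
    user_id, visit_date, referral, transaction,
    -- number of conversions strictly before this visit = segment id per user
    IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
  FROM `project.dataset.table`
)
GROUP BY user_id, grp
ORDER BY user_id, grp
```

On the dummy data this should return:

user_id grp path            transaction
1234    0   site2 > site3   1
1234    1   site2           1
4567    0   site4 > site5   0
5678    0   site2 > site3   1

The final GROUP BY path then sums the two "site2 > site3" rows into 2.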

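What do you mean by resetting the path? Please explain in more detail. Do you only allow two referrals in a path?

It can have more than two referrals. By resetting I mean that for the first user (id=1234), who converted twice in the same period, the path should restart from site2 after the first conversion, instead of being site2 > site3 > site2. Thanks.

I see. Got it.

The solution works great! I just don't understand the "1 PRECEDING" at the end of the OVER clause. So within a given user_id, the frame runs from UNBOUNDED PRECEDING (the first row) up to the previous row?

That's correct. This logic detects where the group value changes, so you can use grp in the GROUP BY. I'd suggest running the inner queries one by one to see how it works, so you can use this approach yourself :o)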
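The sketch below shows what the "1 PRECEDING" frame changes. It uses only user 1234's rows from the dummy data and places the answer's grp (called grp_excl here) next to a running sum whose frame ends at CURRENT ROW (called grp_incl). Both column names are hypothetical and chosen for illustration.

```sql
#standardSQL
-- Sketch: compare a frame ending at 1 PRECEDING with one ending at CURRENT ROW
WITH `project.dataset.table` AS (
  SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
  SELECT 1234, '20180102', 'site3', 1 UNION ALL
  SELECT 1234, '20180103', 'site2', 1
)
SELECT
  user_id, visit_date, referral, transaction,
  -- conversions strictly before this visit; the frame is empty on the first
  -- row, so SUM returns NULL there and IFNULL turns it into 0
  IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp_excl,
  -- conversions up to and including this visit
  SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) grp_incl
FROM `project.dataset.table`
ORDER BY user_id, visit_date
```

Expected result:

user_id visit_date referral transaction grp_excl grp_incl
1234    20180101   site2    0           0        0
1234    20180102   site3    1           0        1
1234    20180103   site2    1           1        2

With grp_excl, the converting visit (site3) stays in the same group as the referral that led to it, and the next visit starts a new path. With grp_incl, every converting visit would open a new group, which splits site2 and site3 into separate paths.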