Get the latest record or first transaction (sale) in BigQuery Standard SQL
I have a table on BigQuery with the following columns:
user_id visit_date referral transaction
1234 20180101 site2 0
1234 20180102 site3 1
1234 20180103 site2 1
4567 20180104 site4 0
4567 20180105 site5 0
5678 20180101 site2 0
5678 20180102 site3 1
My goal is to get a table in the following format as output:
path transactions
site2 > site3 2
site2 1
site4 > site5 0
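What I can't figure out is how to "reset" the path for users who convert more than once in the same period, as is the case for user_id=1234.
So far I am using the following query, but it does not produce the desired output: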
SELECT
  referral_path,
  SUM(transactions) AS transactions
FROM (
  SELECT
    user_id,
    STRING_AGG(DISTINCT(referral), ',') AS referral_path,
    MAX(transactions) AS transactions
  FROM (
    SELECT
      user_id,
      referral,
      transactions
    FROM
      table
    ORDER BY
      user_id )a
  GROUP BY
    user_id )b
GROUP BY
  referral_path
ORDER BY
  transactions DESC
Below is a solution for BigQuery Standard SQL:
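#standardSQL
SELECT
  path,
  SUM(transaction) transactions
FROM (
  -- One row per (user_id, grp): a path that ends at a conversion or at the user's last visit
  SELECT
    STRING_AGG(referral, ' > ') path,
    SUM(transaction) transaction
  FROM (
    SELECT
      user_id, visit_date, referral, transaction,
      -- grp = sum of transactions strictly before the current row,
      -- so its value changes on the row right after each conversion
      IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
    FROM `project.dataset.table`
  )
  GROUP BY user_id, grp
)
GROUP BY path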
You can test and play with the above using the dummy data from your question, as shown below:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
  SELECT 1234, '20180102', 'site3', 1 UNION ALL
  SELECT 1234, '20180103', 'site2', 1 UNION ALL
  SELECT 4567, '20180104', 'site4', 0 UNION ALL
  SELECT 4567, '20180105', 'site5', 0 UNION ALL
  SELECT 5678, '20180101', 'site2', 0 UNION ALL
  SELECT 5678, '20180102', 'site3', 1
)
SELECT
  path,
  SUM(transaction) transactions
FROM (
  SELECT
    STRING_AGG(referral, ' > ') path,
    SUM(transaction) transaction
  FROM (
    SELECT
      user_id, visit_date, referral, transaction,
      IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
    FROM `project.dataset.table`
  )
  GROUP BY user_id, grp
)
GROUP BY path
ORDER BY transactions DESC
Result:
Row path transactions
1 site2 > site3 2
2 site2 1
3 site4 > site5 0
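One caveat, offered as a suggestion rather than as part of the answer itself: STRING_AGG without its own ORDER BY clause does not guarantee the order in which the referrals are concatenated, so on a large table a path could in principle come out as "site3 > site2". A minimal sketch that pins the order to visit_date is shown below. The CTE name `visits` is a hypothetical stand-in for the real table, and the rows are the dummy data from the question.

```sql
#standardSQL
-- `visits` is a hypothetical stand-in for `project.dataset.table`
WITH visits AS (
  SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
  SELECT 1234, '20180102', 'site3', 1 UNION ALL
  SELECT 1234, '20180103', 'site2', 1 UNION ALL
  SELECT 4567, '20180104', 'site4', 0 UNION ALL
  SELECT 4567, '20180105', 'site5', 0 UNION ALL
  SELECT 5678, '20180101', 'site2', 0 UNION ALL
  SELECT 5678, '20180102', 'site3', 1
)
SELECT
  path,
  SUM(transaction) transactions
FROM (
  SELECT
    -- ORDER BY inside STRING_AGG makes the concatenation order explicit
    STRING_AGG(referral, ' > ' ORDER BY visit_date) path,
    SUM(transaction) transaction
  FROM (
    SELECT
      user_id, visit_date, referral, transaction,
      IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
    FROM visits
  )
  GROUP BY user_id, grp
)
GROUP BY path
ORDER BY transactions DESC
```

On the dummy data this returns the same three rows as the result above. It assumes that visit_date is unique per user; if a user can have several visits on the same date, a finer-grained timestamp would be needed to order them reliably.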
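Comments:
What do you mean by resetting the path? Please explain in detail. Do you allow only two referrals in a path?
It can have more than two referrals. By resetting I mean that the first user (id=1234) has 2 conversions in the same period, and after the first conversion the path should restart from site2 instead of being site2 > site3 > site2. Tnx.
I see. Got it.
The solution works great! I just don't understand the "1 PRECEDING" part at the end of the OVER clause. So within a specific user_id, it takes everything from UNBOUNDED PRECEDING (the first row) up to the previous row?
That is correct. This logic captures the point where the group value changes, so you can use grp in GROUP BY. I recommend running the inner queries one by one to understand how it works, so that you can use this approach yourself :o)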
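Following the suggestion to run the inner queries one at a time, here is a sketch of just the innermost query on the question's dummy data, so the grp value can be inspected row by row. The CTE name `visits` is a hypothetical stand-in for the real table.

```sql
#standardSQL
-- `visits` is a hypothetical stand-in for `project.dataset.table`
WITH visits AS (
  SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
  SELECT 1234, '20180102', 'site3', 1 UNION ALL
  SELECT 1234, '20180103', 'site2', 1 UNION ALL
  SELECT 4567, '20180104', 'site4', 0 UNION ALL
  SELECT 4567, '20180105', 'site5', 0 UNION ALL
  SELECT 5678, '20180101', 'site2', 0 UNION ALL
  SELECT 5678, '20180102', 'site3', 1
)
SELECT
  user_id, visit_date, referral, transaction,
  -- The frame covers every earlier row of the same user but excludes the
  -- current row. For a user's first row the frame is empty, SUM returns
  -- NULL, and IFNULL turns that into 0.
  IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM visits
ORDER BY user_id, visit_date
-- Expected grp values:
--   1234  20180101  site2  0  ->  0
--   1234  20180102  site3  1  ->  0   (the conversion row stays in the path it closes)
--   1234  20180103  site2  1  ->  1   (one earlier transaction, so a new path starts)
--   4567  20180104  site4  0  ->  0
--   4567  20180105  site5  0  ->  0
--   5678  20180101  site2  0  ->  0
--   5678  20180102  site3  1  ->  0
```

Because the frame stops at 1 PRECEDING, a row with a transaction still carries the grp of the path that led to it, and only the next row moves to a new group. Ending the frame at CURRENT ROW instead would put the converting visit into the following path.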