SQL: How do I use BigQuery analytic functions to calculate the time between timestamped rows?
I have a dataset representing analytics events, such as:
Row timestamp account_id type
1 2018-11-14 21:05:40 UTC abc start
2 2018-11-14 21:05:40 UTC xyz another_type
3 2018-11-26 22:01:19 UTC xyz start
4 2018-11-26 22:01:23 UTC abc start
5 2018-11-26 22:01:29 UTC xyz some_other_type
11 2018-11-26 22:13:58 UTC xyz start
...
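There are a number of distinct account_id values. I need to find the average time between 'start' records for each account_id.
I have been trying to use analytic functions for this. My end goal is a table like this: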
Row account_id avg_time_between_events_mins
1 xyz 53
2 abc 47
3 pqr 65
...
My best attempt so far looks like this:
WITH
  events AS (
    SELECT
      COUNTIF(type = 'start' AND account_id = 'abc') OVER (ORDER BY timestamp) AS diff,
      timestamp
    FROM
      `myproject.dataset.events`
    WHERE
      account_id = 'abc')
SELECT
  MIN(timestamp) AS start_time,
  MAX(timestamp) AS next_start_time,
  ABS(TIMESTAMP_DIFF(MIN(timestamp), MAX(timestamp), MINUTE)) AS minutes_between
FROM
  events
GROUP BY
  diff
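This calculates the time between each 'start' event and the last non-'start' event before the next 'start' event, for one specific account_id.
I then tried using PARTITION BY and a window frame clause, like this: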
WITH
  events AS (
    SELECT
      COUNT(*) OVER (PARTITION BY account_id ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS diff,
      timestamp
    FROM
      `myproject.dataset.events`
    WHERE
      type = 'start')
SELECT
  MIN(timestamp) AS start_time,
  MAX(timestamp) AS next_start_time,
  ABS(TIMESTAMP_DIFF(MIN(timestamp), MAX(timestamp), MINUTE)) AS minutes_between
FROM
  events
GROUP BY
  diff
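But I get a result table that makes no sense. Could someone show me how to write a query like this, and how to reason about it?

This does not need analytic functions: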
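select account_id,
       -- latest start minus earliest start, divided by the number of gaps;
       -- nullif avoids division by zero for accounts with a single start
       timestamp_diff(max(timestamp), min(timestamp), MINUTE) / nullif(count(*) - 1, 0) as avg_time_between_events_mins
from `myproject.dataset.events`
where type = 'start'
group by account_id;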
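This is the latest timestamp minus the earliest timestamp, divided by one less than the number of starts. That is the average time between starts.

Oh wow! I can't believe how simple and sensible this solution is. Thank you so much.
@SolomonBothwell - agreed, it is simple - but I really doubt that it actually answers your question! If you accept this answer, I'll take your word for it. But in that case you should consider adjusting your question to match the answer :o)
Is that because the proposed result has a Row column?
No. I think the OP asked for the average time between a start and the last non-start event before the next start, whereas you answered with the average between two starts. Does that make sense?
Sorry if my post was confusing. I thought this clearly simple task needed window functions. I was asking for the time between 'start' events per account_id. I think @GordonLinoff's solution needs to be modified for the per-account_id part.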
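
For readers who still want the window-function route, here is a minimal sketch of the "average time between 'start' events per account_id" reading. It is not the accepted answer's method. It uses LAG to pair each start with the previous start of the same account. The inline sample rows are copied from the question's table so the query runs on its own; in practice the `events` CTE would be replaced by the real table. It may also help explain the failed frame attempt: COUNT(*) over ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING is 2 for every row except the last row of each partition, where it is 1, so grouping by that value collapses everything into two groups.

```sql
-- Sample rows copied from the question; swap this CTE for `myproject.dataset.events`.
WITH events AS (
  SELECT * FROM UNNEST([
    STRUCT(TIMESTAMP '2018-11-14 21:05:40 UTC' AS timestamp, 'abc' AS account_id, 'start' AS type),
    (TIMESTAMP '2018-11-14 21:05:40 UTC', 'xyz', 'another_type'),
    (TIMESTAMP '2018-11-26 22:01:19 UTC', 'xyz', 'start'),
    (TIMESTAMP '2018-11-26 22:01:23 UTC', 'abc', 'start'),
    (TIMESTAMP '2018-11-26 22:01:29 UTC', 'xyz', 'some_other_type'),
    (TIMESTAMP '2018-11-26 22:13:58 UTC', 'xyz', 'start')
  ])
),
gaps AS (
  SELECT
    account_id,
    -- Minutes since the previous 'start' of the same account; NULL for the first one.
    TIMESTAMP_DIFF(
      timestamp,
      LAG(timestamp) OVER (PARTITION BY account_id ORDER BY timestamp),
      MINUTE) AS minutes_since_prev_start
  FROM events
  WHERE type = 'start'
)
SELECT
  account_id,
  -- AVG skips the NULL produced for each account's first start.
  AVG(minutes_since_prev_start) AS avg_time_between_events_mins
FROM gaps
GROUP BY account_id;
```

The consecutive gaps sum to the latest start minus the earliest start, which is why the plain aggregate gives essentially the same average. The two can differ slightly because here each gap is converted to an integer number of minutes before averaging.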
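
The other reading raised in the comments, the time from each 'start' to the last event before the next 'start', can be sketched by extending the question's first attempt to all accounts. This is an illustration under assumptions: the names `session_no`, `minutes_in_session` and `avg_minutes_start_to_last_event` are made up for the example, the sample rows are copied from the question, and each account's final 'start' is treated as a session even though no later 'start' closes it.

```sql
-- Sample rows copied from the question; swap this CTE for `myproject.dataset.events`.
WITH events AS (
  SELECT * FROM UNNEST([
    STRUCT(TIMESTAMP '2018-11-14 21:05:40 UTC' AS timestamp, 'abc' AS account_id, 'start' AS type),
    (TIMESTAMP '2018-11-14 21:05:40 UTC', 'xyz', 'another_type'),
    (TIMESTAMP '2018-11-26 22:01:19 UTC', 'xyz', 'start'),
    (TIMESTAMP '2018-11-26 22:01:23 UTC', 'abc', 'start'),
    (TIMESTAMP '2018-11-26 22:01:29 UTC', 'xyz', 'some_other_type'),
    (TIMESTAMP '2018-11-26 22:13:58 UTC', 'xyz', 'start')
  ])
),
sessions AS (
  SELECT
    account_id,
    timestamp,
    -- Running count of 'start' events per account: each row is labelled
    -- with the number of the most recent start at or before it.
    COUNTIF(type = 'start') OVER (PARTITION BY account_id ORDER BY timestamp) AS session_no
  FROM events
),
per_session AS (
  SELECT
    account_id,
    session_no,
    -- First row of a session is its 'start'; last row is the final event before the next 'start'.
    TIMESTAMP_DIFF(MAX(timestamp), MIN(timestamp), MINUTE) AS minutes_in_session
  FROM sessions
  WHERE session_no > 0  -- drop events that precede an account's first 'start'
  GROUP BY account_id, session_no
)
SELECT
  account_id,
  AVG(minutes_in_session) AS avg_minutes_start_to_last_event
FROM per_session
GROUP BY account_id;
```

A 'start' with no following events before the next 'start' contributes a zero-length session. Whether such sessions should count toward the average is a judgment call the question does not settle.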