SQL query to set a column based on the last seven entries

Tags: sql, google-bigquery

I am having a hard time figuring out how to create a query that can tell whether any user entry is preceded by 7 days without any activity (secondsPlayed == 0) and, if so, marks it with the value 1, otherwise 0.

This also means that if a user has fewer than 7 entries, the value will be 0 for all of that user's entries.

Input table:

+--------+-------------------------+---------------+
| userid | estimationDate          | secondsPlayed |
+--------+-------------------------+---------------+
| a      | 2016-07-14 00:00:00 UTC | 192.5         |
| a      | 2016-07-15 00:00:00 UTC | 357.3         |
| a      | 2016-07-16 00:00:00 UTC | 0             |
| a      | 2016-07-17 00:00:00 UTC | 0             |
| a      | 2016-07-18 00:00:00 UTC | 0             |
| a      | 2016-07-19 00:00:00 UTC | 0             |
| a      | 2016-07-20 00:00:00 UTC | 0             |
| a      | 2016-07-21 00:00:00 UTC | 0             |
| a      | 2016-07-22 00:00:00 UTC | 0             |
| a      | 2016-07-23 00:00:00 UTC | 0             |
| a      | 2016-07-24 00:00:00 UTC | 0             |
+--------+-------------------------+---------------+
| b      | 2016-07-02 00:00:00 UTC | 31.2          |
| b      | 2016-07-03 00:00:00 UTC | 42.1          |
| b      | 2016-07-04 00:00:00 UTC | 41.9          |
| b      | 2016-07-05 00:00:00 UTC | 43.2          |
| b      | 2016-07-06 00:00:00 UTC | 91.5          |
| b      | 2016-07-07 00:00:00 UTC | 0             |
| b      | 2016-07-08 00:00:00 UTC | 0             |
| b      | 2016-07-09 00:00:00 UTC | 239.1         |
| b      | 2016-07-10 00:00:00 UTC | 0             |
+--------+-------------------------+---------------+

The expected output table should look like this:

Output table:

+--------+-------------------------+---------------+----------+
| userid | estimationDate          | secondsPlayed | inactive |
+--------+-------------------------+---------------+----------+
| a      | 2016-07-14 00:00:00 UTC | 192.5         | 0        |
| a      | 2016-07-15 00:00:00 UTC | 357.3         | 0        |
| a      | 2016-07-16 00:00:00 UTC | 0             | 0        |
| a      | 2016-07-17 00:00:00 UTC | 0             | 0        |
| a      | 2016-07-18 00:00:00 UTC | 0             | 0        |
| a      | 2016-07-19 00:00:00 UTC | 0             | 0        |
| a      | 2016-07-20 00:00:00 UTC | 0             | 0        |
| a      | 2016-07-21 00:00:00 UTC | 0             | 0        |
| a      | 2016-07-22 00:00:00 UTC | 0             | 1        |
| a      | 2016-07-23 00:00:00 UTC | 0             | 1        |
| a      | 2016-07-24 00:00:00 UTC | 0             | 1        |
+--------+-------------------------+---------------+----------+
| b      | 2016-07-02 00:00:00 UTC | 31.2          | 0        |
| b      | 2016-07-03 00:00:00 UTC | 42.1          | 0        |
| b      | 2016-07-04 00:00:00 UTC | 41.9          | 0        |
| b      | 2016-07-05 00:00:00 UTC | 43.2          | 0        |
| b      | 2016-07-06 00:00:00 UTC | 91.5          | 0        |
| b      | 2016-07-07 00:00:00 UTC | 0             | 0        |
| b      | 2016-07-08 00:00:00 UTC | 0             | 0        |
| b      | 2016-07-09 00:00:00 UTC | 239.1         | 0        |
| b      | 2016-07-10 00:00:00 UTC | 0             | 0        |
+--------+-------------------------+---------------+----------+

Thoughts

At first I thought about using the LAG function with an offset of 7, but that obviously says nothing about the entries in between.

I also considered creating a 7-day rolling window/average and checking whether it is above 0. That might be a bit above my skill level, though.
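Does anyone have a good solution?

Assuming that you have data for every day (which seems like a reasonable assumption), you can sum over a window function: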
    select t.*,
           (case when sum(secondsplayed) over (partition by userid
                                               order by estimationdate
                                               rows between 6 preceding and current row
                                              ) = 0 and
                      row_number() over (partition by userid order by estimationdate) >= 7
                 then 1
                 else 0
            end) as inactive
    from t;
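
To sanity-check the window logic above without a database, here is a minimal Python sketch of the same rule: a row is flagged only when it is at least the seventh entry for its user and the current row plus the six rows before it sum to zero. The function name `flag_inactive` and the tuple layout are assumptions made for illustration; like the query, it assumes one row per user per day and no negative values.

```python
from collections import defaultdict


def flag_inactive(rows, window=7):
    """rows: iterable of (userid, date_string, seconds_played), one per user per day.

    Returns (userid, date_string, seconds_played, inactive) tuples sorted by
    user and date. inactive is 1 only if the row is at least the `window`-th
    entry of its user and the last `window` rows (current included) sum to 0.
    """
    by_user = defaultdict(list)
    for userid, day, seconds in rows:
        by_user[userid].append((day, seconds))

    result = []
    for userid in sorted(by_user):
        entries = sorted(by_user[userid])  # ISO dates sort chronologically as strings
        for i, (day, seconds) in enumerate(entries):
            # Same frame as ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
            frame = entries[max(0, i - window + 1): i + 1]
            full_window = i + 1 >= window  # mirrors row_number() >= 7
            all_zero = sum(s for _, s in frame) == 0
            result.append((userid, day, seconds, 1 if full_window and all_zero else 0))
    return result


# Sample data from the question
played_a = [192.5, 357.3] + [0] * 9                         # 2016-07-14 .. 2016-07-24
played_b = [31.2, 42.1, 41.9, 43.2, 91.5, 0, 0, 239.1, 0]   # 2016-07-02 .. 2016-07-10
rows = [("a", f"2016-07-{14 + i:02d}", s) for i, s in enumerate(played_a)]
rows += [("b", f"2016-07-{2 + i:02d}", s) for i, s in enumerate(played_b)]

flags = flag_inactive(rows)
for row in flags:
    print(row)
```

On the question's sample data this flags only user a's rows for 2016-07-22 through 2016-07-24, which agrees with the expected output table.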
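Besides requiring that there be no gaps in the dates, this query also assumes that secondsplayed is never negative. (Negative values could easily be incorporated into the logic, but that seems unnecessary.)

In my experience, this type of input table does not contain inactivity entries and usually looks like the sample data in the `project.dataset.table` CTE of the query below (only activity entries are present):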
    #standardSQL
    -- Sample data: only days with activity are present
    WITH `project.dataset.table` AS (
      SELECT 'a' userid, TIMESTAMP '2016-07-14 00:00:00 UTC' estimationDate, 192.5 secondsPlayed UNION ALL
      SELECT 'a', '2016-07-15 00:00:00 UTC', 357.3 UNION ALL
      SELECT 'b', '2016-07-02 00:00:00 UTC', 31.2 UNION ALL
      SELECT 'b', '2016-07-03 00:00:00 UTC', 42.1 UNION ALL
      SELECT 'b', '2016-07-04 00:00:00 UTC', 41.9 UNION ALL
      SELECT 'b', '2016-07-05 00:00:00 UTC', 43.2 UNION ALL
      SELECT 'b', '2016-07-06 00:00:00 UTC', 91.5 UNION ALL
      SELECT 'b', '2016-07-09 00:00:00 UTC', 239.1
    ), time_frame AS (
      -- One row per calendar day in the reporting period
      SELECT day
      FROM UNNEST(GENERATE_DATE_ARRAY('2016-07-02', '2016-07-24')) day
    )
    SELECT
      users.userid,
      day,
      IFNULL(secondsPlayed, 0) secondsPlayed,
      -- 1 when the 7-day sum (current day included) is 0, otherwise 0
      CAST(1 - SIGN(SUM(IFNULL(secondsPlayed, 0))
        OVER(
          PARTITION BY users.userid
          ORDER BY UNIX_DATE(day)
          RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
        )) AS INT64) AS inactive
    FROM time_frame tf
    -- Every user gets a row for every day; days without activity have NULL secondsPlayed
    CROSS JOIN (SELECT DISTINCT userid FROM `project.dataset.table`) users
    LEFT JOIN `project.dataset.table` t
      ON day = DATE(estimationDate) AND users.userid = t.userid
    ORDER BY userid, day

Result:

    Row  userid  day         secondsPlayed  inactive
    ...
    13   a       2016-07-14  192.5          0
    14   a       2016-07-15  357.3          0
    15   a       2016-07-15  357.3          0
    16   a       2016-07-16  0.0            0
    17   a       2016-07-17  0.0            0
    18   a       2016-07-18  0.0            0
    19   a       2016-07-19  0.0            0
    20   a       2016-07-20  0.0            0
    21   a       2016-07-21  0.0            0
    22   a       2016-07-22  0.0            1
    23   a       2016-07-23  0.0            1
    24   a       2016-07-24  0.0            1
    25   b       2016-07-02  31.2           0
    26   b       2016-07-03  42.1           0
    27   b       2016-07-04  41.9           0
    28   b       2016-07-05  43.2           0
    29   b       2016-07-06  91.5           0
    30   b       2016-07-07  0.0            0
    31   b       2016-07-08  0.0            0
    32   b       2016-07-09  239.1          0
    33   b       2016-07-10  0.0            0
    ...
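
The gap-filling idea behind the second query (build a full calendar, pair every user with every day, treat missing days as 0, then sum over a date-based 7-day frame) can also be sketched in plain Python. The function name `flag_inactive_with_gaps` and the dictionary layout are hypothetical, chosen only for this illustration. Note that, like the query as written, this sketch does not apply the "fewer than seven entries" rule from the question, so days before a user's first activity come out as inactive = 1.

```python
from datetime import date, timedelta


def flag_inactive_with_gaps(activity, first_day, last_day, window_days=7):
    """activity: dict mapping (userid, date) -> seconds played, active days only.

    Returns (userid, day, seconds_played, inactive) for every user and every
    calendar day in [first_day, last_day]; days without an entry count as 0.
    """
    users = sorted({userid for userid, _ in activity})
    n_days = (last_day - first_day).days + 1
    calendar = [first_day + timedelta(days=i) for i in range(n_days)]

    rows = []
    for userid in users:
        for day in calendar:
            # Date-based frame, like RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
            total = sum(
                activity.get((userid, day - timedelta(days=back)), 0)
                for back in range(window_days)
            )
            inactive = 1 if total == 0 else 0  # equals 1 - SIGN(total) when total >= 0
            rows.append((userid, day, activity.get((userid, day), 0), inactive))
    return rows


# Same sample data as the CTE in the query above
activity = {
    ("a", date(2016, 7, 14)): 192.5,
    ("a", date(2016, 7, 15)): 357.3,
    ("b", date(2016, 7, 2)): 31.2,
    ("b", date(2016, 7, 3)): 42.1,
    ("b", date(2016, 7, 4)): 41.9,
    ("b", date(2016, 7, 5)): 43.2,
    ("b", date(2016, 7, 6)): 91.5,
    ("b", date(2016, 7, 9)): 239.1,
}

rows = flag_inactive_with_gaps(activity, date(2016, 7, 2), date(2016, 7, 24))
for userid, day, seconds, inactive in rows:
    print(userid, day.isoformat(), seconds, inactive)
```

Because the frame is defined by dates rather than by row counts, this variant tolerates gaps in the input, which is the main difference from the row-based window in the first answer.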