SQL query to set a column based on the last seven entries

Tags: sql, google-bigquery

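I am having a hard time figuring out how to write a query that determines, for each user entry, whether it is preceded by 7 days without any activity (secondsPlayed == 0). If so, the entry should be marked with the value 1, otherwise 0.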


This also means that if a user has fewer than 7 entries, the value is 0 for all of that user's entries.

Input table:

+--------+-------------------------+---------------+
| userid | estimationDate          | secondsPlayed |
+--------+-------------------------+---------------+
| a      | 2016-07-14 00:00:00 UTC | 192.5         |
| a      | 2016-07-15 00:00:00 UTC | 357.3         |
| a      | 2016-07-16 00:00:00 UTC | 0             |
| a      | 2016-07-17 00:00:00 UTC | 0             |
| a      | 2016-07-18 00:00:00 UTC | 0             |
| a      | 2016-07-19 00:00:00 UTC | 0             |
| a      | 2016-07-20 00:00:00 UTC | 0             |
| a      | 2016-07-21 00:00:00 UTC | 0             |
| a      | 2016-07-22 00:00:00 UTC | 0             |
| a      | 2016-07-23 00:00:00 UTC | 0             |
| a      | 2016-07-24 00:00:00 UTC | 0             |
| ------ | ----------------------- | ------------- |
| b      | 2016-07-02 00:00:00 UTC | 31.2          |
| b      | 2016-07-03 00:00:00 UTC | 42.1          |
| b      | 2016-07-04 00:00:00 UTC | 41.9          |
| b      | 2016-07-05 00:00:00 UTC | 43.2          |
| b      | 2016-07-06 00:00:00 UTC | 91.5          |
| b      | 2016-07-07 00:00:00 UTC | 0             |
| b      | 2016-07-08 00:00:00 UTC | 0             |
| b      | 2016-07-09 00:00:00 UTC | 239.1         |
| b      | 2016-07-10 00:00:00 UTC | 0             |
+--------+-------------------------+---------------+

The expected output table should look like this:

Output table:

+--------+-------------------------+---------------+----------+
| userid | estimationDate          | secondsPlayed | inactive |
+--------+-------------------------+---------------+----------+
| a      | 2016-07-14 00:00:00 UTC | 192.5         | 0        |
| a      | 2016-07-15 00:00:00 UTC | 357.3         | 0        |
| a      | 2016-07-16 00:00:00 UTC | 0             | 0        |
| a      | 2016-07-17 00:00:00 UTC | 0             | 0        |
| a      | 2016-07-18 00:00:00 UTC | 0             | 0        |
| a      | 2016-07-19 00:00:00 UTC | 0             | 0        |
| a      | 2016-07-20 00:00:00 UTC | 0             | 0        |
| a      | 2016-07-21 00:00:00 UTC | 0             | 0        |
| a      | 2016-07-22 00:00:00 UTC | 0             | 1        |
| a      | 2016-07-23 00:00:00 UTC | 0             | 1        |
| a      | 2016-07-24 00:00:00 UTC | 0             | 1        |
| ------ | ----------------------- | ------------- | -------- |
| b      | 2016-07-02 00:00:00 UTC | 31.2          | 0        |
| b      | 2016-07-03 00:00:00 UTC | 42.1          | 0        |
| b      | 2016-07-04 00:00:00 UTC | 41.9          | 0        |
| b      | 2016-07-05 00:00:00 UTC | 43.2          | 0        |
| b      | 2016-07-06 00:00:00 UTC | 91.5          | 0        |
| b      | 2016-07-07 00:00:00 UTC | 0             | 0        |
| b      | 2016-07-08 00:00:00 UTC | 0             | 0        |
| b      | 2016-07-09 00:00:00 UTC | 239.1         | 0        |
| b      | 2016-07-10 00:00:00 UTC | 0             | 0        |
+--------+-------------------------+---------------+----------+

Thoughts

At first I considered using the LAG function with an offset of 7, but that would obviously ignore all the entries in between.

I also considered creating a 7-day rolling window/average and checking whether it is above 0. That may be a bit above my skill level, though.


Does anyone have a good solution?

Assuming you have a row for every day (which seems a reasonable assumption), you can sum secondsplayed with a window function:

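select t.*,
       -- inactive = 1 only when the current row plus the six rows before it
       -- sum to zero and the user already has at least seven rows
       (case when sum(secondsplayed) over (partition by userid
                                           order by estimationdate
                                           rows between 6 preceding and current row
                                          ) = 0 and
                  row_number() over (partition by userid order by estimationdate) >= 7
             then 1
             else 0
        end) as inactive
from t;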

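Besides having no gaps in the dates, this also assumes that secondsplayed is never negative. (Negative values could easily be incorporated into the logic, but that seems unnecessary.)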
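If negative values were possible, one way to handle them is to count the non-zero entries in the window instead of summing them, so that positive and negative values cannot cancel each other out. The following is only a sketch of that idea, reusing the table name t and the column names from the query above. It is not part of the original answer, and it still assumes one row per user per day.

```sql
-- Sketch: flag a row as inactive when none of the last seven rows
-- (current row included) has a non-zero secondsplayed value.
select t.*,
       (case when sum(case when secondsplayed <> 0 then 1 else 0 end)
                  over (partition by userid
                        order by estimationdate
                        rows between 6 preceding and current row
                       ) = 0 and
                  row_number() over (partition by userid order by estimationdate) >= 7
             then 1
             else 0
        end) as inactive
from t;
```

A NULL secondsplayed falls into the else branch and is therefore counted as no activity, which matches how sum() ignores NULLs in the original query.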

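In my experience, input tables of this kind do not contain the inactive entries and usually look like the sample data in the query below (only the active entries are present):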

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' userid, TIMESTAMP '2016-07-14 00:00:00 UTC' estimationDate, 192.5 secondsPlayed UNION ALL
  SELECT 'a', '2016-07-15 00:00:00 UTC', 357.3 UNION ALL
  SELECT 'b', '2016-07-02 00:00:00 UTC', 31.2 UNION ALL
  SELECT 'b', '2016-07-03 00:00:00 UTC', 42.1 UNION ALL
  SELECT 'b', '2016-07-04 00:00:00 UTC', 41.9 UNION ALL
  SELECT 'b', '2016-07-05 00:00:00 UTC', 43.2 UNION ALL
  SELECT 'b', '2016-07-06 00:00:00 UTC', 91.5 UNION ALL
  SELECT 'b', '2016-07-09 00:00:00 UTC', 239.1 
), time_frame AS (
  SELECT day
  FROM UNNEST(GENERATE_DATE_ARRAY('2016-07-02', '2016-07-24')) day
)
SELECT 
  users.userid, 
  day, 
  IFNULL(secondsPlayed, 0) secondsPlayed,
  CAST(1 - SIGN(SUM(IFNULL(secondsPlayed, 0)) 
    OVER(
      PARTITION BY users.userid 
      ORDER BY UNIX_DATE(day)
      RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
    )) AS INT64) AS inactive 
FROM time_frame tf
CROSS JOIN (SELECT DISTINCT userid FROM `project.dataset.table`) users
LEFT JOIN `project.dataset.table` t
ON day = DATE(estimationDate) AND users.userid = t.userid
ORDER BY userid, day   

Result:

Row  userid  day         secondsPlayed  inactive
...
13   a       2016-07-14  192.5          0
14   a       2016-07-15  357.3          0
15   a       2016-07-15  357.3          0
16   a       2016-07-16  0.0            0
17   a       2016-07-17  0.0            0
18   a       2016-07-18  0.0            0
19   a       2016-07-19  0.0            0
20   a       2016-07-20  0.0            0
21   a       2016-07-21  0.0            0
22   a       2016-07-22  0.0            1
23   a       2016-07-23  0.0            1
24   a       2016-07-24  0.0            1
25   b       2016-07-02  31.2           0
26   b       2016-07-03  42.1           0
27   b       2016-07-04  41.9           0
28   b       2016-07-05  43.2           0
29   b       2016-07-06  91.5           0
30   b       2016-07-07  0.0            0
31   b       2016-07-08  0.0            0
32   b       2016-07-09  239.1          0
33   b       2016-07-10  0.0            0
...
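
Because the query above cross joins every user with the whole calendar, days before a user's first recorded activity also have a zero window sum and are therefore flagged as inactive. If that is not wanted, and the flag should only be set once a user has seven full days of history as the question asks, a variant along the following lines might work. This is a sketch rather than part of the original answer. The CTE name user_days, the alias f, the column first_day, and the hard-coded end date 2016-07-24 are assumptions made for illustration. The COUNT(*) check also assumes at most one row per user per day.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' userid, TIMESTAMP '2016-07-14 00:00:00 UTC' estimationDate, 192.5 secondsPlayed UNION ALL
  SELECT 'a', TIMESTAMP '2016-07-15 00:00:00 UTC', 357.3 UNION ALL
  SELECT 'b', TIMESTAMP '2016-07-02 00:00:00 UTC', 31.2 UNION ALL
  SELECT 'b', TIMESTAMP '2016-07-09 00:00:00 UTC', 239.1
), user_days AS (
  -- One row per user and calendar day, starting at the user's first recorded day
  -- (the end date is a hard-coded assumption).
  SELECT f.userid, day
  FROM (
    SELECT userid, MIN(DATE(estimationDate)) AS first_day
    FROM `project.dataset.table`
    GROUP BY userid
  ) AS f, UNNEST(GENERATE_DATE_ARRAY(f.first_day, DATE '2016-07-24')) AS day
)
SELECT
  d.userid,
  d.day,
  IFNULL(t.secondsPlayed, 0) AS secondsPlayed,
  CASE
    -- Inactive only if the 7-day window is complete and contains no play time.
    WHEN SUM(IFNULL(t.secondsPlayed, 0)) OVER w = 0
         AND COUNT(*) OVER w = 7
    THEN 1
    ELSE 0
  END AS inactive
FROM user_days d
LEFT JOIN `project.dataset.table` t
  ON d.day = DATE(t.estimationDate) AND d.userid = t.userid
WINDOW w AS (
  PARTITION BY d.userid
  ORDER BY UNIX_DATE(d.day)
  RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
)
ORDER BY d.userid, d.day
```

With this sample data, user a gets rows from 2016-07-14 onward, and the first row whose seven-day window contains no play time is 2016-07-22, which agrees with the expected output in the question.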