SQL: How to loop over four consecutive weeks across the last six months of data in BigQuery

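For example, I have a table in BigQuery like the one below (partitioned by date). I need to write a standard SQL query for the problem described in detail below.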

student_id     date      duration(in hours)
  1          2020-05-10   7             
  2          2020-05-10   8
  3          2020-05-10   8
  1          2020-05-11   8
  2          2020-05-11   7
  3          2020-05-12   6
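We add data to this table almost every day, so it grows very quickly. I need to find the student IDs whose attendance exceeded 7 hours for four consecutive weeks within the last six months (checked daily, so the window moves forward by one week as each new week arrives), and convert their student type to good student. For example, in a programming language: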

start_week = 1
end_week = 4
while end_week is within the last six months:
      if duration >= 7 for every week from start_week to end_week
        good_student = true
      start_week = start_week + 1   // the window slides forward by one week for the next loop
      end_week = end_week + 1

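For any student, if the duration is greater than or equal to 7 hours in any four consecutive weeks of the last six months of data, he is a good student. This seems challenging to me because I am only average at BigQuery and MySQL. I don't know how to do this.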
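The loop described in the question can be made concrete outside SQL. The following is a minimal Python sketch, not a BigQuery solution, under several assumptions that are not stated in the question: the "duration" of a week is the sum of that week's daily durations, weeks start on Sunday, six months is approximated as 26 weeks, and a week with no rows counts as 0 hours. The names `week_start` and `good_students` are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Sunday that starts d's week (assumed week convention)."""
    # date.weekday() gives Monday=0 ... Sunday=6, so Sunday needs an offset of 0.
    return d - timedelta(days=(d.weekday() + 1) % 7)


def good_students(rows, today, threshold=7, weeks_needed=4, lookback_weeks=26):
    """rows: iterable of (student_id, date, duration_in_hours) tuples."""
    cutoff = today - timedelta(weeks=lookback_weeks)

    # Step 1: total hours per student per week, inside the lookback period only.
    weekly = defaultdict(lambda: defaultdict(int))
    for student_id, d, duration in rows:
        if cutoff < d <= today:
            weekly[student_id][week_start(d)] += duration

    # Step 2: slide a window of `weeks_needed` consecutive weeks over each student.
    good = set()
    for student_id, totals in weekly.items():
        for wk in totals:
            window = [wk + timedelta(weeks=i) for i in range(weeks_needed)]
            # A week without any rows counts as 0 hours (assumption).
            if all(totals.get(w, 0) >= threshold for w in window):
                good.add(student_id)
                break
    return good
```

Because every window is anchored at a week that actually has data and missing weeks are read as 0, a gap in attendance breaks the streak instead of being skipped silently.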

If I understand correctly, truncate the dates to weeks and aggregate. Then use window functions to compute the flags you need and filter:

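select t.*
from (select student_id,
             date_trunc(date, week) as wk,
             sum(duration) as dur,
             -- smallest weekly total among the weeks that start 0 to 21 days before this one
             min(sum(duration)) over (partition by student_id
                                      order by unix_date(min(date_trunc(date, week)))
                                      range between 21 preceding and current row
                                     ) as min_4week_dur,
             -- first week in which the student has any data
             min(min(date_trunc(date, week))) over (partition by student_id) as min_wk
      from t
      group by 1, 2
     ) t
-- only rows at least three weeks after the student's first week can close a four-week run
where date_diff(wk, min_wk, week) >= 3 and
      min_4week_dur >= 7;  -- the question asks for a duration of at least 7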
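The two key ideas are:

  • The minimum duration is computed as the smallest weekly duration over a rolling four-week period
  • A student can only qualify in or after their fourth week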
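To see what a frame of `range between 21 preceding and current row` picks up when the ordering key is a week-start day number, here is a rough Python emulation for a single student. It is only a sketch of the frame semantics, not BigQuery itself; the function name `rolling_min_4_weeks` and the sample totals are made up for illustration.

```python
from datetime import date


def rolling_min_4_weeks(weekly_totals):
    """weekly_totals: dict mapping week-start date -> summed duration for one student.

    Returns a list of (week_start, min_total_in_window, weeks_in_window).
    """
    out = []
    for wk in sorted(weekly_totals):
        # RANGE 21 PRECEDING keeps rows whose week start lies 0 to 21 days
        # before the current row, i.e. the current week and the three before it.
        in_window = [
            total
            for other, total in weekly_totals.items()
            if 0 <= (wk - other).days <= 21
        ]
        out.append((wk, min(in_window), len(in_window)))
    return out


# Hypothetical weekly totals with a gap: no data for the weeks of 05-31 and 06-07.
weekly = {
    date(2020, 5, 3): 10,
    date(2020, 5, 10): 8,
    date(2020, 5, 17): 9,
    date(2020, 5, 24): 7,
    date(2020, 6, 14): 12,
}
for row in rolling_min_4_weeks(weekly):
    print(row)
```

Four week starts that are 7 days apart span 21 days, which is why the frame uses 21 rather than 28. The third element of each tuple shows that after a gap the frame holds fewer than four weeks, so a rolling minimum alone does not prove four consecutive weeks with data; counting the rows in the same frame is one possible extra check.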

Here is an example for your use case:

# Only used to set up the test with your sample data
with sample as (
  select 1 as ID, DATE("2020-05-10") as d, 7 as hour
  union all
  select 2 as ID, DATE("2020-05-10") as d, 8 as hour
  union all
  select 3 as ID, DATE("2020-05-10") as d, 8 as hour
  union all
  select 1 as ID, DATE("2020-05-11") as d, 8 as hour
  union all
  select 2 as ID, DATE("2020-05-11") as d, 7 as hour
  union all
  select 3 as ID, DATE("2020-05-12") as d, 6 as hour
),
# Build an array of dates so that missing days are accounted for (important for the sum over the previous 28 days)
date_array as (
  select dd from UNNEST(GENERATE_DATE_ARRAY('2020-05-10', '2020-05-15', INTERVAL 1 DAY)) dd
),
# Cross product of the existing IDs and every date in the range
data_grid as (
  select distinct ID, dd from sample, date_array
),
# Right outer join so that the dates missing from your logs are added to the sample data
merged_data as (
  select data_grid.ID, d, hour, dd
  from sample
  right outer join data_grid
    on sample.d = data_grid.dd and sample.ID = data_grid.ID
)
# Per ID, sum over a sliding window made of the current day and the 27 previous days
select ID, dd,
  SUM(hour) OVER (
    PARTITION BY ID
    ORDER BY dd
    ROWS BETWEEN 27 PRECEDING AND CURRENT ROW
  ) AS total_hours
from merged_data
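The reason for building the date grid is that a `ROWS` frame counts rows, not days, so a missing day would silently stretch the window further into the past. A small Python sketch of the same idea for one student follows; `rolling_hours` is a hypothetical helper, and unlike SQL `SUM`, which skips NULLs, it treats a missing day as 0 hours.

```python
from datetime import date, timedelta


def rolling_hours(logs, start, end, window_days=28):
    """logs: dict mapping date -> hours for one student (days may be missing)."""
    # Densify: one entry per calendar day, 0 where the student has no log.
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    dense = [logs.get(d, 0) for d in days]

    result = {}
    for i, d in enumerate(days):
        # Current day plus up to (window_days - 1) preceding days.
        lo = max(0, i - (window_days - 1))
        result[d] = sum(dense[lo:i + 1])
    return result


# Student 1 from the sample data: logs on 2020-05-10 and 2020-05-11 only.
totals = rolling_hours(
    {date(2020, 5, 10): 7, date(2020, 5, 11): 8},
    start=date(2020, 5, 10),
    end=date(2020, 5, 15),
)
```

With one entry per calendar day, a window of 28 positions always covers exactly 28 days, which is what the query relies on.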
      
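Find the student IDs whose attendance exceeded 7 hours for four consecutive weeks (checked daily, with the window advancing by one week) within the last six months, and convert the student to a good student

Below is for BigQuery Standard SQL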

#standardSQL
SELECT * EXCEPT(duration_4_weeks, qualify_for_6_month_condition),
  IF(qualify_for_6_month_condition AND
    MAX(duration_4_weeks) OVER(PARTITION BY student_id) >= 7,
    'good student',
    NULL
  ) type
FROM (
  SELECT *,
    SUM(duration) OVER(
      PARTITION BY student_id
      ORDER BY UNIX_DATE(date)
      RANGE BETWEEN 27 PRECEDING AND CURRENT ROW
    ) duration_4_weeks,
    date > DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH) qualify_for_6_month_condition
  FROM `project.dataset.table`
)
      

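Thanks for the reply. One issue is that the IDs are not a limited set; there could be 100 or more. In that case, do we have to write multiple union all statements?

No, the union all at the beginning only simulates your source data. Run the query against your real database to get that data!

Thanks, I am trying it. It does not show any results in the BigQuery console. Why do we use 21 preceding and current row for four weeks instead of 28? If possible, could you explain this query?

This query uses 21 because four weeks is the previous three weeks plus the current week. Run the query without the outer where clause to see what the subquery returns.

Thanks. If I comment out -- datediff(min_wk, wk, week) >= 3 and use a duration above 300 (the actual per-day durations in the table), then results appear, but there is a problem with the consecutive weeks: the first week, 2020-05-03, does not contain any date; the actual dates for that id start from 2020-05-05.

student_id  wk          dur   min_4week_dur  min_wk
1           2020-05-03  1710  1710           2020-05-03
1           2020-05-10  1400  1400           2020-05-03
1           2020-05-17  2425  1400           2020-05-03
1           2020-05-24  2236  1400           2020-05-03
1           2020-05-31  2309  1400           2020-05-03
1           2020-06-07  1286  1286           2020-05-03
1           2020-09-06  4818348              2020-05-03

@akashkumar . . . If you want consecutive weeks, then use the logic in the query. If you want to look at the most recent 4 weeks of data, then use row_number().

Thanks, it worked after a few modifications for my needs.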