SQL: aggregating over event tables within time windows configured in another table


I have three tables: UpEvent, DownEvent and AnalysisWindow.

UpEvent:
up_event_id | event_date          | EventMetric
1           | 2015-01-01T06:00:00 | 54
2           | 2015-01-01T07:30:00 | 76

DownEvent:
down_event_id | event_date          | EventMetric
1             | 2015-01-01T06:46:00 | 22
2             | 2015-01-01T07:33:00 | 34

AnalysisWindow:
window_id | win_start           | win_end
1         | 2015-01-01T00:00:00 | 2015-01-01T04:00:00
2         | 2015-01-01T00:00:00 | 2015-01-01T08:00:00
.
.
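I want to run an analysis for each AnalysisWindow that aggregates the UpEvent and DownEvent rows occurring between the window's start and end.

So for each AnalysisWindow record, I would get one feature row: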

WinStart            | WinEnd              | TotalUpEvents | TotalDownEvents
2015-01-01T00:00:00 | 2015-01-01T04:00:00 | 0             | 0
2015-01-01T00:00:00 | 2015-01-01T08:00:00 | 2             | 2

My first thought was:

select win.win_start, 
       win.win_end, 
       count(ue.*), 
       sum(ue.EventMetric) 
from AnalysisWindow win
left join UpEvent ue on (ue.event_date between win.win_start and win.win_end)
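This obviously does not work.

Am I approaching this the wrong way? I want to run a windowed analysis over the tables for each of the windows I have configured, and get one aggregated record per window.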

One method uses correlated subqueries:

select aw.*,
       (select count(*)
        from UpEvent ue
        where ue.event_date between aw.win_start and aw.win_end
       ) as ups,
       (select count(*)
        from DownEvent de
        where de.event_date between aw.win_start and aw.win_end
       ) as downs
from AnalysisWindow aw;

The above works, at least when formulated as:

with UpEvent as (
      select 1 as up_event_id, '2015-01-01T06:00:00' as event_date, 54 as EventMetric union all
      select 2, '2015-01-01T07:30:00', 76
     ),
     DownEvent as (
      select 1 as down_event_id, '2015-01-01T06:46:00' as event_date, 22 as EventMetric union all
      select 2, '2015-01-01T07:33:00', 34
     ),
     AnalysisWindow as (
      select 1 as window_id , '2015-01-01T00:00:00' as win_start, '2015-01-01T04:00:00' as win_end union all
      select 2, '2015-01-01T00:00:00', '2015-01-01T08:00:00'
     )
select aw.*,
       (select count(*)
        from UpEvent ue
        where ue.event_date between aw.win_start and aw.win_end
       ) as ups,
       (select count(*)
        from DownEvent de
        where de.event_date between aw.win_start and aw.win_end
       ) as downs
from AnalysisWindow aw;
Another method uses union all:

with ud as (
  select event_date, 1 as ups, 0 as downs from upevent
  union all
  select event_date, 0 as ups, 1 as downs from downevent
 )
select aw.window_id, aw.win_start, aw.win_end, sum(ups), sum(downs)
from AnalysisWindow aw join
     ud
     ON ud.event_date between aw.win_start and aw.win_end
group by aw.window_id, aw.win_start, aw.win_end
union all
select aw.window_id, aw.win_start, aw.win_end, 0, 0
from AnalysisWindow aw
where not exists (select 1 from ud where ud.event_date between aw.win_start and aw.win_end)


Below is for BigQuery Standard SQL (and it actually works!)
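One formulation that sidesteps both a LEFT JOIN on a range condition and correlated subqueries is to cross join every window with the unioned events and count conditionally. The following is a minimal sketch under the question's table and column names, not necessarily the query this answer originally contained; the `ud` CTE name and the output aliases are illustrative assumptions.

```sql
#standardSQL
-- Sketch only: table and column names are taken from the question;
-- the CTE name "ud" and the output aliases are assumptions.
WITH ud AS (
  -- Tag each event with its direction so one pass can count both kinds.
  SELECT event_date, 1 AS ups, 0 AS downs FROM UpEvent
  UNION ALL
  SELECT event_date, 0 AS ups, 1 AS downs FROM DownEvent
)
SELECT
  aw.window_id,
  aw.win_start,
  aw.win_end,
  -- COUNTIF counts only the rows where the condition is true,
  -- so a window with no matching events still reports 0.
  COUNTIF(ud.ups = 1
          AND ud.event_date BETWEEN aw.win_start AND aw.win_end) AS TotalUpEvents,
  COUNTIF(ud.downs = 1
          AND ud.event_date BETWEEN aw.win_start AND aw.win_end) AS TotalDownEvents
FROM AnalysisWindow AS aw
CROSS JOIN ud  -- no join condition, so no equality requirement applies
GROUP BY aw.window_id, aw.win_start, aw.win_end
ORDER BY aw.window_id;
```

Two caveats apply. The cross join pairs every window with every event, so it can get expensive on large event tables. And if both event tables are empty, the cross join produces no rows at all, so the windows disappear from the result instead of showing zeros.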


This will produce the error: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

@MikhailBerlyant ... do you know the rules for when correlated subqueries work and when they don't?

I think the rule is this: subqueries can reference correlated columns that originate from the outer query; otherwise they are resolved as a left join, which is where the field-equality requirement comes in. This may not be the exact rule, but it should be close to the truth. What do you think?

@MikhailBerlyant ... both queries in my answer work, at least in the sense that they run (on the sample data).

That is the trick: with hard-coded data in CTEs it works, but with actual tables it does not. So I think my comment still stands.