SQL: aggregating over event tables within time windows configured in another table
I have three tables: UpEvent, DownEvent, and AnalysisWindow.
UpEvent:
up_event_id | event_date | EventMetric
1 2015-01-01T06:00:00 54
2 2015-01-01T07:30:00 76
DownEvent:
down_event_id | event_date | EventMetric
1 2015-01-01T06:46:00 22
2 2015-01-01T07:33:00 34
AnalysisWindow:
window_id | win_start | win_end
1 2015-01-01T00:00:00 2015-01-01T04:00:00
2 2015-01-01T00:00:00 2015-01-01T08:00:00
.
.
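I want to run an analysis for each AnalysisWindow, aggregating the UpEvents and DownEvents that occur between the start and end of that window.
So for each AnalysisWindow record I would get one feature row: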
WinStart | WinEnd | TotalUpEvents | TotalDownEvents
2015-01-01T00:00:00 2015-01-01T04:00:00 0 0
2015-01-01T00:00:00 2015-01-01T08:00:00 2 2
My first idea was:
select win.win_start,
win.win_end,
count(ue.*),
sum(ue.EventMetric)
from AnalysisWindow win
left join UpEvent ue on (ue.event_date between win.win_start and win.win_end)
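This obviously doesn't work.
Am I approaching this the wrong way? I want to run a windowed analysis over the tables for each of my configured windows and get one aggregated record per window.

One approach uses correlated subqueries: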
select aw.*,
(select count(*)
from UpEvent ue
where ue.event_date between aw.win_start and aw.win_end
) as ups,
(select count(*)
from DownEvent de
where de.event_date between aw.win_start and aw.win_end
) as downs
from AnalysisWindow aw;
The above works, at least when formulated as:
with UpEvent as (
select 1 as up_event_id, '2015-01-01T06:00:00' as event_date, 54 as EventMetric union all
select 2, '2015-01-01T07:30:00', 76
),
DownEvent as (
select 1 as down_event_id, '2015-01-01T06:46:00' as event_date, 22 as EventMetric union all
select 2, '2015-01-01T07:33:00', 34
),
AnalysisWindow as (
select 1 as window_id , '2015-01-01T00:00:00' as win_start, '2015-01-01T04:00:00' as win_end union all
select 2, '2015-01-01T00:00:00', '2015-01-01T08:00:00'
)
select aw.*,
(select count(*)
from UpEvent ue
where ue.event_date between aw.win_start and aw.win_end
) as ups,
(select count(*)
from DownEvent de
where de.event_date between aw.win_start and aw.win_end
) as downs
from AnalysisWindow aw;
Another approach uses union all:
with ud as (
select event_date, 1 as ups, 0 as downs from upevent
union all
select event_date, 0 as ups, 1 as downs from downevent
)
select aw.window_id, aw.win_start, aw.win_end, sum(ups), sum(downs)
from AnalysisWindow aw join
ud
ON ud.event_date between aw.win_start and aw.win_end
group by aw.window_id, aw.win_start, aw.win_end
union all
select aw.window_id, aw.win_start, aw.win_end, 0, 0
from AnalysisWindow aw
where not exists (select 1 from ud where ud.event_date between aw.win_start and aw.win_end)
Below is a version for BigQuery Standard SQL (one that actually works!)
This will produce the error: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
@MikhailBerlyant . . . Do you know the rules for when correlated subqueries work and when they don't?
I think the rule is this: subqueries can reference correlated columns that originate from the outer query; otherwise they are resolved as left joins, which is where the field-equality requirement comes from. This may not be the exact rule, but it should be close to the truth. What do you think?
@MikhailBerlyant . . . Both queries in my answer work, at least in the sense that they run (on the sample data).
That's the trick: if you use hard-coded data in CTEs, it works, but if you use actual tables, it doesn't. So I think my comment still stands.
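For what it's worth, one query shape that avoids the LEFT JOIN restriction raised in these comments is a CROSS JOIN followed by conditional aggregation. The following is only a minimal sketch under assumptions, not a query taken from any of the answers: the CTEs stand in for the real tables, the event_date and window columns are assumed to be DATETIME, and the names ud, is_up, TotalUpEvents and TotalDownEvents are made up for illustration.

```sql
#standardSQL
-- Sample data standing in for the real tables (assumption: DATETIME columns).
WITH UpEvent AS (
  SELECT 1 AS up_event_id, DATETIME '2015-01-01 06:00:00' AS event_date, 54 AS EventMetric UNION ALL
  SELECT 2, DATETIME '2015-01-01 07:30:00', 76
),
DownEvent AS (
  SELECT 1 AS down_event_id, DATETIME '2015-01-01 06:46:00' AS event_date, 22 AS EventMetric UNION ALL
  SELECT 2, DATETIME '2015-01-01 07:33:00', 34
),
AnalysisWindow AS (
  SELECT 1 AS window_id, DATETIME '2015-01-01 00:00:00' AS win_start, DATETIME '2015-01-01 04:00:00' AS win_end UNION ALL
  SELECT 2, DATETIME '2015-01-01 00:00:00', DATETIME '2015-01-01 08:00:00'
),
-- Stack both event tables and tag each row with its direction (hypothetical helper CTE).
ud AS (
  SELECT event_date, TRUE AS is_up FROM UpEvent
  UNION ALL
  SELECT event_date, FALSE AS is_up FROM DownEvent
)
SELECT
  aw.window_id,
  aw.win_start,
  aw.win_end,
  -- Count only the events that fall inside this window.
  COUNTIF(ud.is_up AND ud.event_date BETWEEN aw.win_start AND aw.win_end) AS TotalUpEvents,
  COUNTIF(NOT ud.is_up AND ud.event_date BETWEEN aw.win_start AND aw.win_end) AS TotalDownEvents
FROM AnalysisWindow AS aw
CROSS JOIN ud  -- no join condition, so the equality requirement never comes into play
GROUP BY aw.window_id, aw.win_start, aw.win_end
ORDER BY aw.window_id;
```

On the sample data this gives 0 and 0 for window 1, and 2 and 2 for window 2, matching the feature rows in the question. Two caveats: the cross join evaluates every window/event pair, so it can get expensive on large tables, and if both event tables are empty no window rows come back at all.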