SQL (BigQuery): group runtime by day


I have the following data, and I would like to group it in BigQuery into seconds of runtime per day. Source table:

+--------------+---------------------+---------------------+
| ComputerName | StartDatetime       | EndDatetime         |
+--------------+---------------------+---------------------+
| Computer1    | 2020-06-10T21:01:28 | 2020-06-10T21:20:19 |
+--------------+---------------------+---------------------+
| Computer1    | 2020-06-10T22:54:01 | 2020-06-11T05:21:48 |
+--------------+---------------------+---------------------+
| Computer2    | 2020-06-08T09:11:54 | 2020-06-10T11:36:27 |
+--------------+---------------------+---------------------+

I would like to be able to visualize the data in the following way:

+------------+--------------+------------------+
| Date       | ComputerName | Runtime(Seconds) |
+------------+--------------+------------------+
| 2020-06-10 | Computer1    | 5089             |
+------------+--------------+------------------+
| 2020-06-11 | Computer1    | 19308            |
+------------+--------------+------------------+
| 2020-06-08 | Computer2    | 53285            |
+------------+--------------+------------------+
| 2020-06-09 | Computer2    | 86400            |
+------------+--------------+------------------+
| 2020-06-10 | Computer2    | 41787            |
+------------+--------------+------------------+

I am not quite sure how I should approach this. Any pointers would be much appreciated.

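This is an interval-overlap problem. You can solve it by splitting each time period into separate days and then computing how much of the period overlaps each day: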

with t as (
      select 'Computer1' as computername, datetime '2020-06-10T21:01:28' as startdatetime, datetime '2020-06-10T21:20:19' as enddatetime union all
      select 'Computer1' as computername, datetime '2020-06-10T22:54:01' as startdatetime, datetime '2020-06-11T05:21:48' as enddatetime union all
      select 'Computer2' as computername, datetime '2020-06-08T09:11:54' as startdatetime, datetime '2020-06-10T11:36:27' as enddatetime 
     )
select dte, t.computername,
       sum(case when enddatetime >= dte and
                     startdatetime < date_add(dte, interval 1 day)
                then datetime_diff(least(date_add(dte, interval 1 day), enddatetime), 
                                   greatest(dte, startdatetime),
                                   second)
           end) as runtime_seconds
from (select t.*, 
             generate_date_array(date(t.startdatetime), date(t.enddatetime), interval 1 day) gda
      from t 
     ) t cross join
     unnest(gda) dte
group by dte, t.computername;
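
The split-and-overlap idea above can be cross-checked outside BigQuery. The following is a minimal Python sketch, not part of the original answer: the function name `runtime_per_day` and the `SAMPLE` list are made up for illustration, and each day is treated as the half-open interval from midnight to the next midnight, which is what the `least`/`greatest` clamping in the query does.

```python
from collections import defaultdict
from datetime import datetime, time, timedelta


def runtime_per_day(rows):
    """Sum, per (date, computer), the seconds each interval overlaps that date.

    rows: iterable of (computer_name, start, end) with datetime values.
    Mirrors the SQL: one candidate row per calendar date from date(start)
    to date(end), clamped to [midnight, next midnight).
    """
    totals = defaultdict(int)
    for name, start, end in rows:
        day = start.date()
        while day <= end.date():
            day_start = datetime.combine(day, time.min)
            day_end = day_start + timedelta(days=1)
            # least(day_end, end) - greatest(day_start, start), in seconds
            overlap = min(day_end, end) - max(day_start, start)
            totals[(day, name)] += int(overlap.total_seconds())
            day += timedelta(days=1)
    return dict(totals)


# Sample rows copied from the question (hypothetical variable name).
SAMPLE = [
    ("Computer1", datetime.fromisoformat("2020-06-10T21:01:28"),
     datetime.fromisoformat("2020-06-10T21:20:19")),
    ("Computer1", datetime.fromisoformat("2020-06-10T22:54:01"),
     datetime.fromisoformat("2020-06-11T05:21:48")),
    ("Computer2", datetime.fromisoformat("2020-06-08T09:11:54"),
     datetime.fromisoformat("2020-06-10T11:36:27")),
]

result = runtime_per_day(SAMPLE)
for (day, name), seconds in sorted(result.items()):
    print(day, name, seconds)
```

With midnight-to-midnight boundaries this gives 5090 seconds for Computer1 on 2020-06-10 and 53286 seconds for Computer2 on 2020-06-08, one second more than the 5089 and 53285 listed in the question, which appear to count each day only up to 23:59:59. The other three values (19308, 86400, 41787) match.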

Below is a version for BigQuery Standard SQL:

#standardSQL
select Date, ComputerName, 
  sum(datetime_diff(
    least(datetime (Date + 1), EndDatetime),
    greatest(datetime(Date), StartDatetime),
    second
  )) as Runtime_Seconds
from `project.dataset.table`,
unnest(generate_date_array(date(StartDatetime), date(EndDatetime))) Date
group by Date, ComputerName
To apply this to the sample data from the question, see the example below:

#standardSQL
with `project.dataset.table` as (
  select 'Computer1' ComputerName, datetime '2020-06-10T21:01:28' StartDatetime, datetime '2020-06-10T21:20:19' EndDatetime union all
  select 'Computer1', '2020-06-10T22:54:01', '2020-06-11T05:21:48' union all
  select 'Computer2', '2020-06-08T09:11:54', '2020-06-10T11:36:27' 
)
select Date, ComputerName, 
  sum(datetime_diff(
    least(datetime (Date + 1), EndDatetime),
    greatest(datetime(Date), StartDatetime),
    second
  )) as Runtime_Seconds
from `project.dataset.table`,
unnest(generate_date_array(date(StartDatetime), date(EndDatetime))) Date
group by Date, ComputerName
The output has one row per Date and ComputerName, with the total Runtime_Seconds for that day.


Another option for BigQuery Standard SQL

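A straightforward, somewhat silly, almost logic-free option that simply counts the seconds falling within each day. It still looks like a valid option to me: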

#standardSQL
select Date, ComputerName, 
    countif(second >= timestamp(StartDatetime) and second < timestamp(EndDatetime)) as Runtime_Seconds
from `project.dataset.table`,
unnest(generate_date_array(date(StartDatetime), date(EndDatetime))) Date,
unnest(generate_timestamp_array(timestamp(Date + 1), timestamp(Date), interval -1 second)) second with offset
where offset > 0
group by Date, ComputerName
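Applied to the sample data from the question, this returns the same per-day totals as the datetime_diff query, because every sample timestamp falls on a whole second.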
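
To illustrate why counting seconds and clamping the interval agree, here is a rough Python sketch (not from the original answer; `count_seconds` and `clamp_seconds` are hypothetical helper names). It enumerates the 86,400 one-second ticks of a day, as the `generate_timestamp_array` call does after the `offset > 0` filter, and counts those with `start <= tick < end`.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86_400


def count_seconds(day_start, start, end):
    """Brute force: count one-second ticks of the day with start <= tick < end."""
    return sum(
        1
        for offset in range(SECONDS_PER_DAY)
        if start <= day_start + timedelta(seconds=offset) < end
    )


def clamp_seconds(day_start, start, end):
    """Same quantity computed by clamping the interval to the day."""
    day_end = day_start + timedelta(days=1)
    overlap = min(day_end, end) - max(day_start, start)
    return max(0, int(overlap.total_seconds()))


# Second Computer1 row from the question; it spans midnight.
start = datetime.fromisoformat("2020-06-10T22:54:01")
end = datetime.fromisoformat("2020-06-11T05:21:48")

for day_start in (datetime(2020, 6, 10), datetime(2020, 6, 11)):
    counted = count_seconds(day_start, start, end)
    # Both approaches agree when start and end are whole seconds.
    assert counted == clamp_seconds(day_start, start, end)
    print(day_start.date(), counted)
```

The counting version expands every source row into 86,400 rows per calendar day it touches, so on large tables it is likely to be far more expensive than the datetime_diff approach; it is mainly useful as a sanity check.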