Counting overlapping intervals by ID in BigQuery (sql, google-bigquery)

I want to count how many overlapping intervals I have per ID.

WITH table AS (
  SELECT 1001 as id, 1 AS start_time, 10 AS end_time UNION ALL
  SELECT 1001, 2, 5 UNION ALL
  SELECT 1002, 3, 4 UNION ALL
  SELECT 1003, 5, 8 UNION ALL
  SELECT 1003, 6, 8 UNION ALL
  SELECT 1001, 6, 20 
)

In this case, the expected result should be:

2 overlapping for ID=1001
1 overlapping for ID=1003
0 overlapping for ID=1002
TOT OVERLAPPING = 3

Whenever there is an overlap (even a partial one), I need to count it as an overlap.

How can I achieve this in BigQuery?

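The following is for BigQuery Standard SQL. It is a simple, straightforward self-join that checks each pair of rows with the same id and counts the overlaps: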

#standardSQL
SELECT a.id, 
  COUNTIF(
    a.start_time BETWEEN b.start_time AND b.end_time
    OR a.end_time BETWEEN b.start_time AND b.end_time
    OR b.start_time BETWEEN a.start_time AND a.end_time
    OR b.end_time BETWEEN a.start_time AND a.end_time
  ) overlaps
FROM `project.dataset.table` a
LEFT JOIN `project.dataset.table` b
ON a.id = b.id AND TO_JSON_STRING(a) < TO_JSON_STRING(b)
GROUP BY id
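
The four BETWEEN checks above can be collapsed into the usual two-sided test: two closed intervals overlap exactly when each one starts no later than the other ends. The following is a minimal sketch of that variant, not the original answer's query. It inlines the question's sample rows in a CTE named `intervals` (a hypothetical name chosen for this example) so it can be run without an existing table.

```sql
#standardSQL
-- `intervals` is a hypothetical CTE holding the sample rows from the question.
WITH intervals AS (
  SELECT 1001 AS id, 1 AS start_time, 10 AS end_time UNION ALL
  SELECT 1001, 2, 5 UNION ALL
  SELECT 1002, 3, 4 UNION ALL
  SELECT 1003, 5, 8 UNION ALL
  SELECT 1003, 6, 8 UNION ALL
  SELECT 1001, 6, 20
)
SELECT a.id,
  -- Inclusive overlap test, equivalent to the four BETWEEN checks.
  -- Unmatched rows from the LEFT JOIN give NULL here, which COUNTIF ignores.
  COUNTIF(a.start_time <= b.end_time AND b.start_time <= a.end_time) AS overlaps
FROM intervals a
LEFT JOIN intervals b
  ON a.id = b.id
 AND TO_JSON_STRING(a) < TO_JSON_STRING(b)  -- visit each pair of rows once
GROUP BY a.id
ORDER BY a.id
```

On the sample data this should give 2 for id 1001, 0 for id 1002 and 1 for id 1003, matching the expected result in the question.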

Another option is to avoid the self-join in favor of analytic functions, which gives the same result/output as the previous version (that query is not included here).

The logic for all overlaps compares the start and end times of both intervals: two intervals overlap when each one starts before the other ends.

SELECT t1.id, 
       COUNTIF(t1.end_time > t2.start_time AND t1.start_time < t2.end_time) as num_overlaps
FROM `project.dataset.table` t1 LEFT JOIN
     `project.dataset.table` t2
     ON t1.id = t2.id 
GROUP BY t1.id;

Note that this join also pairs every row with itself and visits each pair of rows twice, so on its own it overcounts. The join needs an extra condition that pairs any two rows only once.

Comments:

Thanks! I'm testing it now on my real data (about 20 GB), but it takes time.

Thank you Mikhail, it works like a charm! I'm a bit confused about TO_JSON_STRING(a). Is it equivalent to a.start_time, or is there more to it that I don't understand?

@David - if the start times are distinct, this is similar to a.start_time < b.start_time. TO_JSON_STRING(a) < TO_JSON_STRING(b) guarantees that the same two rows are not joined twice.

Thanks for the clarification!

Sure. Please also consider voting up the answer if you haven't yet :O)

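
To make the TO_JSON_STRING(a) < TO_JSON_STRING(b) condition from the comments more concrete, here is a small sketch that shows the strings being compared. The CTE name `intervals` and the column aliases `a_key` and `b_key` are hypothetical, chosen only for this illustration.

```sql
#standardSQL
-- `intervals` is a hypothetical CTE holding two of the question's sample rows.
WITH intervals AS (
  SELECT 1001 AS id, 1 AS start_time, 10 AS end_time UNION ALL
  SELECT 1001, 2, 5
)
SELECT
  TO_JSON_STRING(a) AS a_key,  -- e.g. {"id":1001,"start_time":1,"end_time":10}
  TO_JSON_STRING(b) AS b_key   -- e.g. {"id":1001,"start_time":2,"end_time":5}
FROM intervals a
JOIN intervals b
  ON a.id = b.id
 -- Serializing the whole row gives a per-row key; the strict "<" keeps only
 -- one ordering of each pair and never pairs a row with itself.
 AND TO_JSON_STRING(a) < TO_JSON_STRING(b)
```

Two caveats follow from how this works: the comparison is a plain string comparison rather than a numeric one, which is fine here because only a consistent ordering is needed, and two rows that are identical in every column serialize to the same string, so they are never paired with each other.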
To count each pair only once, number the rows within each id and join each row only to rows with a higher sequence number:

with t as (
      select t.*, row_number() over (partition by id order by start_time) as seqnum
      from `project.dataset.table` t
     )
SELECT t1.id, 
       COUNTIF(t1.end_time > t2.start_time AND t1.start_time < t2.end_time) as num_overlaps
FROM t t1 LEFT JOIN
     t t2
     ON t1.id = t2.id AND t1.seqnum < t2.seqnum
GROUP BY t1.id;
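
The question also asks for a grand total (TOT OVERLAPPING = 3), which none of the queries above return directly. One possible way to get it, sketched here under the assumption that the strict comparison from the row_number() query is the intended overlap rule, is to wrap the per-id counts in a CTE and sum them. The names `intervals`, `numbered` and `per_id` are hypothetical.

```sql
#standardSQL
-- `intervals` is a hypothetical CTE holding the sample rows from the question.
WITH intervals AS (
  SELECT 1001 AS id, 1 AS start_time, 10 AS end_time UNION ALL
  SELECT 1001, 2, 5 UNION ALL
  SELECT 1002, 3, 4 UNION ALL
  SELECT 1003, 5, 8 UNION ALL
  SELECT 1003, 6, 8 UNION ALL
  SELECT 1001, 6, 20
),
numbered AS (
  -- Number the rows within each id so that every pair can be visited once.
  SELECT i.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_time) AS seqnum
  FROM intervals i
),
per_id AS (
  SELECT t1.id,
    -- Strict overlap test: intervals that only touch at an endpoint do not count.
    COUNTIF(t1.end_time > t2.start_time AND t1.start_time < t2.end_time) AS num_overlaps
  FROM numbered t1
  LEFT JOIN numbered t2
    ON t1.id = t2.id AND t1.seqnum < t2.seqnum
  GROUP BY t1.id
)
SELECT SUM(num_overlaps) AS total_overlaps
FROM per_id
```

On the sample data the per-id counts are 2, 0 and 1, so the total should come out as 3. If intervals that merely touch at an endpoint should also count, the comparisons would need to be changed to >= and <=.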