SQL: Counting overlapping intervals by ID in BigQuery
I want to count how many overlapping intervals I have, per ID.
WITH table AS (
SELECT 1001 as id, 1 AS start_time, 10 AS end_time UNION ALL
SELECT 1001, 2, 5 UNION ALL
SELECT 1002, 3, 4 UNION ALL
SELECT 1003, 5, 8 UNION ALL
SELECT 1003, 6, 8 UNION ALL
SELECT 1001, 6, 20
)
In this case, the expected result should be:
2 overlapping for ID=1001
1 overlapping for ID=1003
0 overlapping for ID=1002
TOT OVERLAPPING = 3
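Whenever there is an overlap, even a partial one, I need to count it as an overlap.
How can I achieve this in BigQuery?
Below is for BigQuery Standard SQL. It is a simple and direct self-join that checks for overlaps and counts them.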
#standardSQL
SELECT a.id,
  COUNTIF(
    a.start_time BETWEEN b.start_time AND b.end_time
    OR a.end_time BETWEEN b.start_time AND b.end_time
    OR b.start_time BETWEEN a.start_time AND a.end_time
    OR b.end_time BETWEEN a.start_time AND a.end_time
  ) overlaps
FROM `project.dataset.table` a
LEFT JOIN `project.dataset.table` b
ON a.id = b.id AND TO_JSON_STRING(a) < TO_JSON_STRING(b)
GROUP BY id
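To sanity-check the expected counts outside BigQuery, the same idea can be sketched in plain Python. This is only an illustration, not the query itself: the names `ROWS`, `overlaps_inclusive` and `count_overlaps` are made up here, and the closed-interval test assumes `start_time <= end_time` on every row, in which case it gives the same outcome as the four BETWEEN checks. One difference to keep in mind: `combinations` pairs exact duplicate rows too, whereas the `TO_JSON_STRING(a) < TO_JSON_STRING(b)` condition would skip them.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows from the question: (id, start_time, end_time).
ROWS = [
    (1001, 1, 10),
    (1001, 2, 5),
    (1002, 3, 4),
    (1003, 5, 8),
    (1003, 6, 8),
    (1001, 6, 20),
]


def overlaps_inclusive(a, b):
    """Closed-interval overlap: each interval starts no later than the other ends."""
    return a[1] <= b[2] and b[1] <= a[2]


def count_overlaps(rows):
    """Count overlapping pairs per id, visiting each unordered pair once."""
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[0]].append(row)
    return {
        key: sum(overlaps_inclusive(a, b) for a, b in combinations(group, 2))
        for key, group in by_id.items()
    }


counts = count_overlaps(ROWS)
print(counts)                # {1001: 2, 1002: 0, 1003: 1}
print(sum(counts.values()))  # 3
```

An id with a single row still shows up with a count of 0, which is what the LEFT JOIN achieves in the query.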
Another option is to avoid the self-join by using analytic functions. Obviously, it gives the same result/output as the previous version.
The logic for all overlaps compares the start and end times:
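SELECT t1.id,
       -- Two intervals overlap when each one starts before the other ends.
       -- Joining on id alone also pairs each row with itself and sees every
       -- pair in both orders, so a tie-break is needed to count a pair once.
       COUNTIF(t1.start_time < t2.end_time AND t2.start_time < t1.end_time) AS num_overlaps
FROM `project.dataset.table` t1 LEFT JOIN
     `project.dataset.table` t2
     ON t1.id = t2.id
GROUP BY t1.id;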
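Thanks! I'm testing it on my real data (about 20 GB) now, but it takes time.
Thanks Mikhail, it works like a charm! I'm a bit confused by TO_JSON_STRING(a) < TO_JSON_STRING(b). Is this equivalent to comparing on a.start_time, or is there more to it that I'm missing?
@David - if the start times are distinct, it is similar to comparing on a.start_time. TO_JSON_STRING(a) < TO_JSON_STRING(b) guarantees that the same two rows are not joined to each other twice.
Thanks for the clarification!
Sure. If you haven't voted yet, please consider it :o)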
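The effect of the TO_JSON_STRING(a) < TO_JSON_STRING(b) condition can be illustrated with a small Python sketch. It is an assumption-laden stand-in rather than BigQuery behavior: `json.dumps` plays the role of TO_JSON_STRING, `key` and `overlaps` are hypothetical helpers, and only the three rows for id 1001 from the question are used.

```python
import json

# The three rows for id 1001 from the question.
ROWS = [
    {"id": 1001, "start_time": 1, "end_time": 10},
    {"id": 1001, "start_time": 2, "end_time": 5},
    {"id": 1001, "start_time": 6, "end_time": 20},
]


def overlaps(a, b):
    """Closed-interval overlap test."""
    return a["start_time"] <= b["end_time"] and b["start_time"] <= a["end_time"]


def key(row):
    """Stand-in for TO_JSON_STRING(row): any total order on whole rows works."""
    return json.dumps(row, sort_keys=True)


# Join on id only: every row meets itself, and each pair appears in both orders.
naive = sum(overlaps(a, b) for a in ROWS for b in ROWS if a["id"] == b["id"])

# Join on id plus key(a) < key(b): each pair of distinct rows appears once.
tie_broken = sum(
    overlaps(a, b)
    for a in ROWS
    for b in ROWS
    if a["id"] == b["id"] and key(a) < key(b)
)

print(naive, tie_broken)  # 7 2
```

The naive count of 7 is 3 self-matches plus the 2 real overlaps seen twice each. With the ordering condition only the 2 real overlaps remain.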
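To count each pair only once, number the rows within each id and join each row only to the rows that come after it:
WITH t AS (
      SELECT t.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_time) AS seqnum
      FROM `project.dataset.table` t
     )
SELECT t1.id,
       -- Two intervals overlap when each one starts before the other ends.
       COUNTIF(t1.start_time < t2.end_time AND t2.start_time < t1.end_time) AS num_overlaps
FROM t t1 LEFT JOIN
     t t2
     ON t1.id = t2.id AND t1.seqnum < t2.seqnum
GROUP BY t1.id;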
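One difference between the two answers is worth checking against your own data: the BETWEEN version treats intervals as closed, while this query uses strict comparisons. A short Python sketch of the difference follows. The helper names are hypothetical, and the `touching` pair is made up for illustration since the question's data has no intervals that merely touch.

```python
def overlaps_inclusive(a, b):
    """Closed intervals: a shared endpoint counts, as with BETWEEN."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlaps_strict(a, b):
    """Strict comparison: intervals that only touch do not count."""
    return a[0] < b[1] and b[0] < a[1]


touching = ((1, 5), (5, 8))   # hypothetical pair, not from the question's data
partial = ((1, 10), (6, 20))  # two of the intervals for id 1001

print(overlaps_inclusive(*touching), overlaps_strict(*touching))  # True False
print(overlaps_inclusive(*partial), overlaps_strict(*partial))    # True True
```

On the sample data both definitions give the same counts. They differ only when one interval ends exactly where another starts.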