SQL: overlapping intervals and length (Amazon Redshift)

sql amazon-redshift


I have a table in a Redshift database that contains grouped intervals, which may overlap, like this:

| interval_id | l  | u  | group |
| ----------- | -- | -- | ----- |
| 1           | 1  | 10 | A     |
| 2           | 2  | 5  | A     |
| 3           | 5  | 15 | A     |
| 4           | 26 | 30 | B     |
| 5           | 28 | 35 | B     |
| 6           | 30 | 31 | B     |
| 7           | 44 | 45 | B     |
| 8           | 56 | 58 | C     |

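What I want to do is determine the length of the union of the intervals within each group. That is, for each interval take u - l, sum over all members of the group, and then subtract the length of the overlaps between intervals.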

Expected result:

| group | length |
| ----- | ------ |
| A     | 14     |
| B     | 10     |
| C     | 2      |

A similar question has been asked before but, alas, all the solutions in that thread seem to use features that Redshift does not support.

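This is not difficult, but it takes multiple steps. The key is to define the "islands" (runs of overlapping intervals) within each group and then aggregate over those islands. Lots of subqueries, aggregations, and window functions.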

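-- Assumes the table is named t and that the "group" column is exposed as groupId
-- (group is a reserved word).
select groupId, sum(ul) as length
from (select groupId, (max(u) - min(l)) as ul   -- length of one island
      from (select t.*,
                   -- running count of island starts = island number within the group
                   sum(case when prev_max_u < l then 1 else 0 end)
                       over (partition by groupId order by l, interval_id
                             rows between unbounded preceding and current row) as grp
            from (select t.*,
                         -- largest upper bound among earlier intervals of the same group
                         max(u) over (partition by groupId order by l, interval_id
                                      rows between unbounded preceding and 1 preceding) as prev_max_u
                  from t
                 ) t
           ) t
      group by groupId, grp
     ) g
group by groupId;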

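The idea is to determine whether there is an overlap at the start of each record. For this, it uses a cumulative max over all preceding records. It then decides whether there is an overlap by comparing that previous max with the current l: if the previous max is smaller than l, the record does not overlap anything before it and starts a new island. The cumulative sum of these island starts defines a group.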

The rest is just aggregation. And more aggregation.
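The same island logic can be sketched procedurally in Python as a cross-check against the expected result from the question. This is only an illustration under assumptions: the sample rows are hard-coded, the function name `union_lengths` is made up for this sketch, and it does not replace running the query in Redshift.

```python
from collections import defaultdict

# Sample rows from the question: (interval_id, l, u, group)
rows = [
    (1, 1, 10, "A"), (2, 2, 5, "A"), (3, 5, 15, "A"),
    (4, 26, 30, "B"), (5, 28, 35, "B"), (6, 30, 31, "B"),
    (7, 44, 45, "B"), (8, 56, 58, "C"),
]

def union_lengths(rows):
    """Return {group: total length of the union of its intervals}."""
    by_group = defaultdict(list)
    for _interval_id, l, u, g in rows:
        by_group[g].append((l, u))

    result = {}
    for g, intervals in by_group.items():
        intervals.sort()                      # same role as ORDER BY l
        total = 0
        island_l, island_u = intervals[0]     # current island
        for l, u in intervals[1:]:
            if island_u < l:                  # previous max u < l: a gap, new island
                total += island_u - island_l  # close the finished island
                island_l, island_u = l, u
            else:                             # overlap or touch: extend the island
                island_u = max(island_u, u)
        total += island_u - island_l          # close the last island
        result[g] = total
    return result

print(union_lengths(rows))  # {'A': 14, 'B': 10, 'C': 2}
```

Here `island_u` plays the role of the cumulative max of u over preceding records, and each time the `island_u < l` test succeeds a new island begins, which is what the cumulative sum numbers in the SQL version.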