SQL: How to combine records with the same columns and extend the time range?


How can I combine records that have the same column values and extend the time range accordingly?

Sample table:

id | date_from | date_to | param1 | param2
---|-----------|---------|--------|-------
1  | 2009      | 2010    | "A"    | "A"
1  | 2009      | 2010    | "A"    | "A"
1  | 2011      | 2012    | "A"    | "A"
1  | 2013      | 2014    | "B"    | "B"
1  | 2015      | 2016    | "A"    | "A"
1  | 2017      | 2018    | "A"    | "A"

The result should be:

id | date_from | date_to | param1 | param2
---|-----------|---------|--------|-------
1  | 2009      | 2012    | "A"    | "A"
1  | 2013      | 2014    | "B"    | "B"
1  | 2015      | 2018    | "A"    | "A"

I tried using window functions, but I don't know what to do next:

SELECT ROW_NUMBER() OVER(partition BY id ORDER BY date_from ASC) AS step, 
       dense_rank() OVER(partition BY id ORDER BY param1, param2) AS rnk, 
       * 
FROM Table

You can use lag() to find the groups and then aggregate:

select id, min(date_from) as date_from, max(date_to) as date_to, param1, param2 
from (select t.*,
             sum(case when (date_from - prev_to) = 1 then 0 else 1 end) over (partition by id, param1, param2 order by date_from) as grp
      from (select t.*, 
                   lag(date_to) over (partition by id, param1, param2 order by date_from) as prev_to
            from table t
           ) t
     ) t
group by id,  param1, param2, grp
order by id, date_from;
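
To see what the two nested subqueries compute, the following sketch prints the intermediate columns instead of aggregating. It inlines the question's rows as a VALUES list; the CTE names `periods` and `with_prev` and the column name `gap` are illustrative and not part of the answer, and date_from and date_to are assumed to be integers as in the sample.

```sql
-- Sample rows from the question, inlined so the statement runs on its own.
-- "periods" is a hypothetical name standing in for the real table.
with periods (id, date_from, date_to, param1, param2) as (
    values (1, 2009, 2010, 'A', 'A'),
           (1, 2009, 2010, 'A', 'A'),
           (1, 2011, 2012, 'A', 'A'),
           (1, 2013, 2014, 'B', 'B'),
           (1, 2015, 2016, 'A', 'A'),
           (1, 2017, 2018, 'A', 'A')
),
with_prev as (
    select p.*,
           -- end of the previous range with the same id and parameters
           lag(date_to) over (partition by id, param1, param2
                              order by date_from) as prev_to
    from periods p
)
select w.*,
       -- 1 means this range directly continues the previous one
       date_from - prev_to as gap,
       -- running count of "group starts" within the same id and parameters
       sum(case when date_from - prev_to = 1 then 0 else 1 end)
           over (partition by id, param1, param2
                 order by date_from) as grp
from with_prev w
order by id, date_from;
```

A row whose gap is 1 keeps the grp value of its predecessor; any other gap (including the NULL on the first row of a partition) starts a new group. This is why the 2015 row, which follows 2012 within the "A" rows, does not merge with the 2009 to 2012 island.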

1) dense_rank solution:

with tbl_with_group as (
    select 
        t.*,
        dense_rank() over(partition BY id order BY date_from) - dense_rank() OVER(partition BY id, param1, param2 order by date_from) group_number
    from 
        table t
)
select 
    id, min(date_from), max(date_to), param1, param2
from 
    tbl_with_group 
group by 
    id, group_number, param1, param2
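
The difference of the two ranks is easier to follow when both are shown side by side. The sketch below is only an illustration: the CTE name `periods` and the column names `rank_all` and `rank_same_params` are made up here, and the question's rows are inlined as a VALUES list.

```sql
-- Sample rows from the question; "periods" is a hypothetical stand-in table.
with periods (id, date_from, date_to, param1, param2) as (
    values (1, 2009, 2010, 'A', 'A'),
           (1, 2009, 2010, 'A', 'A'),
           (1, 2011, 2012, 'A', 'A'),
           (1, 2013, 2014, 'B', 'B'),
           (1, 2015, 2016, 'A', 'A'),
           (1, 2017, 2018, 'A', 'A')
)
select p.*,
       -- position among all rows of the same id
       dense_rank() over (partition by id order by date_from) as rank_all,
       -- position among rows of the same id AND the same parameters
       dense_rank() over (partition by id, param1, param2
                          order by date_from) as rank_same_params,
       -- stays constant while the parameters do not change between rows
       dense_rank() over (partition by id order by date_from)
         - dense_rank() over (partition by id, param1, param2
                              order by date_from) as group_number
from periods p
order by id, date_from;
```

On these rows group_number comes out as 0 for the 2009 and 2011 rows, 2 for the 2013 row, and 1 for the 2015 and 2017 rows. The values are not sequential, which is why the final query groups by param1 and param2 in addition to group_number.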

2) lag solution:

with tbl_with_delimiter as (
    select 
        t.*,
        case when 
            lag(param1) over(partition by id order by date_from) = param1 and
            lag(param2) over(partition by id order by date_from) = param2
        then 0 else 1 end is_new_group_start
    from 
        table t
),
tbl_with_group as (
    select 
        t.*,
        sum(is_new_group_start) over(partition by id order by date_from) group_number
    from 
        tbl_with_delimiter t
)
select 
    id, min(date_from), max(date_to), param1, param2
from 
    tbl_with_group 
group by 
    id, group_number, param1, param2
order by
    id, group_number
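
As a rough illustration of the two steps (flag the row where the parameters change, then take a running sum of the flags), the following sketch prints both intermediate columns for the question's rows. The CTE names `periods` and `flagged` and the window name `w` are assumptions made for this example.

```sql
-- Sample rows from the question; "periods" is a hypothetical stand-in table.
with periods (id, date_from, date_to, param1, param2) as (
    values (1, 2009, 2010, 'A', 'A'),
           (1, 2009, 2010, 'A', 'A'),
           (1, 2011, 2012, 'A', 'A'),
           (1, 2013, 2014, 'B', 'B'),
           (1, 2015, 2016, 'A', 'A'),
           (1, 2017, 2018, 'A', 'A')
),
flagged as (
    select p.*,
           -- 1 when either parameter differs from the previous row of the id
           -- (also 1 on the first row, where lag() returns NULL)
           case when lag(param1) over w = param1
                 and lag(param2) over w = param2
                then 0 else 1 end as is_new_group_start
    from periods p
    window w as (partition by id order by date_from)
)
select f.*,
       -- running sum of the flags numbers the islands 1, 2, 3, ...
       sum(is_new_group_start) over (partition by id
                                     order by date_from) as group_number
from flagged f
order by id, date_from;
```

Here group_number is 1 for the 2009 and 2011 rows, 2 for the 2013 row, and 3 for the 2015 and 2017 rows. One caveat worth checking against real data: if param1 or param2 can be NULL, the `=` comparison yields NULL and every such row starts a new group; `is not distinct from` would treat two NULLs as equal.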

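Without any further explanation, I have to assume that the "from" and "to" dates can overlap arbitrarily or have gaps. To handle this, I suggest expanding the data into one row per year and then treating it as a simple gaps-and-islands problem: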

select id, min(date_from), max(date_to), param1, param2
from (select t.*, gs.y,
             dense_rank() over (partition by id, param1, param2 order by gs.y) as seqnum
      from t cross join lateral
           generate_series(t.date_from, t.date_to, 1) as gs(y)
     ) t
group by id, (y - seqnum), param1, param2;
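
The expansion step can be inspected on its own. This sketch, under the assumption that date_from and date_to are integer years, lists one row per covered year together with the value that identifies each island; `periods` and `island_key` are names invented for the example.

```sql
-- Sample rows from the question; "periods" is a hypothetical stand-in table.
with periods (id, date_from, date_to, param1, param2) as (
    values (1, 2009, 2010, 'A', 'A'),
           (1, 2009, 2010, 'A', 'A'),
           (1, 2011, 2012, 'A', 'A'),
           (1, 2013, 2014, 'B', 'B'),
           (1, 2015, 2016, 'A', 'A'),
           (1, 2017, 2018, 'A', 'A')
)
select p.id, p.param1, p.param2,
       gs.y,   -- one row per year covered by the range
       dense_rank() over (partition by p.id, p.param1, p.param2
                          order by gs.y) as seqnum,
       -- consecutive years share the same difference; a missing year shifts it
       gs.y - dense_rank() over (partition by p.id, p.param1, p.param2
                                 order by gs.y) as island_key
from periods p
cross join lateral generate_series(p.date_from, p.date_to, 1) as gs(y)
order by p.id, gs.y;
```

For the sample, island_key is 2008 for the years 2009 to 2012, 2012 for 2013 and 2014, and 2010 for 2015 to 2018, so grouping by it (together with id and the parameters) produces the three expected ranges. Overlapping or duplicated input ranges do no harm because dense_rank() gives repeated years the same rank.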

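Thanks, but is it possible to replace the dense_rank() over (partition by id order by date_from) - dense_rank() over (partition by id, param1, param2 order by date_from) expression with lag()?

@Sergey, I have added a solution that uses lag.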