Grouping consecutive dates in PostgreSQL

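I have two tables that need to be merged, because some dates appear in table A but not in table B, and vice versa. What I want is for ranges that overlap or fall on consecutive days to be merged into a single range.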

I am using PostgreSQL.

Table A

Table B

Expected result


Right. I have a query that I think works; it certainly works for the sample records you provided. It uses a recursive CTE.

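First, the two tables are merged. Next, a recursive CTE builds the sequences of overlapping or adjacent dates. Finally, the start and end date of each sequence are taken, and the result is joined back to the merged table to get the id.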

with recursive allrecords as ( -- merge the input tables and add a unique row identifier
    select *, row_number() over (order by startdate) as rowid
    from (
        select * from table1
        union
        select * from table2
    ) a
),
path as ( -- the recursive CTE: collects the rows that belong to each sequence
    select rowid as parent, rowid, startdate, enddate
    from allrecords
    union -- UNION (not UNION ALL) discards rows already produced, so the recursion terminates
    select p.parent, b.rowid, b.startdate, b.enddate
    from allrecords b
    join path p
      on (p.enddate + interval '1 day') >= b.startdate
     and p.startdate <= b.startdate
)
select a.id, g.startdate, g.enddate -- outer query to get the id
from (
    -- inner query to get the start and end of each sequence
    select parent, min(startdate) as startdate, max(enddate) as enddate
    from (
        select *, row_number() over (partition by rowid order by parent, startdate) as row_number
        from path
    ) a
    where row_number = 1 -- keep only the first occurrence of each record
    group by parent
) g
inner join allrecords a on a.rowid = g.parent;
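The merge rule the recursive CTE applies (a row joins a sequence when it starts on or before the previous end plus one day) can be illustrated outside the database. Below is a minimal Python sketch using a different technique, a sort-and-sweep, assuming rows are `(id, startdate, enddate)` tuples with inclusive dates and that merging happens per id (the asker notes the id identifies a person; the CTE as written does not partition by id). The function name `merge_ranges` is hypothetical.

```python
from datetime import date, timedelta
from itertools import groupby


def merge_ranges(rows):
    """Hypothetical helper: merge overlapping or adjacent ranges per id.

    rows: iterable of (id, startdate, enddate) tuples, dates inclusive.
    Returns a sorted list of merged (id, startdate, enddate) tuples.
    """
    merged = []
    # Sorting by (id, start, end) visits each id's ranges in start order.
    for key, group in groupby(sorted(rows), key=lambda r: r[0]):
        cur_start = cur_end = None
        for _, start, end in group:
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end + timedelta(days=1):
                # Overlaps or touches the current range: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap found: close the current range and start a new one.
                merged.append((key, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((key, cur_start, cur_end))
    return merged


sample = [
    (101, date(2013, 12, 15), date(2013, 12, 15)),
    (101, date(2013, 12, 16), date(2013, 12, 16)),
    (101, date(2013, 12, 28), date(2013, 12, 28)),
    (101, date(2013, 12, 29), date(2013, 12, 31)),
]
print(merge_ranges(sample))
```

After sorting, a single pass is enough, which is why this approach avoids the repeated self-joins of the recursive version.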

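The snippet below does what you want, but it may be very slow. The problem is that the non-overlapping parts of date ranges cannot be found by simply comparing range endpoints, because one range can be split into two parts by a range in the other table. So my code does the following: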

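1. Split the date ranges in table A into atomic records, one date per record (same for table b).
2. Full outer join the two sets; we are only interested in A_not_in_B and B_not_in_A, remembering which wing of the L/R outer join each record came from.
3. Re-aggregate the resulting records into date ranges.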
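The first two steps listed above (splitting into atomic dates, then keeping only the days present on one side of the outer join) amount to a symmetric difference of day sets. A minimal Python sketch, assuming each table is a list of `(id, startdate, enddate)` tuples with inclusive dates; `expand` and `one_sided_days` are hypothetical names, and Python sets stand in for the FULL JOIN.

```python
from datetime import date, timedelta


def expand(rows):
    """Hypothetical helper: split inclusive ranges into (id, day) atoms."""
    atoms = set()
    for rid, start, end in rows:
        day = start
        while day <= end:
            atoms.add((rid, day))
            day += timedelta(days=1)
    return atoms


def one_sided_days(table_a, table_b):
    """Hypothetical helper: days present in only one table.

    Plays the role of FULL JOIN ... WHERE ar.id IS NULL OR br.id IS NULL:
    days found in both tables are dropped, and 'which' records the side
    the surviving day came from.
    """
    a_atoms, b_atoms = expand(table_a), expand(table_b)
    only_a = {(rid, "A", d) for rid, d in a_atoms - b_atoms}
    only_b = {(rid, "B", d) for rid, d in b_atoms - a_atoms}
    return only_a | only_b


a = [(1, date(2013, 12, 1), date(2013, 12, 10))]
b = [(1, date(2013, 12, 4), date(2013, 12, 5))]
print(sorted(one_sided_days(a, b)))
```

With this input the A-only days form two separate runs (December 1 to 3 and December 6 to 10), which is the "one range can be split into two parts" case that an endpoint comparison cannot express.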
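Your intent is not clear. What happened to id=102 in table_b?
@wildplasser It is part of the sequence from 12/28 to 12/31... so when comparing records, use the dates or date ranges but ignore the id.
Then where does the id in the output come from?
@wildplasser I edited the typo in the result ids. Sorry, my mistake. In this example, the id is a specific person. I made some adjustments to keep the id in the result, and it is working fine so far. We will do more testing with different date combinations. Thank you so much!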
id  startdate   enddate
--------------------------
101 12/15/2013  12/15/2013
101 12/16/2013  12/16/2013
101 12/28/2013  12/28/2013
101 12/29/2013  12/31/2013

id  startdate   enddate
--------------------------
101 12/15/2013  12/16/2013
101 12/28/2013  12/31/2013
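Another common way to get from the sample rows to the expected result is the "gaps and islands" trick: expand every range into single days, and within one id, consecutive days share the same value of (day minus its position in sorted order). In SQL this is usually done by subtracting a `row_number()` from the date; the minimal Python sketch below shows the same idea, assuming `(id, startdate, enddate)` tuples with inclusive dates. The function name `islands` is hypothetical.

```python
from datetime import date, timedelta
from itertools import groupby


def islands(rows):
    """Hypothetical helper: group consecutive days per id into ranges.

    Every range is expanded into single days (duplicates removed). Within
    one id, consecutive days have a constant (day - position) value, which
    serves as the island key.
    """
    days = sorted({(rid, start + timedelta(days=i))
                   for rid, start, end in rows
                   for i in range((end - start).days + 1)})
    result = []
    for rid, group in groupby(days, key=lambda t: t[0]):
        ordered = [d for _, d in group]
        keyed = ((d - timedelta(days=pos), d) for pos, d in enumerate(ordered))
        for _, island in groupby(keyed, key=lambda t: t[0]):
            members = [d for _, d in island]
            result.append((rid, members[0], members[-1]))
    return result


sample = [
    (101, date(2013, 12, 15), date(2013, 12, 15)),
    (101, date(2013, 12, 16), date(2013, 12, 16)),
    (101, date(2013, 12, 28), date(2013, 12, 28)),
    (101, date(2013, 12, 29), date(2013, 12, 31)),
]
print(islands(sample))
```

Expanding to single days makes overlaps harmless (the set removes duplicate days), at the cost of one row per day, the same trade-off as the atomic-date answer.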
-- EXPLAIN ANALYZE
-- 
WITH  RECURSIVE ranges AS (
            -- Chop up the a-table into atomic date units
    WITH ar AS (
            SELECT generate_series(a.startdate,a.enddate , '1day'::interval)::date  AS thedate
            ,  'A'::text AS which
            , a.id
            FROM a
            )
            -- Same for the b-table
    , br AS (
            SELECT generate_series(b.startdate,b.enddate, '1day'::interval)::date  AS thedate
            ,  'B'::text AS which
            , b.id
            FROM b
            )
            -- combine the two sets, retaining a_not_in_b plus b_not_in_a
    , moments AS (
            SELECT COALESCE(ar.id,br.id) AS id
            , COALESCE(ar.which, br.which) AS which
            , COALESCE(ar.thedate, br.thedate) AS thedate
            FROM ar
            FULL JOIN br ON br.id = ar.id AND br.thedate =  ar.thedate
            WHERE ar.id IS NULL OR br.id IS NULL
            )
            -- use a recursive CTE to re-aggregate the atomic moments into ranges
    SELECT m0.id, m0.which
            , m0.thedate AS startdate
            , m0.thedate AS enddate
    FROM moments m0
    WHERE NOT EXISTS ( SELECT * FROM moments nx WHERE nx.id = m0.id  AND nx.which = m0.which
            AND nx.thedate = m0.thedate -1
            )
    UNION ALL
    SELECT rr.id, rr.which
            , rr.startdate AS startdate
            , m1.thedate AS enddate
    FROM ranges rr
    JOIN moments m1 ON m1.id = rr.id AND m1.which = rr.which AND m1.thedate = rr.enddate +1
    )
SELECT * FROM ranges ra
WHERE NOT EXISTS (SELECT * FROM ranges nx
    -- suppress partial subassemblies
    WHERE nx.id = ra.id AND nx.which = ra.which
    AND nx.startdate = ra.startdate
    AND nx.enddate > ra.enddate
    )
 ;
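The re-aggregation step of the query above seeds a range at every day whose previous day is missing, extends it one day at a time through the recursive part, and then discards the partial ranges with the final NOT EXISTS. A minimal Python sketch of that logic, assuming the atoms are `(id, which, day)` tuples like the `moments` CTE; `reaggregate` is a hypothetical name, and the loop extends each seed directly to its maximal end instead of materializing the partial ranges.

```python
from datetime import date, timedelta


def reaggregate(moments):
    """Hypothetical helper: rebuild maximal ranges from (id, which, day) atoms.

    A range starts at every atom whose previous day is absent for the same
    (id, which), and is extended while the next day exists. Only the maximal
    range per start is produced, matching the 'suppress partial
    subassemblies' filter.
    """
    atoms = set(moments)
    ranges = []
    for rid, which, day in atoms:
        if (rid, which, day - timedelta(days=1)) in atoms:
            continue  # not the start of a run
        end = day
        while (rid, which, end + timedelta(days=1)) in atoms:
            end += timedelta(days=1)
        ranges.append((rid, which, day, end))
    return sorted(ranges)


moments = [(1, "A", date(2013, 12, d)) for d in (1, 2, 3, 6, 7, 8, 9, 10)]
moments.append((1, "B", date(2013, 12, 20)))
print(reaggregate(moments))
```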