Grouping consecutive dates in PostgreSQL
I have two tables that I need to merge, because sometimes a date is in table A but not in table B, and vice versa. Where days overlap or are consecutive, I want the rows merged into a single range. I am using PostgreSQL.
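Right. I have a query that I think works, and it certainly works for the sample records you provided. It uses a recursive CTE. First, the two tables are merged. Next, the recursive CTE collects the sequences of overlapping or adjacent dates. Finally, the start and end date of each sequence are taken, and the result is joined back to the merged table to get the id.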
with recursive allrecords as -- this merges the input tables. Add a unique row identifier
(
select *, row_number() over (ORDER BY startdate) as rowid from
(select * from table1
UNION
select * from table2) a
),
path as ( -- the recursive CTE. This gets the sequences
select rowid as parent,rowid,startdate,enddate from allrecords a
union
select p.parent,b.rowid,b.startdate,b.enddate from allrecords b join path p on (p.enddate + interval '1 day')>=b.startdate and p.startdate <= b.startdate
)
SELECT id,g.startdate,g.enddate FROM -- outer query to get the id
-- inner query to get the start and end of each sequence
(select parent,min(startdate) as startdate, max(enddate) as enddate from
(
select *, row_number() OVER (partition by rowid order by parent,startdate) as row_number from path
) a
where row_number = 1 -- We only want the first occurrence of each record
group by parent)g
INNER JOIN allrecords a on a.rowid = parent
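For comparison, the same merge can be sketched without recursion using window functions. This is a common "gaps and islands" pattern, not part of the answer above. A row starts a new group when its startdate is more than one day after the latest enddate seen so far for the same id, and a running sum of those flags numbers the groups. This is a minimal sketch that assumes the columns are id, startdate and enddate, with the date columns of type date. The two VALUES lists are hypothetical stand-ins for table1 and table2 (the question's rows split arbitrarily between them) so the query runs on its own. Unlike the recursive query above, it partitions by id, because the comments say the id identifies a person.

```sql
-- Hypothetical sample data: the question's rows split arbitrarily between
-- table1 and table2. Remove these two CTEs to run against the real tables.
WITH table1(id, startdate, enddate) AS (
    VALUES (101, DATE '2013-12-15', DATE '2013-12-15'),
           (101, DATE '2013-12-28', DATE '2013-12-28')
),
table2(id, startdate, enddate) AS (
    VALUES (101, DATE '2013-12-16', DATE '2013-12-16'),
           (101, DATE '2013-12-29', DATE '2013-12-31')
),
allrecords AS (              -- merge both inputs, dropping exact duplicates
    SELECT id, startdate, enddate FROM table1
    UNION
    SELECT id, startdate, enddate FROM table2
),
flagged AS (                 -- new_group = 1 when the row does not touch any earlier range
    SELECT id, startdate, enddate,
           CASE WHEN startdate <= max(enddate) OVER (
                        PARTITION BY id ORDER BY startdate, enddate
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) + 1
                THEN 0 ELSE 1 END AS new_group
    FROM allrecords
),
grouped AS (                 -- running sum of the flags gives each group a number
    SELECT id, startdate, enddate,
           sum(new_group) OVER (PARTITION BY id ORDER BY startdate, enddate) AS grp
    FROM flagged
)
SELECT id, min(startdate) AS startdate, max(enddate) AS enddate
FROM grouped
GROUP BY id, grp
ORDER BY id, startdate;
```

With the sample rows this returns 101 | 2013-12-15 | 2013-12-16 and 101 | 2013-12-28 | 2013-12-31. Comparing against the running maximum enddate, rather than only the previous row's enddate, is what lets one long range absorb later, shorter ranges that start inside it.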
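The snippet below does what you want, but it may be very slow. The problem is that non-overlapping date ranges cannot be detected with the standard methods, because one range can be split into two parts. So my code does the following: (1) split the date ranges in table A into atomic records, one record per date [same for table_b]; (2) combine the two tables with a full outer join, keeping only A_not_in_B and B_not_in_A, and remembering which side (L/R) of the outer join each record came from; (3) re-aggregate the resulting records into date ranges.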
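The atomize-then-reaggregate idea can also be sketched without a recursive CTE. This sketch makes two assumptions that differ from this answer. First, it keeps every day that appears in either table (the union, which is what the question's expected result shows) instead of only A_not_in_B and B_not_in_A. Second, it regroups consecutive days with the `thedate - row_number()` trick instead of recursion. The VALUES lists are hypothetical sample data standing in for the real tables.

```sql
-- Hypothetical sample data (the question's rows, split arbitrarily).
WITH table1(id, startdate, enddate) AS (
    VALUES (101, DATE '2013-12-15', DATE '2013-12-15'),
           (101, DATE '2013-12-28', DATE '2013-12-28')
),
table2(id, startdate, enddate) AS (
    VALUES (101, DATE '2013-12-16', DATE '2013-12-16'),
           (101, DATE '2013-12-29', DATE '2013-12-31')
),
moments AS (
    -- one row per (id, day); date + integer avoids a timestamp round-trip,
    -- and UNION drops days that appear in both tables
    SELECT id, startdate + generate_series(0, enddate - startdate) AS thedate
    FROM table1
    UNION
    SELECT id, startdate + generate_series(0, enddate - startdate)
    FROM table2
),
islands AS (
    -- consecutive days minus their row number yield the same constant date
    SELECT id, thedate,
           thedate - (row_number() OVER (PARTITION BY id ORDER BY thedate))::int AS grp
    FROM moments
)
SELECT id, min(thedate) AS startdate, max(thedate) AS enddate
FROM islands
GROUP BY id, grp
ORDER BY id, startdate;
```

For the sample rows the result is 12/15/2013 to 12/16/2013 and 12/28/2013 to 12/31/2013 for id 101. Like the answer's query, it produces one row per day before grouping, so it can get slow for long ranges.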
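Your intent is unclear. What happened to id=102 in table_b?
@wildplasser It is part of the sequence from 12/28 to 12/31.
So when comparing records you use the dates or date ranges, but ignore the id. Then where does the id in the output come from?
@wildplasser I fixed the typo in the result id. Sorry, my mistake. In this example, the id is a specific person.
I made some adjustments so that the id is still included in the result, and it works fine so far. We will test more date combinations. Thank you so much!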
Sample input rows:
id   startdate   enddate
--------------------------
101  12/15/2013  12/15/2013
101  12/16/2013  12/16/2013
101  12/28/2013  12/28/2013
101  12/29/2013  12/31/2013
Expected result:
id   startdate   enddate
--------------------------
101  12/15/2013  12/16/2013
101  12/28/2013  12/31/2013
-- EXPLAIN ANALYZE
--
WITH RECURSIVE ranges AS (
-- Chop up the a-table into atomic date units
WITH ar AS (
SELECT generate_series(a.startdate,a.enddate , '1day'::interval)::date AS thedate
, 'A'::text AS which
, a.id
FROM a
)
-- Same for the b-table
, br AS (
SELECT generate_series(b.startdate,b.enddate, '1day'::interval)::date AS thedate
, 'B'::text AS which
, b.id
FROM b
)
-- combine the two sets, retaining a_not_in_b plus b_not_in_a
, moments AS (
SELECT COALESCE(ar.id,br.id) AS id
, COALESCE(ar.which, br.which) AS which
, COALESCE(ar.thedate, br.thedate) AS thedate
FROM ar
FULL JOIN br ON br.id = ar.id AND br.thedate = ar.thedate
WHERE ar.id IS NULL OR br.id IS NULL
)
-- use a recursive CTE to re-aggregate the atomic moments into ranges
SELECT m0.id, m0.which
, m0.thedate AS startdate
, m0.thedate AS enddate
FROM moments m0
WHERE NOT EXISTS ( SELECT * FROM moments nx WHERE nx.id = m0.id AND nx.which = m0.which
AND nx.thedate = m0.thedate -1
)
UNION ALL
SELECT rr.id, rr.which
, rr.startdate AS startdate
, m1.thedate AS enddate
FROM ranges rr
JOIN moments m1 ON m1.id = rr.id AND m1.which = rr.which AND m1.thedate = rr.enddate +1
)
SELECT * FROM ranges ra
WHERE NOT EXISTS (SELECT * FROM ranges nx
-- suppress partial subassemblies
WHERE nx.id = ra.id AND nx.which = ra.which
AND nx.startdate = ra.startdate
AND nx.enddate > ra.enddate
)
;
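On PostgreSQL 14 or newer, the built-in range types can do the merging directly. `range_agg` unions a set of ranges into a multirange, which merges overlapping and adjacent ranges, and `unnest` expands the multirange back into separate ranges. This is a minimal sketch that assumes inclusive start and end dates and PostgreSQL 14+, again with hypothetical VALUES sample data. Note that it returns the union of both tables, as in the question's expected result, not the A_not_in_B/B_not_in_A difference computed by the query above.

```sql
-- Requires PostgreSQL 14+ (multiranges). Hypothetical sample data.
WITH table1(id, startdate, enddate) AS (
    VALUES (101, DATE '2013-12-15', DATE '2013-12-15'),
           (101, DATE '2013-12-28', DATE '2013-12-28')
),
table2(id, startdate, enddate) AS (
    VALUES (101, DATE '2013-12-16', DATE '2013-12-16'),
           (101, DATE '2013-12-29', DATE '2013-12-31')
),
allrecords AS (
    SELECT id, startdate, enddate FROM table1
    UNION ALL
    SELECT id, startdate, enddate FROM table2
),
merged AS (
    -- '[]' makes both bounds inclusive; range_agg merges overlapping and
    -- adjacent ranges, and unnest returns one row per merged range
    SELECT id, unnest(range_agg(daterange(startdate, enddate, '[]'))) AS r
    FROM allrecords
    GROUP BY id
)
SELECT id, lower(r) AS startdate, upper(r) - 1 AS enddate
FROM merged
ORDER BY id, startdate;
```

Because daterange is discrete, an inclusive range such as [2013-12-29, 2013-12-31] is stored as [2013-12-29, 2014-01-01), so `upper(r) - 1` turns the exclusive upper bound back into an inclusive enddate.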