How to get series of overlapping events in MySQL
I have a table of events whose time periods overlap. I want to group events that overlap consecutively, i.e. events that are not separated by a time gap.
ID StartDate EndDate
1 2013-01-30 2013-01-31
2 2013-01-31 2013-01-31
3 2013-01-29 2013-01-31
4 2013-01-25 2013-01-28
5 2013-01-29 2013-01-30
6 2013-02-01 2013-02-01
7 2013-01-31 2013-02-02
8 2013-02-04 2013-02-05
9 2013-02-05 2013-02-06
10 2013-02-08 2013-02-09
ID  01-24 01-25 01-26 01-27 01-28 01-29 01-30 01-31 02-01 02-02 02-03 02-04 02-05 02-06 02-07 02-08 02-09
1                                       -----------
2                                             -----
3                                 -----------------
4         -----------------------
5                                 -----------
6                                                   -----
7                                             -----------------
8                                                                     -----------
9                                                                           -----------
10                                                                                            -----------
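So I would like to end up with the following four time groups:
Group 1, IDs: 1, 2, 3, 5, 6, 7
Group 2, ID: 4
Group 3, IDs: 8, 9
Group 4, ID: 10
Is there a simple way to do this in SQL? Here is the SQL that creates my sample table: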
DROP TABLE IF EXISTS tb_data_log;
CREATE TABLE tb_data_log (
`event_id` int(10) unsigned NOT NULL,
`startdate` date DEFAULT NULL,
`enddate` date DEFAULT NULL
);
INSERT INTO tb_data_log VALUES (1, '2013-01-30', '2013-01-31');
INSERT INTO tb_data_log VALUES (2, '2013-01-31', '2013-01-31');
INSERT INTO tb_data_log VALUES (3, '2013-01-29', '2013-01-31');
INSERT INTO tb_data_log VALUES (4, '2013-01-25', '2013-01-28');
INSERT INTO tb_data_log VALUES (5, '2013-01-29', '2013-01-30');
INSERT INTO tb_data_log VALUES (6, '2013-02-01', '2013-02-01');
INSERT INTO tb_data_log VALUES (7, '2013-01-31', '2013-02-02');
INSERT INTO tb_data_log VALUES (8, '2013-02-04', '2013-02-05');
INSERT INTO tb_data_log VALUES (9, '2013-02-05', '2013-02-06');
INSERT INTO tb_data_log VALUES (10, '2013-02-08', '2013-02-09');
Edit 1:
The question seems to be a bit hard to understand, so here is the desired output:
GroupID StartDate EndDate Overlapped Id
1 2013-01-29 2013-02-02 1, 2, 3, 5, 6, 7
2 2013-01-25 2013-01-28 4
3 2013-02-04 2013-02-06 8,9
4 2013-02-08 2013-02-09 10
Something that gets close to an answer:
select
tmp.group_id, group_concat(tmp.id)
from
(select
a.event_id as 'group_id', b.event_id as 'id'
from
tb_data_log a
LEFT join tb_data_log b ON (a.startdate BETWEEN b.startdate AND b.enddate)
or (a.enddate BETWEEN b.startdate AND b.enddate)) as tmp
group by group_id
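The join above only reports, for each event, the events it overlaps directly, while the desired groups are chains of overlaps: events 1 and 6 never share a day, yet both belong to group 1 because event 7 links them. As a rough illustration outside SQL, here is a minimal Python sketch that merges direct overlaps into groups with a small union-find. The names `overlaps` and `connected_groups` are made up for this example, and end dates are assumed to be inclusive, as in the sample data.

```python
from datetime import date

# Sample rows from the question: (event_id, startdate, enddate).
ROWS = [
    (1, "2013-01-30", "2013-01-31"), (2, "2013-01-31", "2013-01-31"),
    (3, "2013-01-29", "2013-01-31"), (4, "2013-01-25", "2013-01-28"),
    (5, "2013-01-29", "2013-01-30"), (6, "2013-02-01", "2013-02-01"),
    (7, "2013-01-31", "2013-02-02"), (8, "2013-02-04", "2013-02-05"),
    (9, "2013-02-05", "2013-02-06"), (10, "2013-02-08", "2013-02-09"),
]
EVENTS = [(i, date.fromisoformat(s), date.fromisoformat(e)) for i, s, e in ROWS]


def overlaps(a, b):
    """True when two events share at least one day (end dates are inclusive)."""
    return a[1] <= b[2] and b[1] <= a[2]


def connected_groups(events):
    """Merge direct overlaps into groups of transitively linked events."""
    parent = {e[0]: e[0] for e in events}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(events):
        for b in events[i + 1:]:
            if overlaps(a, b):
                parent[find(a[0])] = find(b[0])

    groups = {}
    for event_id in parent:
        groups.setdefault(find(event_id), []).append(event_id)
    return sorted(sorted(ids) for ids in groups.values())


DIRECT_1_6 = overlaps(EVENTS[0], EVENTS[5])  # events 1 and 6
GROUPS = connected_groups(EVENTS)
print(DIRECT_1_6)
print(GROUPS)
```

Comparing every pair is quadratic in the number of events, which is fine for a sample this size but not for a large table.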
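Here is a solution. It should work and does not use stored procedures:
The idea is to first find all candidate periods by joining the table with itself. Then, for each period P, make sure there is no pair of events A and B, both contained in P, such that A ends before B starts (no overlap) and no event lies between them. Also make sure that no longer period exists.
This is the solution I posted earlier, which is worse. It is kept here for reference.
It is probably not very efficient. I used the accepted answer from here:
So note that this query will eventually stop working: its generated calendar covers only 100,000 days (about 270 years) counted from 1970-01-01.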
select per_start,per_end,group_concat(contained.event_id) from tb_data_log contained,(
select distinct start.startdate as per_start,
finish.enddate as per_end
from tb_data_log start, tb_data_log finish
where start.startdate <= finish.enddate -- first find all possible periods
and not exists (-- make sure there are no two consecutive days that are not contained in some event period.
select * from
(select adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) day1, adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0+1) day2 from
(select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
(select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) v
where day1 between start.startdate and finish.enddate and day2 between start.startdate and finish.enddate
and not exists (
select * from tb_data_log where tb_data_log.startdate <= cast(day1 as date) and tb_data_log.enddate >= cast(day2 as date)
)
)
and not exists (-- make sure there is no longer period
select * from tb_data_log later where later.startdate<=finish.enddate and later.enddate >finish.enddate
)
and not exists (-- make sure there is no longer period
select * from tb_data_log earlier where earlier.startdate<start.startdate and earlier.enddate >=start.startdate
)
) periods where contained.enddate<=per_end and contained.startdate>=per_start
group by per_start,per_end
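The idea is to first find all candidate periods by joining the table with itself. Then, for each period, make sure it contains no pair of consecutive days that is not covered by some event period in the table. Also make sure that no longer period exists.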
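To make the consecutive-days check easier to follow, here is a small Python sketch of the same test for a single candidate period. It is only an illustration of the idea, not a translation of the query: it walks the days of the period directly instead of generating a calendar, and the function name `is_gap_free` is made up for this example.

```python
from datetime import date, timedelta

# Sample rows from the question: (event_id, startdate, enddate).
ROWS = [
    (1, "2013-01-30", "2013-01-31"), (2, "2013-01-31", "2013-01-31"),
    (3, "2013-01-29", "2013-01-31"), (4, "2013-01-25", "2013-01-28"),
    (5, "2013-01-29", "2013-01-30"), (6, "2013-02-01", "2013-02-01"),
    (7, "2013-01-31", "2013-02-02"), (8, "2013-02-04", "2013-02-05"),
    (9, "2013-02-05", "2013-02-06"), (10, "2013-02-08", "2013-02-09"),
]
EVENTS = [(i, date.fromisoformat(s), date.fromisoformat(e)) for i, s, e in ROWS]


def is_gap_free(per_start, per_end, events):
    """True when every pair of consecutive days inside the period
    lies inside at least one single event."""
    day1 = per_start
    while day1 < per_end:
        day2 = day1 + timedelta(days=1)
        covered = any(start <= day1 and end >= day2 for _, start, end in events)
        if not covered:
            return False  # (day1, day2) is a break between two groups
        day1 = day2
    return True


print(is_gap_free(date(2013, 1, 29), date(2013, 2, 2), EVENTS))
print(is_gap_free(date(2013, 1, 25), date(2013, 1, 31), EVENTS))
```

With the sample data, the period from 2013-01-29 to 2013-02-02 passes, while 2013-01-25 to 2013-01-31 fails because no single event contains both 01-28 and 01-29. That is also why event 4 ends up in a group of its own.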
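I think the performance of this query could be improved.
It is far from the desired result: items grouped by overlap.
I edited the answer so that it is closer to the real result. This may help user2724602 or others get to the final answer.
Thank you all for your answers. I have edited the question and added a sample table with the desired output to make it easier to understand.
This is an interesting problem, but I'm afraid I can't find a simple way to do it without access to some kind of recursive function, which MySQL doesn't support. I fear it may need to be coded in a regular programming language, possibly with several queries.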
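For anyone taking the "regular programming language" route from the last comment, here is a minimal Python sketch using a sort-and-sweep pass: order the events by start date and open a new group whenever an event starts after the latest end date seen so far. It assumes the rows have already been fetched with a plain `SELECT event_id, startdate, enddate FROM tb_data_log`, that no dates are NULL, and that end dates are inclusive. The function name `group_overlapping` is made up for this example.

```python
from datetime import date

# Rows as they might be fetched from tb_data_log: (event_id, startdate, enddate).
ROWS = [
    (1, "2013-01-30", "2013-01-31"), (2, "2013-01-31", "2013-01-31"),
    (3, "2013-01-29", "2013-01-31"), (4, "2013-01-25", "2013-01-28"),
    (5, "2013-01-29", "2013-01-30"), (6, "2013-02-01", "2013-02-01"),
    (7, "2013-01-31", "2013-02-02"), (8, "2013-02-04", "2013-02-05"),
    (9, "2013-02-05", "2013-02-06"), (10, "2013-02-08", "2013-02-09"),
]
EVENTS = [(i, date.fromisoformat(s), date.fromisoformat(e)) for i, s, e in ROWS]


def group_overlapping(events):
    """Return a list of (start, end, ids) tuples, one per overlap group."""
    groups = []
    for event_id, start, end in sorted(events, key=lambda e: (e[1], e[2])):
        if groups and start <= groups[-1]["end"]:
            # Shares at least one day with the current group: extend it.
            current = groups[-1]
            current["end"] = max(current["end"], end)
            current["ids"].append(event_id)
        else:
            # Starts after everything seen so far: open a new group.
            groups.append({"start": start, "end": end, "ids": [event_id]})
    return [(g["start"], g["end"], sorted(g["ids"])) for g in groups]


RESULT = group_overlapping(EVENTS)
for number, (start, end, ids) in enumerate(RESULT, start=1):
    print(number, start, end, ", ".join(map(str, ids)))
```

The groups come out in chronological order, so the group numbers differ from the desired output in the question (event 4 is listed first), but the date ranges and ID lists are the same. After the sort, a single pass is enough, so no recursion is needed.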