SQL: How do I efficiently query for contiguous sets of dates in a dataset?

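I have a table that contains one record per site per day of the month (for every month of every year). I need to determine whether a site has at least 15 contiguous records in a given month, and I need the start and end dates of each contiguous run of days. I could do this in a stored procedure, but I was hoping it could be done in a single query. I'm working with a fairly large dataset: at least 30 million records per month.

Example results: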

site_id | start_date | end_date
      1 | oct  1, 08 | oct  2, 08
      1 | oct  2, 08 | oct  3, 08
 ...
      1 | oct 30, 08 | oct 31, 08
      2 | oct  1, 08 | oct  2, 08
      2 | oct  2, 08 | oct  3, 08
 ...
      2 | oct 30, 08 | oct 31, 08

Thanks for your help.

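This is definitely possible. I solved a similar problem in SQL Server a few months ago. I don't know Oracle syntax at all, so I'm afraid I can't convert it for you, but if you are comfortable with Oracle, this should be enough to get you there.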

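Your database structure doesn't fit your business logic:

  • The end date is always the day after the start date, so why does it need to be stored in the database at all?
  • In the sample data you provided, there are no gaps in the date range for a single site. That means you don't need to store every date, only the start and stop dates.
30 million records per month is a very large table for the kind of query you need to write.
My advice is to restructure the table.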
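
As a rough sketch of what such a restructuring might look like: the table name `site_date_ranges` and its columns are hypothetical (nothing in the question defines them), and the example assumes each row stores one unbroken run of days for a site, with `range_end` inclusive. Under those assumptions, the 15-day check becomes plain date arithmetic with no analytic functions.

```sql
-- Hypothetical restructured table: one row per unbroken run of days per site.
create table site_date_ranges
( site_id     number not null
, range_start date   not null
, range_end   date   not null   -- last day covered, inclusive
, constraint site_date_ranges_pk primary key (site_id, range_start)
, constraint site_date_ranges_ck check (range_end >= range_start)
);

-- Runs that cover at least 15 consecutive days within October 2008.
-- Each range is clipped to the month before its length is measured;
-- in Oracle, subtracting two dates yields a number of days.
select site_id
     , greatest(range_start, date '2008-10-01') run_start
     , least(range_end, date '2008-10-31') run_end
  from site_date_ranges
 where range_start <= date '2008-10-31'
   and range_end   >= date '2008-10-01'
   and least(range_end, date '2008-10-31')
       - greatest(range_start, date '2008-10-01') + 1 >= 15;
```

The trade-off is that whatever loads the data has to extend or split ranges as days arrive, so the work moves from query time to insert time.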

Here is an example of how such a query could be done:

site_id | contiguous_start_date | contiguous_end_date
      1 | oct 5, 2008           | oct 20, 2008
      2 | oct 10                | oct 30, 2008
      3 | oct 1                 | oct 31, 2008 
Then the query:

create table t (site_id,start_date,end_date)
as
select 1, date '2008-10-01', date '2008-10-02' from dual union all
select 1, date '2008-10-02', date '2008-10-03' from dual union all
select 1, date '2008-10-03', date '2008-10-30' from dual union all
select 1, date '2008-10-30', date '2008-10-31' from dual union all
select 2, date '2008-10-01', date '2008-10-02' from dual union all
select 2, date '2008-10-02', date '2008-10-03' from dual union all
select 2, date '2008-10-03', date '2008-10-04' from dual union all
select 2, date '2008-10-04', date '2008-10-05' from dual union all
select 2, date '2008-10-05', date '2008-10-06' from dual union all
select 2, date '2008-10-06', date '2008-10-07' from dual union all
select 2, date '2008-10-07', date '2008-10-08' from dual union all
select 2, date '2008-10-08', date '2008-10-09' from dual union all
select 2, date '2008-10-09', date '2008-10-10' from dual union all
select 2, date '2008-10-10', date '2008-10-11' from dual union all
select 2, date '2008-10-11', date '2008-10-12' from dual union all
select 2, date '2008-10-12', date '2008-10-13' from dual union all
select 2, date '2008-10-13', date '2008-10-14' from dual union all
select 2, date '2008-10-14', date '2008-10-15' from dual union all
select 2, date '2008-10-15', date '2008-10-16' from dual union all
select 2, date '2008-10-16', date '2008-10-17' from dual union all
select 2, date '2008-10-17', date '2008-10-18' from dual union all
select 2, date '2008-10-18', date '2008-10-19' from dual union all
select 2, date '2008-10-19', date '2008-10-20' from dual union all
select 3, date '2008-10-01', date '2008-10-02' from dual union all
select 3, date '2008-10-02', date '2008-10-03' from dual union all
select 3, date '2008-10-03', date '2008-10-04' from dual union all
select 3, date '2008-10-04', date '2008-10-05' from dual union all
select 3, date '2008-10-05', date '2008-10-06' from dual union all
select 3, date '2008-10-06', date '2008-10-07' from dual union all
select 3, date '2008-10-07', date '2008-10-08' from dual union all
select 3, date '2008-10-08', date '2008-10-09' from dual union all
select 3, date '2008-10-09', date '2008-10-10' from dual union all
select 3, date '2008-10-30', date '2008-10-31' from dual
/

Table created.
The query and its result:

select site_id
     , min(start_date) contiguous_start_date
     , max(end_date) contiguous_end_date
     , count(*) number_of_contiguous_records
  from ( select site_id
              , start_date
              , end_date
                -- running max carries each run's marker forward over its rows
              , max(rn) over (partition by site_id order by start_date) maxrn
           from ( select site_id
                       , start_date
                       , end_date
                         -- mark a row only when it does not continue the previous row;
                         -- row_number() is used instead of rownum so that markers
                         -- always increase with start_date within a site
                       , case lag(end_date) over (partition by site_id order by start_date)
                           when start_date then null
                           else row_number() over (partition by site_id order by start_date)
                         end rn
                    from t
                )
       )
 group by site_id
     , maxrn
 order by site_id
     , contiguous_start_date
/
   SITE_ID CONTIGUOUS_START_DA CONTIGUOUS_END_DATE NUMBER_OF_CONTIGUOUS_RECORDS
---------- ------------------- ------------------- ----------------------------
         1 01-10-2008 00:00:00 31-10-2008 00:00:00                            4
         2 01-10-2008 00:00:00 20-10-2008 00:00:00                           19
         3 01-10-2008 00:00:00 10-10-2008 00:00:00                            9
         3 30-10-2008 00:00:00 31-10-2008 00:00:00                            1

4 rows selected.

Regards,
Rob.

Clever way to solve the problem. Thanks. When would end_date ever not equal start_date + 1 day? If you didn't need to look at both columns, the query would be simpler.
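
On the comment's point that the query gets simpler when end_date is always start_date + 1: a minimal sketch of that simpler form, which is a different technique from the one in the answer. It assumes one row per site per day, start_date values with no time component, and no duplicate days per site. The alias `grp` and the October 2008 filter are illustrative choices. The sample table t above does not fully satisfy the assumption (site 1 has a row running from Oct 3 to Oct 30), so on that data this query would report different runs than the answer's query.

```sql
-- Within a run of consecutive days, start_date and row_number() both
-- advance by 1 per row, so their difference is constant for the whole run
-- and changes whenever there is a gap.
select site_id
     , min(start_date) contiguous_start_date
     , max(start_date) + 1 contiguous_end_date   -- assumes end_date = start_date + 1
     , count(*) number_of_contiguous_records
  from ( select site_id
              , start_date
              , start_date
                - row_number() over (partition by site_id order by start_date) grp
           from t
          where start_date >= date '2008-10-01'
            and start_date <  date '2008-11-01'
       )
 group by site_id
     , grp
having count(*) >= 15          -- keep only runs of at least 15 records
 order by site_id
     , contiguous_start_date
/
```

Because only one analytic function over one sort is involved, this form may also be cheaper on a table with tens of millions of rows per month, though that would need to be confirmed against the actual execution plan.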