Consecutive days in SQL

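I found many Stack Overflow Q&As about consecutive days, but the answers are too terse for me to understand what is actually going on.

To be concrete, I'll make up a model (a table). I'm using PostgreSQL, if that matters.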

CREATE TABLE work (
    id serial NOT NULL,  -- serial, so that the INSERTs below can omit id
    user_id integer NOT NULL,
    arrived_at timestamp with time zone NOT NULL
);


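-- ISO dates (3 and 4 January 2011), so the result does not depend on DateStyle
insert into work(user_id, arrived_at) values(1, '2011-01-03');
insert into work(user_id, arrived_at) values(1, '2011-01-04');

In its simplest form, for a given user, I want to find the last consecutive date range.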

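My ultimate goal is to find a given user's streak of consecutive working days. If he came to work yesterday, he still has a chance today to extend the streak, so I show him the consecutive days up to yesterday. But if he missed yesterday, his streak is either 0 or 1 day, depending on whether he comes in today.

Suppose today is the 8th: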

3 * 5 6 7 * = 3 days (5 to 7)
3 * 5 6 7 8 = 4 days (5 to 8)
3 4 5 * 7 * = 1 day (7 to 7)
3 * * * * * = 0 day 
3 * * * * 8 = 1 day (8 to 8)

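Here is my solution to this problem, using a CTE.

Check the code at the following location:

Here is how the query works:

1. It selects today's record from the attendance table. If today's record is not available, it selects yesterday's record.
2. It then keeps recursively adding the record for the day before the earliest date found so far.

If you want the latest consecutive date range regardless of when the user's most recent attendance was (today, yesterday, or x days ago), replace the initialization part of the CTE with the following snippet: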

SELECT MAX(attendanceDate) FROM attendance
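
A minimal sketch of the recursive CTE described in the two steps above, not the answer's original code. It assumes a hypothetical table attendance with a single date column attendanceDate (the names come from the snippet above; the table definition and the CTE name streak are assumptions made for illustration).

```sql
-- Hypothetical table: one date per attended day.
CREATE TABLE attendance (attendanceDate date NOT NULL);

WITH RECURSIVE streak(attendanceDate) AS (
    -- Initialization: today's record if present, otherwise yesterday's.
    SELECT attendanceDate
    FROM  (
        SELECT attendanceDate
        FROM   attendance
        WHERE  attendanceDate IN (current_date, current_date - 1)
        ORDER  BY attendanceDate DESC
        LIMIT  1
    ) AS latest
    UNION  -- UNION (not UNION ALL) discards repeated rows for the same day
    -- Recursive step: add the record for the day before the one just found.
    SELECT a.attendanceDate
    FROM   attendance a
    JOIN   streak s ON a.attendanceDate = s.attendanceDate - 1
)
SELECT count(*)            AS nday,
       min(attendanceDate) AS first_day,
       max(attendanceDate) AS last_day
FROM   streak;
```

The recursion stops at the first missing day, because the join then finds no row for the previous date. With an empty initialization (no record today or yesterday) the query returns nday = 0.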
[Edit] Here is the query, on SQL Fiddle, that solves your problem 1:

Result:

CREATE TABLE
INSERT 0 14
 user_id | first_day  |  last_day  | nday 
---------+------------+------------+------
       1 | 2014-02-05 | 2014-02-07 |    3
       2 | 2014-02-05 | 2014-02-08 |    4
(2 rows)
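
For readers without access to the fiddle, here is a sketch of one way to get columns of this shape from the work table in the question. It swaps in a different technique, the row_number() "gaps and islands" grouping, rather than the recursive CTE, and is not the query that produced the result above. The CTE names (days, grouped, streaks) and the work_day and grp columns are made up for illustration.

```sql
WITH days AS (
    -- One row per user and calendar day (the cast uses the session time zone).
    SELECT DISTINCT user_id, arrived_at::date AS work_day
    FROM   work
), grouped AS (
    -- Consecutive days share the same (work_day - row_number) value.
    SELECT user_id, work_day,
           work_day - (row_number() OVER (PARTITION BY user_id
                                          ORDER BY work_day))::int AS grp
    FROM   days
), streaks AS (
    SELECT user_id,
           min(work_day) AS first_day,
           max(work_day) AS last_day,
           count(*)      AS nday
    FROM   grouped
    GROUP  BY user_id, grp
)
-- Keep, per user, only the streak that reaches yesterday or today.
SELECT DISTINCT ON (user_id) user_id, first_day, last_day, nday
FROM   streaks
WHERE  last_day >= current_date - 1
ORDER  BY user_id, last_day DESC;
```

Users whose latest streak ended before yesterday do not appear in the output at all, which corresponds to the "0 days" case in the question.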

You can create an aggregate based on range types:

Create function sfunc (tstzrange, timestamptz)
    returns tstzrange
    language sql strict as $$
        select case when $2 - upper($1) <= '1 day'::interval
                then tstzrange(lower($1), $2, '[]')
                else tstzrange($2, $2, '[]') end
    $$;

Create aggregate consecutive (timestamptz) (
        sfunc = sfunc,
        stype = tstzrange,
        initcond = '[,]'
);

Use the aggregate as a window function:

Select *,
        consecutive(arrived_at)
                over (partition by user_id order by arrived_at)
    from work;

    ┌────┬─────────┬────────────────────────┬─────────────────────────────────────────────────────┐
    │ id │ user_id │       arrived_at       │                     consecutive                     │
    ├────┼─────────┼────────────────────────┼─────────────────────────────────────────────────────┤
    │  1 │       1 │ 2011-01-03 00:00:00+02 │ ["2011-01-03 00:00:00+02","2011-01-03 00:00:00+02"] │
    │  2 │       1 │ 2011-01-04 00:00:00+02 │ ["2011-01-03 00:00:00+02","2011-01-04 00:00:00+02"] │
    │  3 │       1 │ 2011-01-05 00:00:00+02 │ ["2011-01-03 00:00:00+02","2011-01-05 00:00:00+02"] │
    │  4 │       2 │ 2011-01-06 00:00:00+02 │ ["2011-01-06 00:00:00+02","2011-01-06 00:00:00+02"] │
    └────┴─────────┴────────────────────────┴─────────────────────────────────────────────────────┘
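
Query the result to find what you need (days is the span between the first and last arrival of the streak, so three consecutive days show as 2 days):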

With work_detail as (select *,
            consecutive(arrived_at)
                    over (partition by user_id order by arrived_at)
        from work)
    select arrived_at, upper(consecutive) - lower(consecutive) as days
        from work_detail
            where user_id = 1 and upper(consecutive) != lower(consecutive)
            order by arrived_at desc
                limit 1;

    ┌────────────────────────┬────────┐
    │       arrived_at       │  days  │
    ├────────────────────────┼────────┤
    │ 2011-01-05 00:00:00+02 │ 2 days │
    └────────────────────────┴────────┘

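You can even do this without a recursive CTE, using generate_series(), a LEFT JOIN, a running window count(), and a final LIMIT 1:

1 for "today" plus the consecutive days up until "yesterday":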

SELECT count(*)   -- 1 / 0  for "today"
     + COALESCE(( -- + optional count of consecutive days up until "yesterday"
       SELECT ct
       FROM  (
          SELECT d.ct, count(w.arrived_at) OVER (ORDER BY d.ct) AS day_ct
          FROM   generate_series(1, 8) AS d(ct)   -- maximum = 8
          LEFT   JOIN work w ON  w.arrived_at >= current_date -  d.ct
                             AND w.arrived_at <  current_date - (d.ct - 1)
                             AND w.user_id = 1    -- given user
          ) sub
       WHERE  ct = day_ct
       ORDER  BY ct DESC
       LIMIT  1
       ), 0) AS total
FROM   work
WHERE  arrived_at >= current_date  -- no future timestamps
AND    user_id = 1                 -- given user
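
To see why the filter ct = day_ct picks out the streak, it can help to run the inner subquery of the query above on its own. The sketch below does that for user 1 and adds a day column for readability (that column is an addition for illustration). It assumes at most one row per user per day, as the query above does.

```sql
-- ct     = how many days back from today the row refers to
-- day_ct = running count of records found in the last ct days
SELECT d.ct,
       current_date - d.ct AS day,
       count(w.arrived_at) OVER (ORDER BY d.ct) AS day_ct
FROM   generate_series(1, 8) AS d(ct)   -- look back at most 8 days
LEFT   JOIN work w ON  w.arrived_at >= current_date -  d.ct
                   AND w.arrived_at <  current_date - (d.ct - 1)
                   AND w.user_id = 1    -- given user
ORDER  BY d.ct;
```

As long as every day back to ct has a record, day_ct keeps pace with ct. The first missed day leaves day_ct permanently behind, so the largest ct with ct = day_ct is the length of the streak ending yesterday.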

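Interesting question... can you add the table schema?
Schema and sample data as CREATE TABLE and INSERT statements, plus the expected result.
Please add real DDL plus sample data. Please don't use shorthand.
Can you give me the original fiddle, the one that seems to solve my first question, without the today/yesterday consideration, so that I can first understand the basics of your query?
If a user can attend more than once a day and that should still count as one day of attendance, would your code need a complete rewrite?
No, I think we only need to extract the date part from attendanceDate wherever it is used in the query; the rest should work fine.
Is this faster than the CTE solution?
@eugene: Probably, yes. Consider the simplified update. Can you run EXPLAIN ANALYZE on your data?
I don't have a big enough data set yet. It took me a long time to convert the answers to my actual schema.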

An index on (user_id, arrived_at) supports the per-user date-range lookups used above:

CREATE INDEX foo_idx ON work (user_id, arrived_at);