Consecutive days in SQL
I found many Stack Overflow Q&As about consecutive days, but the answers are too short for me to understand what is actually going on.
To make it concrete, I will set up a model (a table). I am using PostgreSQL, if that matters.
CREATE TABLE work (
id serial PRIMARY KEY, -- auto-generated, so the INSERTs below can omit it
user_id integer NOT NULL,
arrived_at timestamp with time zone NOT NULL
);
insert into work(user_id, arrived_at) values(1, '01/03/2011');
insert into work(user_id, arrived_at) values(1, '01/04/2011');
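In its simplest form, for a given user, I want to find the last consecutive date range.
My ultimate goal is to find, for a given user, how many consecutive days he has worked.
If he came to work yesterday, he still has a chance to extend the streak today, so I show him the consecutive days up to yesterday.
But if he missed yesterday, his streak is either 0 or 1 day, depending on whether he came today.
Say today is the 8th: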
3 * 5 6 7 * = 3 days (5 to 7)
3 * 5 6 7 8 = 4 days (5 to 8)
3 4 5 * 7 * = 1 day (7 to 7)
3 * * * * * = 0 day
3 * * * * 8 = 1 day (8 to 8)
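The five cases above can be restated as a small reference function, which may help when checking any SQL answer against the expected numbers. This is only a sketch in Python, assuming attendance has already been reduced to a set of calendar dates; `streak_length` and its parameters are illustrative names, not part of the question.

```python
from datetime import date, timedelta

def streak_length(days_present, today):
    """Count consecutive days ending today, or ending yesterday if today is absent."""
    # Today contributes 1 only if the user has already arrived today.
    count = 1 if today in days_present else 0
    day = today - timedelta(days=1)
    # Walk backwards from yesterday until the first missing day.
    while day in days_present:
        count += 1
        day -= timedelta(days=1)
    return count

# First case from the list above: present on 3, 5, 6, 7; today is the 8th.
today = date(2014, 2, 8)
present = {date(2014, 2, d) for d in (3, 5, 6, 7)}
print(streak_length(present, today))  # 3
```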
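Here is my approach to this problem using a CTE.
Here is how the query works:
1. It selects today's record from the attendance table. If today's record is not available, it selects yesterday's record.
2. It then keeps recursively adding the record for the day before the current minimum date.
If you want to select the latest consecutive date range regardless of when the user's latest attendance date was (today, yesterday, or x days ago), the initialization part of the CTE must be replaced with the following snippet: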
SELECT MAX(attendanceDate) FROM attendance
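The two steps described above can be sketched as a recursive CTE. The answer's own query is not reproduced here, so this is an illustration rather than that query: it runs on SQLite through Python's `sqlite3` module so that it is self-contained, it assumes a `user_id` column and at most one row per user per day, and it stores dates as ISO text. In PostgreSQL the structure would be the same, but the date arithmetic is written differently (for example `day - 1` on a `date`).

```python
import sqlite3

STREAK_SQL = """
WITH RECURSIVE streak(day) AS (
    -- Anchor: today's record if it exists, otherwise yesterday's (NULL if neither).
    SELECT MAX(attendanceDate)
    FROM attendance
    WHERE user_id = :uid
      AND attendanceDate IN (:today, date(:today, '-1 day'))
    UNION
    -- Recursive step: add the record one day before the day just added.
    SELECT a.attendanceDate
    FROM attendance AS a
    JOIN streak AS s ON a.attendanceDate = date(s.day, '-1 day')
    WHERE a.user_id = :uid
)
SELECT MIN(day), MAX(day), COUNT(day) FROM streak
"""

def current_streak(conn, user_id, today):
    """Return (first_day, last_day, ndays) for the streak ending today or yesterday."""
    return conn.execute(STREAK_SQL, {"uid": user_id, "today": today}).fetchone()

def make_demo_db():
    """Build an in-memory table with made-up sample data for two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE attendance (user_id INTEGER, attendanceDate TEXT)")
    rows = [(1, d) for d in (3, 5, 6, 7)] + [(2, d) for d in (3, 5, 6, 7, 8)]
    conn.executemany(
        "INSERT INTO attendance VALUES (?, ?)",
        [(uid, f"2014-02-{d:02d}") for uid, d in rows],
    )
    return conn
```

Calling `current_streak(make_demo_db(), 1, "2014-02-08")` walks back from 2014-02-07 to 2014-02-05 and stops at the missing 2014-02-04. A user with no record today or yesterday gets a count of 0, because the anchor yields NULL and the recursive step finds nothing to join.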
[Edit]
Here is the query on SQL Fiddle that solves your problem 1:
Result:
CREATE TABLE
INSERT 0 14
user_id | first_day | last_day | nday
---------+------------+------------+------
1 | 2014-02-05 | 2014-02-07 | 3
2 | 2014-02-05 | 2014-02-08 | 4
(2 rows)
You can create an aggregate that uses a range type:
Create function sfunc (tstzrange, timestamptz)
returns tstzrange
language sql strict as $$
select case when $2 - upper($1) <= '1 day'::interval
then tstzrange(lower($1), $2, '[]')
else tstzrange($2, $2, '[]') end
$$;
Create aggregate consecutive (timestamptz) (
sfunc = sfunc,
stype = tstzrange,
initcond = '[,]'
);
Use the aggregate as a window function:
Select *,
consecutive(arrived_at)
over (partition by user_id order by arrived_at)
from work;
┌────┬─────────┬────────────────────────┬─────────────────────────────────────────────────────┐
│ id │ user_id │ arrived_at │ consecutive │
├────┼─────────┼────────────────────────┼─────────────────────────────────────────────────────┤
│ 1 │ 1 │ 2011-01-03 00:00:00+02 │ ["2011-01-03 00:00:00+02","2011-01-03 00:00:00+02"] │
│ 2 │ 1 │ 2011-01-04 00:00:00+02 │ ["2011-01-03 00:00:00+02","2011-01-04 00:00:00+02"] │
│ 3 │ 1 │ 2011-01-05 00:00:00+02 │ ["2011-01-03 00:00:00+02","2011-01-05 00:00:00+02"] │
│ 4 │ 2 │ 2011-01-06 00:00:00+02 │ ["2011-01-06 00:00:00+02","2011-01-06 00:00:00+02"] │
└────┴─────────┴────────────────────────┴─────────────────────────────────────────────────────┘
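To see why the range grows row by row, it may help to model the state function outside the database. The following Python sketch mimics `sfunc` with a `(lower, upper)` tuple standing in for `tstzrange`; the names `step` and `consecutive_ranges` are illustrative, and the `strict` handling of NULL inputs is not modelled.

```python
from datetime import datetime, timedelta

def step(state, arrived_at):
    """Mimic the state function: extend the current range or start a new one."""
    lower, upper = state
    # The initial unbounded range has no upper bound (None here), so the
    # comparison cannot succeed and a fresh single-point range is started.
    if upper is not None and arrived_at - upper <= timedelta(days=1):
        return (lower, arrived_at)
    return (arrived_at, arrived_at)

def consecutive_ranges(timestamps):
    """Running state after each row, in arrival order, as the window emits it."""
    state = (None, None)
    out = []
    for ts in sorted(timestamps):
        state = step(state, ts)
        out.append(state)
    return out

arrivals = [datetime(2011, 1, d) for d in (3, 4, 5)]
for lower, upper in consecutive_ranges(arrivals):
    print(lower.date(), upper.date())
```

Both bounds are inclusive, so `upper - lower` is the span between the first and last arrival. That is why the final query reports `2 days` for the range from 2011-01-03 to 2011-01-05, which covers three calendar days.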
Query the result to find what you need:
With work_detail as (select *,
consecutive(arrived_at)
over (partition by user_id order by arrived_at)
from work)
select arrived_at, upper(consecutive) - lower(consecutive) as days
from work_detail
where user_id = 1 and upper(consecutive) != lower(consecutive)
order by arrived_at desc
limit 1;
┌────────────────────────┬────────┐
│ arrived_at │ days │
├────────────────────────┼────────┤
│ 2011-01-05 00:00:00+02 │ 2 days │
└────────────────────────┴────────┘
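You can even do this without a recursive CTE, using generate_series(), LEFT JOIN, a running row count and a final LIMIT 1.
The result is 1 for today plus the consecutive days up to yesterday: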
SELECT count(*) -- 1 / 0 for "today"
+ COALESCE(( -- + optional count of consecutive days up until "yesterday"
SELECT ct
FROM (
SELECT d.ct, count(w.arrived_at) OVER (ORDER BY d.ct) AS day_ct
FROM generate_series(1, 8) AS d(ct) -- maximum = 8
LEFT JOIN work w ON w.arrived_at >= current_date - d.ct
AND w.arrived_at < current_date - (d.ct - 1)
AND w.user_id = 1 -- given user
) sub
WHERE ct = day_ct
ORDER BY ct DESC
LIMIT 1
), 0) AS total
FROM work
WHERE arrived_at >= current_date -- no future timestamps
AND user_id = 1 -- given user
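The `WHERE ct = day_ct` filter is the core of this query: the running count of matched rows can only keep up with the day offset `ct` while no day has been missed, and once there is a gap it stays behind for good. Below is a rough Python model of that logic, assuming at most one row per user per day; the function name and the `max_days` parameter are illustrative.

```python
from datetime import date, timedelta

def streak_via_running_count(days_present, today, max_days=8):
    """Today's 0/1 plus the largest offset ct whose running count equals ct."""
    today_ct = 1 if today in days_present else 0   # outer count(*) for "today"
    running = 0   # plays the role of count(w.arrived_at) OVER (ORDER BY d.ct)
    best = 0      # COALESCE(..., 0) when no offset qualifies
    for ct in range(1, max_days + 1):              # generate_series(1, 8)
        if today - timedelta(days=ct) in days_present:
            running += 1
        if ct == running:                          # WHERE ct = day_ct
            best = ct                              # ORDER BY ct DESC LIMIT 1
    return today_ct + best

today = date(2014, 2, 8)
present = {date(2014, 2, d) for d in (3, 4, 5, 7)}
print(streak_via_running_count(present, today))  # 1
```

As in the SQL, the look-back window caps the result: with `max_days=8` the function can never report more than 8 days before today.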
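Interesting question… can you add the table schema?
Schema and sample data as CREATE TABLE and INSERT statements, plus the desired result.
Please add real DDL and sample data. Please don't use shorthand.
Can you give me the original fiddle, the one that seems to solve my first problem, without the today/yesterday consideration, so that I can first understand the basics of your query?
If a user can attend more than once per day and that should still count as one day of attendance, does your code need a complete rewrite?
No, I think we only need to extract the date part from attendanceDate wherever it is used in the query; the rest should work fine.
Is this faster than the CTE solution?
@eugene: Probably, yes. Consider the simplified update. Can you run EXPLAIN ANALYZE on your data?
I don't have a big enough data set yet. It took me a long time to convert the answer to my actual schema.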
CREATE INDEX foo_idx ON work (user_id,arrived_at);