SQL: get records only when a user has 30 or more consecutive days
Tags: sql, sql-server, sql-server-2005

I have data returned from a query. Essentially I put it into a temp table, so it now sits in a temp table that I can query (obviously there is a lot more data in real life; I am only showing a sample).

I need to return only the EmpIds that have 30 or more consecutive days in the Date column. I also need to return the day count for the employees who worked those 30 or more consecutive days. There may be two or more separate runs of 30 or more consecutive days, and in that case I want multiple rows returned. So if an employee has dates from 2011-01-01 to 2011-02-20, return that range and its count in one row. If the same employee also has dates from 2011-05-01 to 2011-07-01, return that range in another row. Basically, every break in the consecutive days starts a separate record.

Something like this should do it, but I have not tested it:
-- Syntax-corrected form of the untested query. It counts adjacent-day pairs
-- per employee over all of that employee's runs combined, so it cannot tell
-- separate runs apart; a single run of 30 days yields 29 pairs.
SELECT
    a.empid,
    COUNT(*) AS consecutive_count,
    MIN(a.mydate) AS startdate
FROM logins a
INNER JOIN logins b
    ON a.empid = b.empid
    AND DATEDIFF(day, a.mydate, b.mydate) = 1
GROUP BY a.empid
HAVING COUNT(*) >= 29
Using DENSE_RANK should do the trick:
;WITH sampledata
AS (SELECT 1 AS id, DATEADD(day, -0, GETDATE())AS somedate
UNION ALL SELECT 1, DATEADD(day, -1, GETDATE())
UNION ALL SELECT 1, DATEADD(day, -2, GETDATE())
UNION ALL SELECT 1, DATEADD(day, -3, GETDATE())
UNION ALL SELECT 1, DATEADD(day, -4, GETDATE())
UNION ALL SELECT 1, DATEADD(day, -5, GETDATE())
UNION ALL SELECT 1, DATEADD(day, -10, GETDATE())
UNION ALL SELECT 1, '2011-01-01 00:00:00'
UNION ALL SELECT 1, '2010-12-31 00:00:00'
UNION ALL SELECT 1, '2011-02-01 00:00:00'
UNION ALL SELECT 1, DATEADD(day, -10, GETDATE())
UNION ALL SELECT 2, DATEADD(day, 0, GETDATE())
UNION ALL SELECT 2, DATEADD(day, -1, GETDATE())
UNION ALL SELECT 2, DATEADD(day, -2, GETDATE())
UNION ALL SELECT 2, DATEADD(day, -6, GETDATE())
UNION ALL SELECT 3, DATEADD(day, 0, GETDATE())
UNION ALL SELECT 4, DATEADD(day, 0, GETDATE())
UNION ALL SELECT 5, DATEADD(day, 0, GETDATE()))
, ranking
AS (SELECT *,
           DENSE_RANK() OVER (PARTITION BY id ORDER BY DATEDIFF(day, 0, somedate))
             - DATEDIFF(day, 0, somedate) AS dategroup
    FROM sampledata)
SELECT id
     , MIN(somedate) AS range_start
     , MAX(somedate) AS range_end
     , DATEDIFF(day, MIN(somedate), MAX(somedate)) + 1 AS consecutive_days
FROM ranking
GROUP BY id, dategroup
--HAVING DATEDIFF(day, MIN(somedate), MAX(somedate)) + 1 >= 30 --change as needed
ORDER BY id, range_start
This is a good use case for a recursive CTE. I borrowed the sample data from @Davin:
with data AS --sample data
( SELECT 1 as id ,DATEADD(DD,-0,GETDATE()) as date UNION ALL
SELECT 1 as id ,DATEADD(DD,-1,GETDATE()) as date UNION ALL
SELECT 1 as id ,DATEADD(DD,-2,GETDATE()) as date UNION ALL
SELECT 1 as id ,DATEADD(DD,-3,GETDATE()) as date UNION ALL
SELECT 1 as id ,DATEADD(DD,-4,GETDATE()) as date UNION ALL
SELECT 1 as id ,DATEADD(DD,-5,GETDATE()) as date UNION ALL
SELECT 1 as id ,DATEADD(DD,-10,GETDATE()) as date UNION ALL
SELECT 1 as id ,'2011-01-01 00:00:00.000' as date UNION ALL
SELECT 1 as id ,'2010-12-31 00:00:00.000' as date UNION ALL
SELECT 1 as id ,'2011-02-01 00:00:00.000' as date UNION ALL
SELECT 1 as id ,DATEADD(DD,-10,GETDATE()) as date UNION ALL
SELECT 2 as id ,DATEADD(DD,0,GETDATE()) as date UNION ALL
SELECT 2 as id ,DATEADD(DD,-1,GETDATE()) as date UNION ALL
SELECT 2 as id ,DATEADD(DD,-2,GETDATE()) as date UNION ALL
SELECT 2 as id ,DATEADD(DD,-6,GETDATE()) as date UNION ALL
SELECT 3 as id ,DATEADD(DD,0,GETDATE()) as date UNION ALL
SELECT 4 as id ,DATEADD(DD,0,GETDATE()) as date UNION ALL
SELECT 5 as id ,DATEADD(DD,0,GETDATE()) as date )
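,CTE AS
(
    -- Anchor: every date starts a run of length 1.
    -- Note: the DATE type requires SQL Server 2008 or later.
    SELECT id, CAST(date AS date) AS Date, Consec = 1
    FROM data
    UNION ALL
    -- Recursive step: extend a run when the next calendar day exists.
    SELECT t.id, CAST(t.date AS date) AS Date, Consec = (c.Consec + 1)
    FROM data t
    INNER JOIN CTE c
        ON t.id = c.id
        AND CAST(t.date AS date) = CAST(DATEADD(day, 1, c.Date) AS date)
)
SELECT id, MAX(Consec) AS max_consecutive_days
FROM CTE
GROUP BY id
ORDER BY id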
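Basically, this generates many rows for each person and measures, for each date, the number of days in the consecutive run that ends on it.

Assuming there are no duplicate dates for the same employee: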
;WITH ranged AS (
    SELECT
        EmpId,
        Date,
        RangeId = DATEDIFF(DAY, 0, Date)
                - ROW_NUMBER() OVER (PARTITION BY EmpId ORDER BY Date)
    FROM atable
)
SELECT
    EmpId,
    StartDate = MIN(Date),
    EndDate = MAX(Date),
    DayCount = DATEDIFF(DAY, MIN(Date), MAX(Date)) + 1
FROM ranged
GROUP BY EmpId, RangeId
HAVING DATEDIFF(DAY, MIN(Date), MAX(Date)) + 1 >= 30
ORDER BY EmpId, MIN(Date)
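DATEDIFF turns the dates into integers (the difference in days between the 0 date, 1900-01-01, and Date). If the dates are consecutive, the integers are consecutive too. Using the sample data from the question, the DATEDIFF results are: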
EmpId  Date        DATEDIFF
-----  ----------  --------
1      2011-01-01  40542
1      2011-01-02  40543
1      2011-01-03  40544
2      2011-02-03  40575
3      2011-03-01  40601
4      2011-03-02  40602
5      2011-01-02  40543
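The day numbers in the table can be checked outside the database. The following is a minimal Python sketch of the same arithmetic, not T-SQL; `ZERO_DATE` and `day_number` are names made up for this illustration, and the only assumption taken from the answer is that the 0 date is 1900-01-01.

```python
from datetime import date

# The "0 date" that DATEDIFF(DAY, 0, ...) counts from, per the answer above.
ZERO_DATE = date(1900, 1, 1)


def day_number(d: date) -> int:
    """Days elapsed since 1900-01-01, mirroring DATEDIFF(DAY, 0, d)."""
    return (d - ZERO_DATE).days


sample = [date(2011, 1, 1), date(2011, 1, 2), date(2011, 1, 3), date(2011, 2, 3)]
for d in sample:
    # Consecutive dates produce consecutive integers.
    print(d.isoformat(), day_number(d))
```

Running it prints 40542, 40543, 40544 for the first three dates and 40575 for 2011-02-03, matching the table.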
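Now, if you take each employee's rows, assign row numbers to them in date order, and take the difference between the numeric representation and the row number, you will find that the difference stays the same for consecutive numbers (and therefore for consecutive dates). Using a slightly different sample for better illustration, it looks like this: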
Date        DATEDIFF  RowNum  RangeId
----------  --------  ------  -------
2011-01-01  40542     1       40541
2011-01-02  40543     2       40541
2011-01-03  40544     3       40541
2011-01-05  40546     4       40542
2011-01-07  40548     5       40543
2011-01-08  40549     6       40543
2011-01-09  40550     7       40543
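The table above can be reproduced step by step in a few lines of Python. This is only a sketch of the grouping idea, not a replacement for the SQL; `consecutive_ranges` and its `min_days` parameter are hypothetical names, and like the ROW_NUMBER() query it assumes there are no duplicate dates.

```python
from datetime import date
from itertools import groupby

ZERO_DATE = date(1900, 1, 1)  # the "0 date" used by DATEDIFF(DAY, 0, ...)


def consecutive_ranges(dates, min_days=1):
    """Return (start, end, day_count) for each run of consecutive dates.

    Assumes `dates` contains no duplicates.
    """
    ordered = sorted(dates)
    # Range id = day number minus 1-based row number, as in the table.
    keyed = [((d - ZERO_DATE).days - row, d)
             for row, d in enumerate(ordered, start=1)]
    ranges = []
    # Rows sharing a range id are adjacent once sorted, so groupby suffices.
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        run = [d for _, d in group]
        count = (run[-1] - run[0]).days + 1
        if count >= min_days:
            ranges.append((run[0], run[-1], count))
    return ranges


dates = [date(2011, 1, day) for day in (1, 2, 3, 5, 7, 8, 9)]
for start, end, count in consecutive_ranges(dates):
    print(start, end, count)
```

For the seven sample dates this yields three ranges (01-01 to 01-03, 01-05 alone, 01-07 to 01-09). Passing `min_days=30` would play the role of the HAVING clause.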
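The specific value of RangeId does not matter; what matters is that it stays the same across consecutive dates. Based on that fact, you can use it as a grouping criterion to count the dates in a group and to get the range boundaries.

The query above uses DATEDIFF(DAY, MIN(Date), MAX(Date)) + 1 to count the days, but you could also simply use COUNT(*).

Comments:

Works for ranges within the same month, but breaks on month transitions.

@Andriy M - changed it so that it also works across months and years.

It turns out we had the same idea. ROW_NUMBER() may be faster than DENSE_RANK(), but using DENSE_RANK() should account for duplicate dates. (Only I would change ORDER BY date to ORDER BY DATEDIFF(day, 0, date) in the OVER clause.)

@Andriy M - agreed, I figured accounting for duplicates was more useful.

Do you know how to copy and paste the code from here into SSMS without it all ending up on a single line?

I don't think this will return multiple rows if an id has more than one range of 30 consecutive days...

I know this is a very old post, but could you explain how RangeId works? I don't understand how you arrive at a value that you can group on using Row_Number().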
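The point in the comments about duplicate dates can be seen with a small experiment. The Python sketch below imitates the two numbering schemes; the function names are invented for this illustration, and the sample with a repeated 2011-01-02 is an assumption, not data from the question.

```python
from datetime import date

ZERO_DATE = date(1900, 1, 1)


def range_ids_row_number(dates):
    """Day number minus a ROW_NUMBER()-style counter (one number per row)."""
    ordered = sorted(dates)
    return [(d - ZERO_DATE).days - row
            for row, d in enumerate(ordered, start=1)]


def range_ids_dense_rank(dates):
    """Day number minus a DENSE_RANK()-style rank (equal dates share a rank)."""
    rank = {d: r for r, d in enumerate(sorted(set(dates)), start=1)}
    return [(d - ZERO_DATE).days - rank[d] for d in sorted(dates)]


# Three consecutive days, with 2011-01-02 recorded twice.
dates = [date(2011, 1, 1), date(2011, 1, 2), date(2011, 1, 2), date(2011, 1, 3)]

print(range_ids_row_number(dates))  # the duplicate splits the run in two
print(range_ids_dense_rank(dates))  # one id for the whole run
```

With the row-number scheme the duplicate shifts every later row by one, so the single three-day run falls into two groups; with the dense-rank scheme all four rows keep the same id.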