SQL: Trouble using ROW_NUMBER() OVER (PARTITION BY ...)
I am using SQL Server 2008 R2. I have a table called EmployeeHistory with the following structure and sample data:
EmployeeID Date DepartmentID SupervisorID
10001 20130101 001 10009
10001 20130909 001 10019
10001 20131201 002 10018
10001 20140501 002 10017
10001 20141001 001 10015
10001 20141201 001 10014
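Note that over time employee 10001 has changed departments twice and supervisors several times. What I want is to list, ordered by the Date field, the start and end dates of this employee's time in each department. The output would look like this: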
EmployeeID DateStart DateEnd DepartmentID
10001 20130101 20131201 001
10001 20131201 20141001 002
10001 20141001 NULL 001
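I intended to partition the data with the query below, but it failed. The department changes from 001 to 002 and then back to 001, so obviously I cannot partition by DepartmentID. I am sure I am overlooking something obvious. Any help? Thanks in advance.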
SELECT * ,ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID
ORDER BY [Date]) RN FROM EmployeeHistory
I would do it like this:
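;WITH x
AS (SELECT *,
-- number each employee's rows in date order
Row_number()
OVER(
partition BY employeeid
ORDER BY [Date]) rn
FROM employeehistory)
SELECT *
FROM x x1
-- x2 is the previous row for the same employee
LEFT OUTER JOIN x x2
ON x1.employeeid = x2.employeeid
AND x1.rn = x2.rn + 1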
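Or it may need to be x2.rn - 1; you would have to check (x1.rn = x2.rn + 1 pairs each row with the previous one, x1.rn = x2.rn - 1 pairs it with the next). In any case, you get the idea. Once the table is joined to itself, you can filter, group, sort, and so on to get what you need.
It is a bit complicated. The easiest way is to refer to the SQL Fiddle I created for you, which produces the exact result. There are ways to improve it for performance or other considerations, but hopefully it is at least clearer than some of the alternatives.
The gist is: first get a canonical ranking of the data, then use that ranking to divide the data into groups, then find the end date for each group, then eliminate any intermediate rows. ROW_NUMBER() and CROSS APPLY help a lot with readability.
EDIT 2019: The SQL Fiddle does in fact appear to be broken for some reason, but it seems to be a problem on the SQL Fiddle site. Here is a complete version, just tested on SQL Server 2016: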
CREATE TABLE Source
(
EmployeeID int,
DateStarted date,
DepartmentID int
)
INSERT INTO Source
VALUES
(10001,'2013-01-01',001),
(10001,'2013-09-09',001),
(10001,'2013-12-01',002),
(10001,'2014-05-01',002),
(10001,'2014-10-01',001),
(10001,'2014-12-01',001)
SELECT *,
ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS EntryRank,
newid() as GroupKey,
CAST(NULL AS date) AS EndDate
INTO #RankedData
FROM Source
;
UPDATE #RankedData
SET GroupKey = beginDate.GroupKey
FROM #RankedData sup
CROSS APPLY
(
SELECT TOP 1 GroupKey
FROM #RankedData sub
WHERE sub.EmployeeID = sup.EmployeeID AND
sub.DepartmentID = sup.DepartmentID AND
NOT EXISTS
(
SELECT *
FROM #RankedData bot
WHERE bot.EmployeeID = sup.EmployeeID AND
bot.EntryRank BETWEEN sub.EntryRank AND sup.EntryRank AND
bot.DepartmentID <> sup.DepartmentID
)
ORDER BY DateStarted ASC
) beginDate (GroupKey);
UPDATE #RankedData
SET EndDate = nextGroup.DateStarted
FROM #RankedData sup
CROSS APPLY
(
SELECT TOP 1 DateStarted
FROM #RankedData sub
WHERE sub.EmployeeID = sup.EmployeeID AND
sub.DepartmentID <> sup.DepartmentID AND
sub.EntryRank > sup.EntryRank
ORDER BY EntryRank ASC
) nextGroup (DateStarted);
SELECT * FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY GroupKey ORDER BY EntryRank ASC) AS GroupRank FROM #RankedData
) FinalRanking
WHERE GroupRank = 1
ORDER BY EntryRank;
DROP TABLE #RankedData
DROP TABLE Source
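This looks like a common gaps-and-islands problem. The difference between the two row-number sequences rn1 and rn2 gives the group number: it stays constant within a run of consecutive rows that share a department.
Run this query CTE by CTE and examine the intermediate results to understand how it works.
Sample data
I extended the sample data from the question a little.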
DECLARE @Source TABLE
(
EmployeeID int,
DateStarted date,
DepartmentID int
)
INSERT INTO @Source
VALUES
(10001,'2013-01-01',001),
(10001,'2013-09-09',001),
(10001,'2013-12-01',002),
(10001,'2014-05-01',002),
(10001,'2014-10-01',001),
(10001,'2014-12-01',001),
(10005,'2013-05-01',001),
(10005,'2013-11-09',001),
(10005,'2013-12-01',002),
(10005,'2014-10-01',001),
(10005,'2016-12-01',001);
Query for SQL Server 2008
SQL Server 2008 does not have the LEAD function, so I had to use a self-join via OUTER APPLY to get the next row's value for DateEnd.
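A minimal sketch of what such an OUTER APPLY query could look like, assuming the same rn1 - rn2 grouping as in the 2012+ query. The aliases G2 and A are names introduced here for illustration, and the table variable holds only employee 10001's rows so the batch can run on its own; this is not necessarily the exact query the answer used.

```sql
-- Subset of the sample data (employee 10001 only), so the batch is self-contained.
DECLARE @Source TABLE
(
    EmployeeID int,
    DateStarted date,
    DepartmentID int
);

INSERT INTO @Source
VALUES
(10001,'2013-01-01',1),
(10001,'2013-09-09',1),
(10001,'2013-12-01',2),
(10001,'2014-05-01',2),
(10001,'2014-10-01',1),
(10001,'2014-12-01',1);

WITH
CTE
AS
(
    -- rn1 counts rows per employee; rn2 counts rows per employee and department.
    SELECT
        EmployeeID
        ,DateStarted
        ,DepartmentID
        ,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS rn1
        ,ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY DateStarted) AS rn2
    FROM @Source
)
,CTE_Groups
AS
(
    -- One row per island: rn1 - rn2 is constant within a consecutive run.
    SELECT
        EmployeeID
        ,MIN(DateStarted) AS DateStart
        ,DepartmentID
    FROM CTE
    GROUP BY
        EmployeeID
        ,DepartmentID
        ,rn1 - rn2
)
SELECT
    CTE_Groups.EmployeeID
    ,CTE_Groups.DepartmentID
    ,CTE_Groups.DateStart
    ,A.DateEnd
FROM
    CTE_Groups
    -- Hypothetical aliases G2 and A: fetch the start of the employee's next island.
    -- OUTER APPLY keeps the last island, with DateEnd = NULL.
    OUTER APPLY
    (
        SELECT TOP (1) G2.DateStart AS DateEnd
        FROM CTE_Groups AS G2
        WHERE
            G2.EmployeeID = CTE_Groups.EmployeeID
            AND G2.DateStart > CTE_Groups.DateStart
        ORDER BY G2.DateStart
    ) AS A
ORDER BY
    EmployeeID
    ,DateStart
;
```

For this subset the query should return three rows: department 1 from 2013-01-01 to 2013-12-01, department 2 from 2013-12-01 to 2014-10-01, and department 1 from 2014-10-01 with a NULL DateEnd, which matches the output requested in the question.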
Query for SQL Server 2012+
Starting with SQL Server 2012 there is a LEAD function, which makes this task more efficient.
WITH
CTE
AS
(
SELECT
EmployeeID
,DateStarted
,DepartmentID
,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS rn1
,ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY DateStarted) AS rn2
FROM @Source
)
,CTE_Groups
AS
(
SELECT
EmployeeID
,MIN(DateStarted) AS DateStart
,DepartmentID
FROM CTE
GROUP BY
EmployeeID
,DepartmentID
,rn1 - rn2
)
SELECT
CTE_Groups.EmployeeID
,CTE_Groups.DepartmentID
,CTE_Groups.DateStart
,LEAD(CTE_Groups.DateStart) OVER (PARTITION BY CTE_Groups.EmployeeID ORDER BY CTE_Groups.DateStart) AS DateEnd
FROM
CTE_Groups
ORDER BY
EmployeeID
,DateStart
;
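To see why rn1 - rn2 works as a group key, it can help to run the first CTE on its own. The sketch below does that for employee 10001 only; the column alias grp is a name introduced here for illustration and is not part of the query above.

```sql
-- Subset of the sample data: employee 10001 only.
DECLARE @Source TABLE
(
    EmployeeID int,
    DateStarted date,
    DepartmentID int
);

INSERT INTO @Source
VALUES
(10001,'2013-01-01',1),
(10001,'2013-09-09',1),
(10001,'2013-12-01',2),
(10001,'2014-05-01',2),
(10001,'2014-10-01',1),
(10001,'2014-12-01',1);

SELECT
    DateStarted
    ,DepartmentID
    ,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS rn1
    ,ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY DateStarted) AS rn2
    -- grp is a hypothetical alias for the difference used in GROUP BY
    ,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted)
     - ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY DateStarted) AS grp
FROM @Source
ORDER BY DateStarted;

-- DateStarted  DepartmentID  rn1  rn2  grp
-- 2013-01-01   1             1    1    0
-- 2013-09-09   1             2    2    0
-- 2013-12-01   2             3    1    2
-- 2014-05-01   2             4    2    2
-- 2014-10-01   1             5    3    2
-- 2014-12-01   1             6    4    2
```

Note that grp is 2 both for department 2 and for the second stay in department 1, which is why the grouping uses DepartmentID together with rn1 - rn2 rather than the difference alone.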
Result
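More than two years late, @coding_idiot, but I have corrected it with a complete script that you can run on your own server.
@DominicP, I think this can be done in a simpler and more efficient way. Have a look at my answer.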