How to group sequential, timestamped rows in SQL and return the date range for each group
I have an MS SQL 2008 database table with the following columns: Registration | Date | DriverID | TrailerID. Here is a sample of the data:
AB53EDH,2013/07/03 10:00,54,23
AB53EDH,2013/07/03 10:01,54,23
...
AB53EDH,2013/07/03 10:45,54,23
AB53EDH,2013/07/03 10:46,54,NULL <-- Trailer changed
AB53EDH,2013/07/03 10:47,54,NULL
...
AB53EDH,2013/07/03 11:05,54,NULL
AB53EDH,2013/07/03 11:06,54,102 <-- Trailer changed
AB53EDH,2013/07/03 11:07,54,102
...
AB53EDH,2013/07/03 12:32,54,102
AB53EDH,2013/07/03 12:33,72,102 <-- Driver changed
AB53EDH,2013/07/03 12:34,72,102
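How would you achieve this with SQL?
Update: Thanks for the answers so far. Unfortunately, they stopped working when I applied them to my production data. The queries submitted so far do not work correctly when run against only part of the data.
Below are some sample queries that create the table and populate it with the dummy data above. There is more data here than in the sample above: the driver/trailer combinations 54,23 and 54,NULL are repeated, to make sure a query recognizes them as two separate groups. I have also copied the same data three times with different date ranges, to test whether a query still works when it is run on a subset of the data: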
CREATE TABLE [dbo].[TempTable](
[Registration] [nvarchar](50) NOT NULL,
[Date] [datetime] NOT NULL,
[DriverID] [int] NULL,
[TrailerID] [int] NULL
)
INSERT INTO dbo.TempTable
VALUES
('AB53EDH','2013/07/03 10:00', 54,23),
('AB53EDH','2013/07/03 10:01', 54,23),
('AB53EDH','2013/07/03 10:45', 54,23),
('AB53EDH','2013/07/03 10:46', 54,NULL),
('AB53EDH','2013/07/03 10:47', 54,NULL),
('AB53EDH','2013/07/03 11:05', 54,NULL),
('AB53EDH','2013/07/03 11:06', 54,102),
('AB53EDH','2013/07/03 11:07', 54,102),
('AB53EDH','2013/07/03 12:32', 54,102),
('AB53EDH','2013/07/03 12:33', 72,102),
('AB53EDH','2013/07/03 12:34', 72,102),
('AB53EDH','2013/07/03 13:00', 54,102),
('AB53EDH','2013/07/03 13:01', 54,102),
('AB53EDH','2013/07/03 13:02', 54,102),
('AB53EDH','2013/07/03 13:03', 54,102),
('AB53EDH','2013/07/03 13:04', 54,23),
('AB53EDH','2013/07/03 13:05', 54,23),
('AB53EDH','2013/07/03 13:06', 54,23),
('AB53EDH','2013/07/03 13:07', 54,NULL),
('AB53EDH','2013/07/03 13:08', 54,NULL),
('AB53EDH','2013/07/03 13:09', 54,NULL),
('AB53EDH','2013/07/03 13:10', 54,NULL),
('AB53EDH','2013/07/03 13:11', NULL,NULL)
INSERT INTO dbo.TempTable
SELECT Registration, DATEADD(M, -1, Date), DriverID, TrailerID
FROM dbo.TempTable
WHERE Date > '2013/07/01'
INSERT INTO dbo.TempTable
SELECT Registration, DATEADD(M, 1, Date), DriverID, TrailerID
FROM dbo.TempTable
WHERE Date > '2013/07/01'
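Before testing any grouping query, it can help to confirm that the setup script produced the three copies described above. The following is only a sanity-check sketch; the column aliases (MonthStart, RowsInMonth, FirstDate, LastDate) are made up for illustration and are not part of the original question.

```sql
-- Count the rows per calendar month and show the date range each month covers.
-- DATEADD/DATEDIFF against 0 truncates a datetime to the first day of its month.
SELECT
    DATEADD(MONTH, DATEDIFF(MONTH, 0, [Date]), 0) AS MonthStart,
    COUNT(*)    AS RowsInMonth,
    MIN([Date]) AS FirstDate,
    MAX([Date]) AS LastDate
FROM dbo.TempTable
GROUP BY DATEADD(MONTH, DATEDIFF(MONTH, 0, [Date]), 0)
ORDER BY MonthStart;
```

If the script was run once against an empty table, each of June, July and August 2013 should show 23 rows.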
Here is an approach that uses correlated subqueries:
with tt as (
select tt.*,
(select top 1 date
from TempTable tt2
where tt2.Registration = tt.Registration and
tt2.DriverID = tt.DriverID and
(tt2.TrailerID = tt.TrailerID or tt2.TrailerID is null and tt.TrailerID is null) and
tt2.Date < tt.Date
order by date desc
) prevDate
from TempTable tt
)
select registration, min(date) as startdate, max(date) as enddate, driverid, trailerid
from (select tt.*,
(select top 1 date
from tt tt3
where prevDate is NULL and
tt3.Date <= tt.date
order by Date desc
) as grp
from TempTable tt
) tt
group by grp, Registration, DriverID, trailerid;
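This query uses CTEs to do the following:
1. Create an ordered set of records, partitioned by Registration.
2. For each record, capture the data from the previous record.
3. Compare the current and previous data to determine whether the current record is a new instance of a driver/trailer assignment.
4. Keep only the new records.
5. For each new record, get the last date logged before the next new driver/trailer assignment occurs.
The code is as follows: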
;WITH c AS (
-- Group records by Registration, assign row numbers in order of date
SELECT
ROW_NUMBER() OVER (
PARTITION BY Registration
ORDER BY Registration, [Date])
AS Rn,
Registration,
[Date],
DriverID,
TrailerID
FROM
TempTable
)
,c2 AS (
-- Self join to table to get Driver and Trailer from previous record
SELECT
t1.Rn,
t1.Registration,
t1.[Date],
t1.DriverID,
t1.TrailerID,
t2.DriverID AS PrevDriverID,
t2.TrailerID AS PrevTrailerID
FROM
c t1
LEFT OUTER JOIN
c t2
ON
t1.Registration = t2.Registration
AND
t2.Rn = t1.Rn - 1
)
,c3 AS (
-- Use INTERSECT to determine if this record is new in sequence
SELECT
Rn,
Registration,
[Date],
DriverID,
TrailerID,
CASE WHEN NOT EXISTS (
SELECT DriverID, TrailerID
INTERSECT
SELECT PrevDriverID, PrevTrailerID)
THEN 1
ELSE 0
END AS IsNew
FROM c2
)
-- For all new records in sequence,
-- get the last date logged before a new record appeared
SELECT
Registration,
[Date] AS StartDate,
COALESCE (
(
SELECT TOP 1 [Date]
FROM c3
WHERE Registration = t.Registration
AND Rn < (
SELECT TOP 1 Rn
FROM c3
WHERE Registration = t.Registration
AND Rn > t.Rn
AND IsNew = 1
ORDER BY Rn )
ORDER BY Rn DESC
)
, [Date]) AS EndDate,
DriverID,
TrailerID
FROM
c3 t
WHERE
IsNew = 1
ORDER BY
Registration,
StartDate
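The INTERSECT test in c3 is what allows a NULL driver or trailer to count as unchanged when the previous row also had NULL. A minimal illustration, assuming the default ANSI_NULLS ON setting; the aliases v, EqualsOperator and IntersectTest are made up for this example:

```sql
-- Compare two NULLs with the = operator and with INTERSECT.
SELECT
    -- NULL = NULL evaluates to UNKNOWN, so the ELSE branch is taken.
    CASE WHEN CAST(NULL AS int) = CAST(NULL AS int)
         THEN 'equal' ELSE 'not equal' END AS EqualsOperator,
    -- INTERSECT treats two NULLs as matching, so the subquery returns a row.
    CASE WHEN EXISTS (SELECT CAST(NULL AS int) AS v
                      INTERSECT
                      SELECT CAST(NULL AS int) AS v)
         THEN 'equal' ELSE 'not equal' END AS IntersectTest;
```

Under that setting the first column reports 'not equal' and the second 'equal', which is why the query wraps the comparison in NOT EXISTS (... INTERSECT ...) instead of writing DriverID <> PrevDriverID OR TrailerID <> PrevTrailerID.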
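I think there is an error in the expected result data: AB53EDH, 2013/07/03 10:06, 2013/07/03 12:32, 54, 102 should be AB53EDH, 2013/07/03 11:06, 2013/07/03 12:32, 54, 102. +1, you have working code in your question, which is an encouragement to work on it.
@armen: Thanks, corrected.
@Amr, how should a registration with a NULL driver and a NULL trailer appear in the summary?
@8kb: That is a valid combination and should be represented in the results like any other.
Unfortunately, this approach still has a problem. If the same driver and trailer combination appears again later, the query treats both occurrences as one group instead of recognizing two distinct groups separated by a period of time. I have updated the original question with more data points to illustrate this. You will notice that the driver,trailer combinations 54,23 and 54,NULL each occur twice, but your query results do not reflect that.
Oh, now that is a completely different story. I once asked a question very similar to yours, and it still needs an answer. I would be glad to hear any news on this post!
That's a shame! The only other approach I can think of is to walk the rows in order with a cursor, but I have found that this can be very slow on large amounts of data, and I was hoping that a set-based way of doing the same task would perform better. We will see how this develops.
First of all, thank you for taking the time to answer my question. While working with this query and trying to apply it to my production data, I realized that unfortunately it does not work in all cases. The query seems to rely on each driver/trailer combination being preceded by a different combination whose prevDate is NULL, that is, one that is the first occurrence of that combination in the data. Unfortunately this is not the case in the production data, because every combination may occur multiple times within the same date range.
@AmrBekhit. I don't understand your comment. This query captures multiple sequences of rows that have the same registration, driverId and trailerId, based on date. The same triple can occur multiple times on different dates, and it will then appear correspondingly multiple times in the output. The calculation of prevDate determines where each sequence starts.
I modified your query to handle a subset of the data in the table. Try recreating TempTable using the updated code in the question, then run the query. You will notice that the grouping is incorrect for the groups on 2013/08/03. In fact, even if you just recreate TempTable and run the original query, you will still see the problem.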
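A revised version of the correlated-subquery query, in which prevDate is set only when the immediately preceding row (by date) has the same registration, driver and trailer:
with tt as (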
select tt.*, tt3.date as PrevDate
from (select tt.*,
(select top 1 date
from TempTable tt2
where tt2.date < tt.date
order by date desc
) prevDate1
from TempTable tt
) tt left outer join
TempTable tt3
on tt.prevdate1 = tt3.date and
tt3.Registration = tt.Registration and
tt3.DriverID = tt.DriverID and
(tt3.TrailerID = tt.TrailerID or tt3.TrailerID is null and tt.TrailerID is null)
)
select registration, count(*), min(date) as startdate, max(date) as enddate, driverid, trailerid
from (select tt.*,
(select top 1 date
from tt tt3
where prevDate is NULL and
tt3.Date <= tt.date
order by Date desc
) as grp
from TempTable tt
) tt
group by grp, Registration, DriverID, trailerid;
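The comments ask for a set-based way to keep repeated driver/trailer combinations apart. One common pattern for this kind of problem is the difference of two ROW_NUMBER() sequences, which works on SQL Server 2008. The following is a sketch only, not one of the posted answers: the CTE name numbered and the alias grp are hypothetical, and it assumes that no two rows share the same Registration and Date.

```sql
;WITH numbered AS (
    SELECT
        Registration,
        [Date],
        DriverID,
        TrailerID,
        -- Position of the row among all rows for the registration,
        -- minus its position among rows with the same driver and trailer.
        -- The difference stays constant within one unbroken run and
        -- changes whenever other combinations occur in between.
        ROW_NUMBER() OVER (PARTITION BY Registration
                           ORDER BY [Date])
      - ROW_NUMBER() OVER (PARTITION BY Registration, DriverID, TrailerID
                           ORDER BY [Date]) AS grp
    FROM dbo.TempTable
)
SELECT
    Registration,
    MIN([Date]) AS StartDate,
    MAX([Date]) AS EndDate,
    DriverID,
    TrailerID
FROM numbered
-- grp separates repeated runs of the same driver/trailer combination.
GROUP BY Registration, DriverID, TrailerID, grp
ORDER BY Registration, StartDate;
```

PARTITION BY and GROUP BY both place NULL values in a single bucket, so the 54,NULL and NULL,NULL combinations need no special handling. To test against part of the data, a date filter such as WHERE [Date] >= '2013/07/01' AND [Date] < '2013/08/01' can be added inside the CTE; the row numbers are then computed over the filtered rows only. For the sample data, the two runs of 54,23 and the two runs of 54,NULL within a day should come back as separate ranges.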