SQL: get the number of days elapsed between rows, with grouping
Environment: SQL Azure, so some of the more exotic functions are limited.

I have a table that records logistics events for assets, and I want to calculate from it the number of days an asset has spent at a facility. Sample data:
AssetID LocationID SublocationID MoveDate
CAR1 LOC1 SUB1 1/1/2015 01:01:01
CAR1 LOC1 SUB2 1/3/2015 03:03:03
CAR1 LOC1 SUB1 1/4/2015 04:04:04
CAR1 LOC99 SUB99 1/5/2015 05:05:05
CAR1 LOC1 SUB1 1/9/2015 09:09:09
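This table records each move from one location/sublocation to another. I don't care about the sublocation; I only need to report the number of days an asset spent at each location. At first I went down this path: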
SELECT AssetID,
LocationID,
DATEDIFF(DAY, MIN(MoveDate), MAX(MoveDate))
FROM TABLE
GROUP BY AssetID, LocationID
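However, this quickly revealed a pitfall: in the data you can see that the asset moves from LOC1 to LOC99 and then back to LOC1. My query would count all of the days between 1/1/2015 and 1/9/2015 for LOC1, when in reality the asset spent the time between 1/5 and 1/9 at LOC99.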
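To make the pitfall concrete, here is a minimal Python sketch (not part of the original question) that mimics the GROUP BY query on the CAR1 sample rows. The function name `naive_days` is made up for illustration. Because the MIN/MAX pair for LOC1 spans both visits, the days spent at LOC99 end up counted as LOC1 time.

```python
from datetime import datetime

# Sample rows from the question: (AssetID, LocationID, MoveDate)
moves = [
    ("CAR1", "LOC1", datetime(2015, 1, 1, 1, 1, 1)),
    ("CAR1", "LOC1", datetime(2015, 1, 3, 3, 3, 3)),
    ("CAR1", "LOC1", datetime(2015, 1, 4, 4, 4, 4)),
    ("CAR1", "LOC99", datetime(2015, 1, 5, 5, 5, 5)),
    ("CAR1", "LOC1", datetime(2015, 1, 9, 9, 9, 9)),
]

def naive_days(rows):
    """Hypothetical helper: GROUP BY asset, location with DATEDIFF(DAY, MIN, MAX)."""
    spans = {}
    for asset, loc, moved in rows:
        lo, hi = spans.get((asset, loc), (moved, moved))
        spans[(asset, loc)] = (min(lo, moved), max(hi, moved))
    # DATEDIFF(DAY, ...) counts calendar-day boundaries, so compare dates only.
    return {key: (hi.date() - lo.date()).days for key, (lo, hi) in spans.items()}

result = naive_days(moves)
print(result)  # LOC1 gets the whole 1/1 to 1/9 span, LOC99 gets nothing
```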
Is there a pure-SQL way to do this?

Use a FAST_FORWARD cursor, iterating over the table in date order and building your result set in a temp table.
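The cursor idea is a single forward pass over the rows in date order, closing one stay and opening the next whenever the location changes. The following is only a Python sketch of that logic, not the T-SQL cursor itself. The function name `stints` and the fixed "today" of 2015-01-22 (taken from a comment in this thread) are assumptions for illustration.

```python
from datetime import datetime

# Sample rows from the question: (AssetID, LocationID, MoveDate)
moves = [
    ("CAR1", "LOC1", datetime(2015, 1, 1, 1, 1, 1)),
    ("CAR1", "LOC1", datetime(2015, 1, 3, 3, 3, 3)),
    ("CAR1", "LOC1", datetime(2015, 1, 4, 4, 4, 4)),
    ("CAR1", "LOC99", datetime(2015, 1, 5, 5, 5, 5)),
    ("CAR1", "LOC1", datetime(2015, 1, 9, 9, 9, 9)),
]

def stints(rows, now):
    """Hypothetical helper: one forward pass, like a forward-only cursor.

    Returns a list of [asset, location, moved_in, moved_out] entries."""
    result = []
    for asset, loc, moved in sorted(rows, key=lambda r: (r[0], r[2])):
        last = result[-1] if result else None
        if last and last[0] == asset and last[1] == loc:
            continue  # same location (only the sublocation changed): stay continues
        if last and last[0] == asset:
            last[3] = moved  # the asset left its previous location
        result.append([asset, loc, moved, now])  # open stay, provisionally ending now
    return result

today = datetime(2015, 1, 22)  # assumed run date
days = [(a, loc, (out.date() - arrived.date()).days)
        for a, loc, arrived, out in stints(moves, today)]
print(days)
```

Each visit to LOC1 becomes its own entry, which is what the plain GROUP BY could not express.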
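This could be done with LEAD or LAG, but they are not available in Azure. No doubt some non-cursor T-SQL solution is possible, but I doubt it would perform better than a cursor. A FAST_FORWARD cursor usually performs better than a query containing a correlated subquery.

It should look like this: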
SELECT [details].[AssetId],
       [details].[LocationId],
       DATEDIFF(DAY, MIN([details].[MovedInDate]), [details].[MoveOutDate]) AS DaysIn
FROM (
    SELECT DISTINCT
           movedInRow.[AssetId],
           movedInRow.[LocationId],
           movedInRow.[MoveDate] AS MovedInDate,
           ISNULL(nm.[MoveDate], GETDATE()) AS MoveOutDate
    FROM [dbo].[t1] movedInRow
    OUTER APPLY (
        -- earliest later move of the same asset to a different location
        SELECT TOP 1 [MoveDate]
        FROM [dbo].[t1]
        WHERE [AssetId] = movedInRow.[AssetId]
          AND [LocationId] != movedInRow.[LocationId]
          AND [MoveDate] >= movedInRow.[MoveDate]
        ORDER BY [MoveDate] ASC
    ) nm
) AS details
GROUP BY
    [details].[AssetId],
    [details].[LocationId],
    [details].[MoveOutDate];
For whatever reason, two locations might have the same MoveDate; this example does not check for that possibility.
AssetId LocationId DaysIn
CAR1 LOC1 4
CAR1 LOC1 13
CAR1 LOC99 4
Without window functions (such as LEAD or LAG) and without any procedural T-SQL code, you can make it work with a recursive CTE:
/*Create table and sample data*/
create table #mov (
AssetID varchar(10),
LocationID varchar(10),
SublocationID varchar(10),
MoveDate datetime
)
insert into #mov
select 'CAR1', 'LOC1', 'SUB1', '1/1/2015 01:01:01' union all
select 'CAR1', 'LOC1', 'SUB2', '1/3/2015 03:03:03' union all
select 'CAR1', 'LOC1', 'SUB1', '1/4/2015 04:04:04' union all
select 'CAR1', 'LOC99', 'SUB99', '1/5/2015 05:05:05' union all
select 'CAR1', 'LOC1', 'SUB1' , '1/9/2015 09:09:09' union all
select 'CAR2', 'LOC1', 'SUB1', '1/1/2015 01:01:01' union all
select 'CAR2', 'LOC1', 'SUB2', '1/3/2015 03:03:03' union all
select 'CAR2', 'LOC1', 'SUB1', '1/4/2015 04:04:04' union all
select 'CAR2', 'LOC99', 'SUB99', '1/5/2015 05:05:05' union all
select 'CAR2', 'LOC1', 'SUB1' , '1/9/2015 09:09:09'
/*Create CTEs*/
/*1. cteMov - adds the row number to the dataset*/
;with cteMov as (
select AssetID, LocationID, MoveDate, row_number() over(partition by AssetID order by MoveDate) as rn
from #mov
),
/*recursive CTE to get records groups*/
rec as (
select AssetID, LocationID, MoveDate, rn, 1 as rnk
from cteMov
where rn = 1
union all
select c.AssetID, c.LocationID, c.MoveDate, c.rn, case when c.LocationID = rec.LocationID then rec.rnk else rec.rnk + 1 end as rnk
from cteMov as c
join rec on c.AssetID = rec.AssetID and c.rn = rec.rn + 1
)
/*3. Final query*/
select
rec1.AssetID, rec1.LocationID,
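datediff(dd, min(rec1.MoveDate), isnull(min(rec2.MoveDate), getdate())) as DaysSpent, /* first move of the next group = move-out date */
rec1.rnk
from rec as rec1
left join rec as rec2 on rec1.AssetID = rec2.AssetID and rec1.rnk = rec2.rnk - 1 /* next group of the same asset */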
group by rec1.AssetID, rec1.LocationID, rec1.rnk
order by rec1.AssetID, rec1.rnk
option(MAXRECURSION 0)
/*drop temp table */
drop table #mov
The result is:
AssetID LocationID DaysSpent rnk
---------- ---------- ----------- -----------
CAR1 LOC1 4 1
CAR1 LOC99 4 2
CAR1 LOC1 13 3
CAR2 LOC1 4 1
CAR2 LOC99 4 2
CAR2 LOC1 13 3
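The heart of the recursive CTE is the `rnk` column: it starts at 1 for each asset and increases by one whenever the location differs from the previous row. Here is a small Python sketch of just that numbering step. The function name `number_groups` is hypothetical, and the loop stands in for the recursion.

```python
from datetime import datetime

# CAR1 sample rows: (AssetID, LocationID, MoveDate)
moves = [
    ("CAR1", "LOC1", datetime(2015, 1, 1, 1, 1, 1)),
    ("CAR1", "LOC1", datetime(2015, 1, 3, 3, 3, 3)),
    ("CAR1", "LOC1", datetime(2015, 1, 4, 4, 4, 4)),
    ("CAR1", "LOC99", datetime(2015, 1, 5, 5, 5, 5)),
    ("CAR1", "LOC1", datetime(2015, 1, 9, 9, 9, 9)),
]

def number_groups(rows):
    """Hypothetical helper: attach a group number that changes with the location."""
    numbered = []
    prev = None  # (asset, location) of the previous row
    rnk = 0
    for asset, loc, moved in sorted(rows, key=lambda r: (r[0], r[2])):
        if prev is None or prev[0] != asset:
            rnk = 1   # first row of an asset starts group 1
        elif prev[1] != loc:
            rnk += 1  # location changed: start a new group
        numbered.append((asset, loc, moved, rnk))
        prev = (asset, loc)
    return numbered

ranks = [row[3] for row in number_groups(moves)]
print(ranks)
```

Grouping by asset and group number, rather than by asset and location, is what keeps the two visits to LOC1 apart.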
Using the sample data from the previous answer:
create table t1 (
AssetID varchar(10),
LocationID varchar(10),
SublocationID varchar(10),
MoveDate datetime
);
insert into t1
select 'CAR1', 'LOC1', 'SUB1', '1/1/2015 01:01:01' union all
select 'CAR1', 'LOC1', 'SUB2', '1/3/2015 03:03:03' union all
select 'CAR1', 'LOC1', 'SUB1', '1/4/2015 04:04:04' union all
select 'CAR1', 'LOC99', 'SUB99', '1/5/2015 05:05:05' union all
select 'CAR1', 'LOC1', 'SUB1' , '1/9/2015 09:09:09' union all
select 'CAR2', 'LOC1', 'SUB1', '1/1/2015 01:01:01' union all
select 'CAR2', 'LOC1', 'SUB2', '1/3/2015 03:03:03' union all
select 'CAR2', 'LOC1', 'SUB1', '1/4/2015 04:04:04' union all
select 'CAR2', 'LOC99', 'SUB99', '1/5/2015 05:05:05' union all
select 'CAR2', 'LOC1', 'SUB1' , '1/9/2015 09:09:09';
select * from t1;
╔═════════╦════════════╦═══════════════╦════════════════════════════════╗
║ ASSETID ║ LOCATIONID ║ SUBLOCATIONID ║ MOVEDATE ║
╠═════════╬════════════╬═══════════════╬════════════════════════════════╣
║ CAR1 ║ LOC1 ║ SUB1 ║ January, 01 2015 01:01:01+0000 ║
║ CAR2 ║ LOC1 ║ SUB1 ║ January, 01 2015 01:01:01+0000 ║
║ CAR2 ║ LOC1 ║ SUB2 ║ January, 03 2015 03:03:03+0000 ║
║ CAR1 ║ LOC1 ║ SUB2 ║ January, 03 2015 03:03:03+0000 ║
║ CAR1 ║ LOC1 ║ SUB1 ║ January, 04 2015 04:04:04+0000 ║
║ CAR2 ║ LOC1 ║ SUB1 ║ January, 04 2015 04:04:04+0000 ║
║ CAR2 ║ LOC99 ║ SUB99 ║ January, 05 2015 05:05:05+0000 ║
║ CAR1 ║ LOC99 ║ SUB99 ║ January, 05 2015 05:05:05+0000 ║
║ CAR1 ║ LOC1 ║ SUB1 ║ January, 09 2015 09:09:09+0000 ║
║ CAR2 ║ LOC1 ║ SUB1 ║ January, 09 2015 09:09:09+0000 ║
╚═════════╩════════════╩═══════════════╩════════════════════════════════╝
If the lead() analytic function is supported, a better solution (in terms of both simplicity and performance) is:
select AssetID, LocationID,
sum(datediff(dd,MoveDate,isnull(nextMoveDate,getDate()))) daysAtLoc
from (
select AssetID, LocationID, MoveDate,
lead(MoveDate) over (partition by AssetID
order by MoveDate) nextMoveDate
from t1
) t2
group by AssetID, LocationID
order by AssetID, LocationID;
╔═════════╦════════════╦═══════════╗
║ ASSETID ║ LOCATIONID ║ DAYSATLOC ║
╠═════════╬════════════╬═══════════╣
║ CAR1 ║ LOC1 ║ 18 ║
║ CAR1 ║ LOC99 ║ 4 ║
║ CAR2 ║ LOC1 ║ 18 ║
║ CAR2 ║ LOC99 ║ 4 ║
╚═════════╩════════════╩═══════════╝
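What lead() contributes is simply pairing each move with the next move of the same asset, so that every row carries its own end date. Below is a rough Python equivalent, assuming a fixed "today" of 2015-01-22 (a date mentioned in a comment in this thread). The totals therefore depend on that assumed date and need not match the table above, which was produced with getDate() at run time. The name `days_per_location` is made up for illustration.

```python
from collections import defaultdict
from datetime import datetime
from itertools import groupby

# CAR1 sample rows: (AssetID, LocationID, MoveDate)
moves = [
    ("CAR1", "LOC1", datetime(2015, 1, 1, 1, 1, 1)),
    ("CAR1", "LOC1", datetime(2015, 1, 3, 3, 3, 3)),
    ("CAR1", "LOC1", datetime(2015, 1, 4, 4, 4, 4)),
    ("CAR1", "LOC99", datetime(2015, 1, 5, 5, 5, 5)),
    ("CAR1", "LOC1", datetime(2015, 1, 9, 9, 9, 9)),
]

def days_per_location(rows, now):
    """Hypothetical helper: sum day differences between each move and the next."""
    totals = defaultdict(int)
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for asset, group in groupby(ordered, key=lambda r: r[0]):
        events = list(group)
        # Pair each event with its successor for the same asset (the LEAD step);
        # the last event has no successor, so it is paired with `now`.
        next_dates = [e[2] for e in events[1:]] + [now]
        for (_, loc, moved), nxt in zip(events, next_dates):
            totals[(asset, loc)] += (nxt.date() - moved.date()).days
    return dict(totals)

totals = days_per_location(moves, datetime(2015, 1, 22))  # assumed run date
print(totals)
```

Note that, like the query, this reports one total per location across all visits rather than one row per visit.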
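A pure SQL solution: no analytic functions, no recursive CTE, no OUTER APPLY or correlated subqueries; just simple joins. I have never used Azure SQL, but I would be very surprised if it did not support this (and still called itself SQL).

Of course, you don't need to use the dt_range table; I only did so in order to test a variety of conditions at once. I do prefer date ranges of the form [currentstart, nextstart), with the end date exclusive, because the SQL becomes easier to write with no overlaps, e.g. for monthly reports.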
create table dt_range
(thisStartDate date,
nextStartDate date,
primary key (thisStartDate,nextStartDate));
insert into dt_range
select '01-dec-2014','01-jan-2015' union
select '01-jan-2015','01-feb-2015' union
select '02-jan-2015','09-jan-2015' union
select '01-feb-2015','01-mar-2015' ;
╔═══════════════╦═══════════════╗
║ THISSTARTDATE ║ NEXTSTARTDATE ║
╠═══════════════╬═══════════════╣
║ 2014-12-01 ║ 2015-01-01 ║
║ 2015-01-01 ║ 2015-02-01 ║
║ 2015-01-02 ║ 2015-01-09 ║
║ 2015-02-01 ║ 2015-03-01 ║
╚═══════════════╩═══════════════╝
And the query:
select thisStartDate, nextStartDate, t.AssetID, ArrivalLocation,
round(sum(datediff(ss,ArrivalTime, DepartureTime))/(24.0*60*60),1) DaysAtLoc
from (
select thisStartDate, nextStartDate, t.AssetID, ArrivalLocation, ArrivalTime,
coalesce(min(MoveDate),nextStartDate) DepartureTime
from (
select assetsInRange.thisStartDate, assetsInRange.nextStartDate, assetsInRange.assetID,
coalesce(ArrivalLocation,InitialLocation) ArrivalLocation,
coalesce(ArrivalTime,assetsInRange.thisStartDate) ArrivalTime
from
(
select thisStartDate, nextStartDate, assetID
from dt_range
join t1 on MoveDate < nextStartDate
group by thisStartDate, nextStartDate, assetID
) assetsInRange
left outer join
(
select thisStartDate, nextStartDate, assetID,
max(MoveDate) precedingDtRangeMoveDt
from dt_range
join t1
on MoveDate < thisStartDate
group by thisStartDate, nextStartDate, assetID
)
precedingMoveDt
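on (assetsInRange.assetID = precedingMoveDt.assetID
and assetsInRange.thisStartDate = precedingMoveDt.thisStartDate
and assetsInRange.nextStartDate = precedingMoveDt.nextStartDate)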
left outer join
(
select AssetID, MoveDate precedingDtRangeMoveDt, LocationID initialLocation
from t1
)
precedingMoveLoc
on (precedingMoveDt.assetID = precedingMoveLoc.AssetID
and precedingMoveDt.precedingDtRangeMoveDt = precedingMoveLoc.precedingDtRangeMoveDt)
left outer join
(
select AssetId, LocationId ArrivalLocation, MoveDate ArrivalTime
from t1
)
arrivals
on assetsInRange.AssetID = arrivals.AssetId
and ArrivalTime >= assetsInRange.thisStartDate
and ArrivalTime < assetsInRange.nextStartDate
group by assetsInRange.thisStartDate, assetsInRange.nextStartDate, assetsInRange.AssetId,
coalesce(ArrivalLocation,InitialLocation) ,
coalesce(ArrivalTime,assetsInRange.thisStartDate)
) t
left join t1 on t.assetID = t1.assetID
and t1.MoveDate > ArrivalTime
and t1.MoveDate < nextStartDate
group by thisStartDate, nextStartDate, t.AssetID, ArrivalLocation, ArrivalTime
) t
group by thisStartDate, nextStartDate, t.AssetID, ArrivalLocation
order by 1, 3;
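Note: I assume that the first record for an asset means it did not exist at any location before that, so the Dec 2014 to Jan 2015 test month does not show up in the results, because no asset has a move date in 2014.

Comments:

You could use the LAG window function to do this, but I don't know whether Azure databases support it. Otherwise you would probably have to use a CTE or some subqueries.

Thanks a lot for your help! Unfortunately, SQL Azure has no LEAD or LAG functions.

The simple joins did the trick!!! My next challenge is how to add a date range, to answer this question: for the date range 6/1/2015 to 8/1/2015, how many days did the asset stay at each location? I may have to jump to a stored procedure to find the answer.

Yes. Assuming the MoveDate column holds the date the item was moved in, it first stayed at LOC1 for 4 days, then at LOC99 for 4 days, and as of today (1/22/2015) it has been back at LOC1 for 13 days.

I like the pure SQL solution. Thank you very much! Is there a way to modify it to take a date range? That is, between two dates, where did the asset reside, even if there was no move event within the range... I think I'll have to write a stored procedure for that... Thanks again!

Yes. I will append it to the main answer so it is easier to format.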
Output of the date-range query above:
╔═══════════════╦═══════════════╦═════════╦═════════════════╦═══════════╗
║ THISSTARTDATE ║ NEXTSTARTDATE ║ ASSETID ║ ARRIVALLOCATION ║ DAYSATLOC ║
╠═══════════════╬═══════════════╬═════════╬═════════════════╬═══════════╣
║ 2015-01-01 ║ 2015-02-01 ║ CAR1 ║ LOC1 ║ 26.8 ║
║ 2015-01-01 ║ 2015-02-01 ║ CAR1 ║ LOC99 ║ 4.2 ║
║ 2015-01-01 ║ 2015-02-01 ║ CAR2 ║ LOC1 ║ 24.7 ║
║ 2015-01-01 ║ 2015-02-01 ║ CAR2 ║ LOC99 ║ 4.2 ║
║ 2015-01-01 ║ 2015-02-01 ║ CAR3 ║ LOC2 ║ 16.4 ║
║ 2015-01-02 ║ 2015-01-09 ║ CAR1 ║ LOC1 ║ 2.1 ║
║ 2015-01-02 ║ 2015-01-09 ║ CAR1 ║ LOC99 ║ 3.8 ║
║ 2015-01-02 ║ 2015-01-09 ║ CAR2 ║ LOC1 ║ 2.1 ║
║ 2015-01-02 ║ 2015-01-09 ║ CAR2 ║ LOC99 ║ 3.8 ║
║ 2015-02-01 ║ 2015-03-01 ║ CAR1 ║ LOC1 ║ 28 ║
║ 2015-02-01 ║ 2015-03-01 ║ CAR2 ║ LOC1 ║ 28 ║
║ 2015-02-01 ║ 2015-03-01 ║ CAR3 ║ LOC2 ║ 28 ║
╚═══════════════╩═══════════════╩═════════╩═════════════════╩═══════════╝
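The date-range question raised in the comments comes down to clipping each stay to a half-open window [start, end) and summing what is left. Here is a minimal Python sketch of that clipping, with the CAR1 stays written out by hand. The name `days_in_range` and the open-ended last stay are assumptions. Its numbers are not expected to match the table above, since the sketch counts time from the start of the window even when the asset arrived earlier.

```python
from datetime import datetime

# CAR1 stays as (location, arrived, departed); the last stay is still open,
# which is modelled here with datetime.max (an assumption of this sketch).
stays = [
    ("LOC1", datetime(2015, 1, 1, 1, 1, 1), datetime(2015, 1, 5, 5, 5, 5)),
    ("LOC99", datetime(2015, 1, 5, 5, 5, 5), datetime(2015, 1, 9, 9, 9, 9)),
    ("LOC1", datetime(2015, 1, 9, 9, 9, 9), datetime.max),
]

def days_in_range(rows, start, end):
    """Hypothetical helper: days per location inside the window [start, end)."""
    seconds = {}
    for loc, arrived, departed in rows:
        lo, hi = max(arrived, start), min(departed, end)  # clip to the window
        if lo < hi:  # the stay overlaps the window
            seconds[loc] = seconds.get(loc, 0.0) + (hi - lo).total_seconds()
    return {loc: round(s / 86400, 1) for loc, s in seconds.items()}

week = days_in_range(stays, datetime(2015, 1, 2), datetime(2015, 1, 9))
print(week)
```

Because the end of the window is exclusive, adjacent windows such as consecutive months never count the same moment twice.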