SQL: pairing transactions into date-range rows
Tags: sql, sql-server, tsql, sql-server-2014
I have a table structured as follows. It shows when an employee was added to (Operation = I) or removed from (Operation = D) an account in a particular role:
Account | Employee | Role | Operation | OperationTimestamp
ABC | 1 | Rep | I | 1/1/2018
DEF | 1 | Mgr | I | 1/1/2018
ABC | 1 | Rep | D | 3/31/2018
ABC | 1 | Rep | I | 7/1/2018
ABC | 1 | Rep | D | 12/31/2018
ABC | 2 | Mgr | I | 1/1/2018
DEF | 2 | Exc | I | 1/1/2018
ABC | 2 | Mgr | D | 3/31/2018
ABC | 2 | Mgr | I | 6/1/2018
ABC | 2 | Mgr | D | 10/31/2018
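(I = insert, D = delete)
I need to develop a query that returns the account, the employee, the role, and the date range during which the employee was on that account, like this: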
Account | Employee | Role | StartingDate | EndingDate
ABC | 1 | Rep | 1/1/2018 | 3/31/2018
DEF | 1 | Mgr | 1/1/2018 | NULL
ABC | 1 | Rep | 7/1/2018 | 12/31/2018
ABC | 2 | Mgr | 1/1/2018 | 3/31/2018
DEF | 2 | Exc | 1/1/2018 | NULL
ABC | 2 | Mgr | 6/1/2018 | 10/31/2018
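So, as the result set shows, if an employee has been added to an account but not yet removed, EndingDate should be NULL.
My biggest concern is that the same employee can be added to and removed from an account multiple times and/or in multiple roles. My gut tells me I need to sort the transactions by account > employee > role > date and somehow group every two rows together (since it should always be an I operation followed by a D operation), but I'm not sure how to handle the "missing" delete transaction when the employee is still on an account.

Assumption: for the same combination (account, employee, role), an I operation is never followed by another I; if there is a next row (there may not be one for that combination), it is always a D.
If the above is true, then I would use the following query: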
with
x as (
    select
        account, employee, role, operationtimestamp, operation,
        -- operation of the next row for the same account/employee/role
        lead(operation)
            over(partition by account, employee, role
                 order by account, employee, role, operationtimestamp)
            as next_op,
        -- timestamp of that next row (NULL when there is none)
        lead(operationtimestamp)
            over(partition by account, employee, role
                 order by account, employee, role, operationtimestamp)
            as next_ts
    from my_table
),
y as (
    -- each I row opens a range; the next row's timestamp closes it
    select
        account, employee, role,
        operationtimestamp as startingdate,
        next_ts as endingdate
    from x
    where operation = 'I'
)
select *
from y
order by employee, startingdate
Result:
account employee role startingdate endingdate
------- -------- ---- --------------------- ---------------------
ABC 1 Rep 2018-01-01 00:00:00.0 2018-03-31 00:00:00.0
DEF 1 Mgr 2018-01-01 00:00:00.0 <null>
ABC 1 Rep 2018-07-01 00:00:00.0 2018-12-31 00:00:00.0
ABC 2 Mgr 2018-01-01 00:00:00.0 2018-03-31 00:00:00.0
DEF 2 Exc 2018-01-01 00:00:00.0 <null>
ABC 2 Mgr 2018-06-01 00:00:00.0 2018-10-31 00:00:00.0
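The question's author later reports that some combinations contain a D with no original I, apparently left over from a data migration. The following is a minimal sketch of one way the LEAD() query above could be extended to cope with that, not a definitive solution. The table variable `@t`, the extra `GHI` row, and the `prev_op` column are made up for illustration, and returning a NULL StartingDate for an orphaned D is an assumption about the desired output that the question does not confirm.

```sql
-- Sample data: a few of the question's rows plus one hypothetical
-- orphaned D (account GHI) mimicking a migrated employee with no I.
declare @t table (
    Account            varchar(3),
    Employee           int,
    EmpRole            varchar(3),
    Operation          char(1),
    OperationTimestamp datetime
);

insert into @t values
    ('ABC', 1, 'Rep', 'I', '20180101'),
    ('ABC', 1, 'Rep', 'D', '20180331'),
    ('ABC', 1, 'Rep', 'I', '20180701'),
    ('DEF', 1, 'Mgr', 'I', '20180101'),
    ('GHI', 3, 'Rep', 'D', '20180228');  -- hypothetical: D with no I

with x as (
    select Account, Employee, EmpRole, Operation, OperationTimestamp,
           -- previous and next operation within the same combination
           lag(Operation)  over (partition by Account, Employee, EmpRole
                                 order by OperationTimestamp) as prev_op,
           lead(Operation) over (partition by Account, Employee, EmpRole
                                 order by OperationTimestamp) as next_op,
           lead(OperationTimestamp)
                           over (partition by Account, Employee, EmpRole
                                 order by OperationTimestamp) as next_ts
    from @t
)
-- Normal case: every I opens a range, closed only if the next row is a D.
select Account, Employee, EmpRole,
       OperationTimestamp as StartingDate,
       case when next_op = 'D' then next_ts end as EndingDate
from x
where Operation = 'I'
union all
-- Orphaned D: nothing before it, or the row before it is also a D.
-- The start of the range is unknown, so it is reported as NULL.
select Account, Employee, EmpRole,
       cast(null as datetime) as StartingDate,
       OperationTimestamp     as EndingDate
from x
where Operation = 'D'
  and (prev_op is null or prev_op = 'D')
order by Employee, Account, StartingDate;
```

The `case when next_op = 'D'` guard also means that an I followed directly by another I is left open-ended instead of being closed by the second I's timestamp. Whether that is the right reading of such data depends on how the transaction log was produced.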

This is pretty simple with row_number and a self join:
declare @t table(Account varchar(3), Employee int, EmpRole varchar(3), Operation varchar(1), OperationTimestamp datetime);
insert into @t values
 ('ABC',1,'Rep','I','20180101')
,('DEF',1,'Mgr','I','20180101')
,('ABC',1,'Rep','D','20180331')
,('ABC',1,'Rep','I','20180701')
,('ABC',1,'Rep','D','20181231')
,('ABC',2,'Mgr','I','20180101')
,('DEF',2,'Exc','I','20180101')
,('ABC',2,'Mgr','D','20180331')
,('ABC',2,'Mgr','I','20180601')
,('ABC',2,'Mgr','D','20181031');

with d as
(
    select Account
          ,Employee
          ,EmpRole
          ,Operation
          ,OperationTimestamp
          ,row_number() over (partition by Account, Employee, EmpRole order by OperationTimestamp) as ord
    from @t
)
select s.Account
      ,s.Employee
      ,s.EmpRole
      ,s.OperationTimestamp as OperationTimestampStart
      ,e.OperationTimestamp as OperationTimestampEnd
from d as s
    left join d as e
        on s.Account = e.Account
        and s.Employee = e.Employee
        and s.EmpRole = e.EmpRole
        and s.ord = e.ord-1
where s.Operation = 'I';

I think you just need lead() or a cumulative min(). I mean:
select account, employee, role, OperationTimestamp, EndingDate
from (select t.*,
             min(case when operation = 'D' then OperationTimestamp end) over
                 (partition by account, employee, role
                  order by OperationTimestamp desc
                 ) as EndingDate
      from t
     ) t
where operation = 'I';
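The cumulative min() works because an OVER clause with ORDER BY and no explicit frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With the timestamps sorted in descending order, that frame covers the current row and every later row, so the smallest D timestamp in it is the next delete. As a sketch only, the same idea can be written with the frame spelled out using ROWS. The table variable `@t` and the `EmpRole` column name are stand-ins borrowed from the row_number answer, and ROWS and RANGE can give different results if an I and a D for the same combination share a timestamp.

```sql
-- Hypothetical sample data for one account/employee/role combination.
declare @t table (
    Account            varchar(3),
    Employee           int,
    EmpRole            varchar(3),
    Operation          char(1),
    OperationTimestamp datetime
);

insert into @t values
    ('ABC', 1, 'Rep', 'I', '20180101'),
    ('ABC', 1, 'Rep', 'D', '20180331'),
    ('ABC', 1, 'Rep', 'I', '20180701');

select Account, Employee, EmpRole,
       OperationTimestamp as StartingDate,
       EndingDate
from (
    select t.*,
           -- earliest D at or after the current row; NULL if none exists
           min(case when Operation = 'D' then OperationTimestamp end) over (
               partition by Account, Employee, EmpRole
               order by OperationTimestamp desc
               rows between unbounded preceding and current row
           ) as EndingDate
    from @t as t
) as r
where Operation = 'I';
```

Unlike lead(), this version looks for the next D anywhere later in the partition, so two consecutive I rows would both be closed by the same D.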
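
For a given role, can there be multiple consecutive I's, or is an I always immediately followed by a D?
Gordon's now-deleted answer was correct, this really is a gaps-and-islands problem. You can all use it to test your queries.
@TimBiegeleisen, unless they want more than what is stated in the question, I think this is much simpler than a gaps-and-islands problem. You just need to find, for each Account | Employee | Role combination, the single D record that immediately follows an I record.
Yes, but how do you do that?
Hmm... I was going to say that, for the same account/emp/role, it is safe to assume that a D immediately follows an I, but I have now found some oddities where there is a D with no original I. I believe this happened during a database migration (there was no I when the employee originally joined the account, because they were attached to the account as part of the migration), but a D transaction appears when they first leave. So I may have to deal with a more complicated scenario. Not thrilled...

You don't need your y cte there. Just put the order by at the end of the second statement.
I don't have a great testing environment, so do you know how LEAD() performs compared with the left join in @iamdave's answer?
I tend to think LEAD() is executed through a "sort" operation, while the left join is executed through NLS (or a hash join, or a merge join). I would expect the sort operation to be faster. To really find out, you need to retrieve the execution plans of both queries and compare them. The SQL optimizer sometimes gets it right, and other times not so much. In any case, compare the cost of the two plans.
Apparently, even though we are using SQL 2014, the compatibility level has been set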
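
For the performance question raised in the comments, one way to compare the two approaches is to run both in the same session with SET STATISTICS IO and SET STATISTICS TIME switched on, and to inspect the actual execution plans as well (for example through the "Include Actual Execution Plan" option in SQL Server Management Studio). The script below is only a sketch: the temp table `#ops` and its few rows are hypothetical stand-ins, and meaningful numbers require running it against a realistic data volume and the real indexes.

```sql
-- Hypothetical temp table standing in for the real transaction table.
create table #ops (
    Account            varchar(3),
    Employee           int,
    EmpRole            varchar(3),
    Operation          char(1),
    OperationTimestamp datetime
);

insert into #ops values
    ('ABC', 1, 'Rep', 'I', '20180101'),
    ('ABC', 1, 'Rep', 'D', '20180331'),
    ('DEF', 1, 'Mgr', 'I', '20180101');

-- Report logical reads and CPU/elapsed time for each statement.
set statistics io on;
set statistics time on;

-- Candidate 1: LEAD()
select Account, Employee, EmpRole,
       OperationTimestamp as StartingDate,
       next_ts            as EndingDate
from (
    select Account, Employee, EmpRole, Operation, OperationTimestamp,
           lead(OperationTimestamp) over (partition by Account, Employee, EmpRole
                                          order by OperationTimestamp) as next_ts
    from #ops
) as x
where Operation = 'I';

-- Candidate 2: ROW_NUMBER() plus a self join
with d as (
    select Account, Employee, EmpRole, Operation, OperationTimestamp,
           row_number() over (partition by Account, Employee, EmpRole
                              order by OperationTimestamp) as ord
    from #ops
)
select s.Account, s.Employee, s.EmpRole,
       s.OperationTimestamp as StartingDate,
       e.OperationTimestamp as EndingDate
from d as s
    left join d as e
        on  s.Account  = e.Account
        and s.Employee = e.Employee
        and s.EmpRole  = e.EmpRole
        and s.ord      = e.ord - 1
where s.Operation = 'I';

set statistics io off;
set statistics time off;

drop table #ops;
```

The statistics appear on the Messages tab, one block per statement, so the logical reads and elapsed times of the two candidates can be read side by side.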