Correcting overlapping datetimes in SQL Server
It has been a while since I wrote any SQL, and I'm struggling to finish the last stage of a data-cleaning script. A sample of the output from my existing script is:
MRN ID ADTM SDTM WardDays WardMins
45 45_1 2016-03-24 06:28:00.000 2016-03-24 18:15:00.000 0 707
45 45_2 2016-03-24 11:07:00.000 2016-03-24 18:15:00.000 0 428
MRN ID ADTM SDTM TDays Tminutes
381 381_1 2016-01-30 00:25:00.000 2016-01-31 16:53:00.000 0 1415
381 381_1 2016-01-31 00:00:00.000 2016-01-31 16:53:00.000 0 1013
381 381_2 2016-01-31 11:30:00.000 2016-01-31 16:53:00.000 0 323
381 381_3 2016-01-31 16:53:00.000 2016-02-01 17:50:00.000 0 427
381 381_3 2016-02-01 00:00:00.000 2016-02-01 17:50:00.000 0 1070
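The problem is that the dates overlap for the same [non-unique] [ID] field. For the first case, the output I want, with the corrections in italics, is:
And for the second set of records: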
MRN ID ADTM SDTM TDays Tminutes
381 381_1 2016-01-30 00:25:00.000 _2016-01-31 00:00:00.000_ 0 1415
381 381_1 2016-01-31 00:00:00.000 _2016-01-31 11:30:00.000_ 0 690
381 381_2 2016-01-31 11:30:00.000 2016-01-31 16:53:00.000 0 323
381 381_3 2016-01-31 16:53:00.000 _2016-02-01 00:00:00.000_ 0 427
381 381_3 2016-02-01 00:00:00.000 2016-02-01 17:50:00.000 0 1070
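So you can see that I don't want any record's end datetime [SDTM] to overlap the next record's start datetime [ADTM]. I think this will take two stages:
Update the dates according to the logic outlined by the datasets above
Update TDays and TMinutes for each record
To set up the dataset, use: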
CREATE TABLE T (
MRN int, ID varchar(5), ADTM varchar(23), SDTM varchar(23), TDays int, TMinutes int);
INSERT INTO T
(MRN, ID, ADTM, SDTM, TDays, TMinutes)
VALUES
(45, '45_1', '2016-03-24 06:28:00.000', '2016-03-24 18:15:00.000', 0, 707),
(45, '45_2', '2016-03-24 11:07:00.000', '2016-03-24 18:15:00.000', 0, 428),
(381, '381_1', '2016-01-30 00:25:00.000', '2016-01-31 16:53:00.000', 0, 1415),
(381, '381_1', '2016-01-31 00:00:00.000', '2016-01-31 16:53:00.000', 0, 1013),
(381, '381_3', '2016-01-31 16:53:00.000', '2016-02-01 17:50:00.000', 0, 427),
(381, '381_3', '2016-02-01 00:00:00.000', '2016-02-01 17:50:00.000', 0, 1070),
(381, '381_2', '2016-01-31 11:30:00.000', '2016-01-31 16:53:00.000', 0, 323);
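Part one. I have been playing with a CTE query, but it only merges the overlapping records. I need to look at the preceding record to check the required condition, and I quickly get lost: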
; WITH StartD AS
(
SELECT ID, ADTM, ROW_NUMBER()
OVER(PARTITION BY ID ORDER BY ADTM) AS Rn
FROM
WD AS t
WHERE
NOT EXISTS
(
SELECT *
FROM WD AS p
WHERE p.ID = t.ID
AND p.ADTM < t.ADTM
AND t.ADTM <= DATEADD(day, 1, p.SDTM)
)
) , EndD AS
(
SELECT ID, SDTM, ROW_NUMBER()
OVER(PARTITION BY ID ORDER BY SDTM) AS Rn
FROM
WD AS t
WHERE
NOT EXISTS
(
SELECT *
FROM WD AS p
WHERE p.ID = t.ID
AND DATEADD(day, -1, p.ADTM) <= t.SDTM
AND t.SDTM < p.SDTM
)
) SELECT s.ID, s.ADTM, e.SDTM
FROM StartD AS s JOIN EndD AS e
ON e.ID = s.ID AND e.Rn = s.Rn;
The new table looks like this:
CREATE TABLE T (
MRN int, ID varchar(5), ADTM varchar(23), SDTM varchar(23), TDays int, TMinutes int);
INSERT INTO T
(MRN, ID, ADTM, SDTM, TDays, TMinutes)
VALUES
(45, '45_1', '2016-03-24 06:28:00.000', '2016-03-24 18:15:00.000', 0, 707),
(45, '45_2', '2016-03-24 11:07:00.000', '2016-03-24 18:15:00.000', 0, 428),
(381, '381_1', '2016-01-30 00:25:00.000', '2016-01-31 00:00:00.000', 0, 1415),
(381, '381_2', '2016-01-31 11:30:00.000', '2016-02-01 00:00:00.000', 0, 323),
(381, '381_3', '2016-01-31 16:53:00.000', '2016-02-01 00:00:00.000', 0, 427);
This will get you what you need in SQL Server 2008:
SELECT t1.ID,
t1.ADTM,
COALESCE(t2.ADTM,t1.SDTM) SDTM,
DATEDIFF(MINUTE,t1.ADTM,COALESCE(t2.ADTM,t1.SDTM)) Tminutes
FROM T t1
OUTER APPLY (SELECT TOP 1
*
FROM T t2
WHERE t2.MRN = t1.MRN
AND t2.ADTM > t1.ADTM
AND t2.ADTM <> t1.SDTM
ORDER BY adtm
) t2
ORDER BY t1.ID
This seems to be the right way to go about it:
declare @T table ( MRN int, ID varchar(5), ADTM varchar(23), SDTM varchar(23),
TDays int, TMinutes int);
INSERT INTO @T (MRN, ID, ADTM, SDTM, TDays, TMinutes) VALUES
(45, '45_1', '2016-03-24 06:28:00.000', '2016-03-24 18:15:00.000', 0, 707),
(45, '45_2', '2016-03-24 11:07:00.000', '2016-03-24 18:15:00.000', 0, 428),
(381, '381_1', '2016-01-30 00:25:00.000', '2016-01-31 16:53:00.000', 0, 1415),
(381, '381_1', '2016-01-31 00:00:00.000', '2016-01-31 16:53:00.000', 0, 1013),
(381, '381_3', '2016-01-31 16:53:00.000', '2016-02-01 17:50:00.000', 0, 427),
(381, '381_3', '2016-02-01 00:00:00.000', '2016-02-01 17:50:00.000', 0, 1070),
(381, '381_2', '2016-01-31 11:30:00.000', '2016-01-31 16:53:00.000', 0, 323);
;With Ordered as (
select
*,
ROW_NUMBER() OVER (PARTITION BY MRN order by ADTM) as rn
from
@T
), Ends as (
select
o1.MRN,
o1.ID,
o1.ADTM,
CASE WHEN o2.ADTM < o1.SDTM THEN o2.ADTM ELSE o1.SDTM END as SDTM
from
Ordered o1
left join
Ordered o2
on
o1.MRN = o2.MRN and
o1.rn= o2.rn - 1
)
select
*,
DATEDIFF(minute,ADTM,SDTM) as TMinutes
from Ends
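Unless your sample data is incomplete or I've missed something, we always match each row with the next one after ordering by ADTM, and then take either the current row's SDTM or the next row's ADTM, whichever is earlier, in the CASE expression.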
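The question's second stage is writing the corrected values back to the table. Here is a minimal sketch of the same row-matching logic as an UPDATE that still works on SQL Server 2008. It rests on three assumptions the thread does not confirm: that ADTM and SDTM are DATETIME columns (as the asker says they should be), that (MRN, ID, ADTM) identifies a row uniquely in T, and that TDays means whole 24-hour days, that is TMinutes / 1440.

```sql
-- Sketch only. Assumes T.ADTM and T.SDTM are DATETIME and that
-- (MRN, ID, ADTM) is unique, so the join back to T hits exactly one row.
;WITH Ordered AS (
    SELECT
        MRN, ID, ADTM, SDTM,
        ROW_NUMBER() OVER (PARTITION BY MRN ORDER BY ADTM) AS rn
    FROM T
), Ends AS (
    SELECT
        o1.MRN,
        o1.ID,
        o1.ADTM,
        -- Clip the end time to the next row's start when the two overlap.
        CASE WHEN o2.ADTM < o1.SDTM THEN o2.ADTM ELSE o1.SDTM END AS NewSDTM
    FROM Ordered AS o1
    LEFT JOIN Ordered AS o2
        ON o1.MRN = o2.MRN
       AND o1.rn = o2.rn - 1
)
UPDATE t
SET SDTM     = e.NewSDTM,
    TMinutes = DATEDIFF(minute, t.ADTM, e.NewSDTM),
    -- Assumption: TDays is the number of whole 24-hour days.
    TDays    = DATEDIFF(minute, t.ADTM, e.NewSDTM) / 1440
FROM T AS t
JOIN Ends AS e
    ON  e.MRN  = t.MRN
    AND e.ID   = t.ID
    AND e.ADTM = t.ADTM;
```

If two rows for the same MRN can share an ADTM, the ROW_NUMBER ordering is no longer deterministic and the join key would need an extra tie-breaking column.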
Output of the SELECT query:
MRN ID ADTM SDTM TMinutes
----------- ----- ----------------------- ----------------------- -----------
45 45_1 2016-03-24 06:28:00.000 2016-03-24 11:07:00.000 279
45 45_2 2016-03-24 11:07:00.000 2016-03-24 18:15:00.000 428
381 381_1 2016-01-30 00:25:00.000 2016-01-31 00:00:00.000 1415
381 381_1 2016-01-31 00:00:00.000 2016-01-31 11:30:00.000 690
381 381_2 2016-01-31 11:30:00.000 2016-01-31 16:53:00.000 323
381 381_3 2016-01-31 16:53:00.000 2016-02-01 00:00:00.000 427
381 381_3 2016-02-01 00:00:00.000 2016-02-01 17:50:00.000 1070
In your sample setup the dates are defined as varchar(23). Is that how they are defined in your actual database as well? You refer to them as datetimes and use date functions on them, so I'm confused.
Sorry, I created it with SQLFiddle, which changed the types. I'll edit now... They should be DATETIME.
What version of SQL Server are you using?
I need it to work on 2008+. It would be nice to see a 2012-only approach, though...
Is there really no column containing data that links the rows together? I gather from your example that a set of rows is related if the rows share the same value in the first part of the ID, up to the underscore?
Yes, this works great. Sometimes after I've been away from it for a while I can't get back into writing SQL fluently. Thank you very much for your time, much appreciated...
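The comments ask what a 2012-only approach might look like. Here is a minimal sketch, assuming SQL Server 2012 or later and DATETIME columns in T: LEAD fetches the next row's ADTM directly, so the ROW_NUMBER self-join is no longer needed. The column alias NextADTM is introduced here for illustration only.

```sql
-- Sketch only. Requires SQL Server 2012+ for LEAD; assumes DATETIME columns.
;WITH Ends AS (
    SELECT
        MRN, ID, ADTM, SDTM,
        -- Start time of the next row for the same MRN; NULL on the last row.
        LEAD(ADTM) OVER (PARTITION BY MRN ORDER BY ADTM) AS NextADTM
    FROM T
)
SELECT
    MRN,
    ID,
    ADTM,
    -- When NextADTM is NULL the comparison is not true, so SDTM is kept.
    CASE WHEN NextADTM < SDTM THEN NextADTM ELSE SDTM END AS SDTM,
    DATEDIFF(minute, ADTM,
             CASE WHEN NextADTM < SDTM THEN NextADTM ELSE SDTM END) AS TMinutes
FROM Ends
ORDER BY MRN, ADTM;
```

This applies the same rule as the ROW_NUMBER version (take the earlier of the current SDTM and the next ADTM), so it should return the same rows on the sample data.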