Finding start and end dates in T-SQL (set-based)
Tags: sql, sql-server, sql-server-2005, tsql, gaps-and-islands

I have the following data:
Name  Date
A     2011-01-01 01:00:00.000
A     2011-02-01 02:00:00.000
A     2011-03-01 03:00:00.000
B     2011-04-01 04:00:00.000
A     2011-05-01 07:00:00.000
The desired output is:
Name  StartDate                EndDate
-------------------------------------------------------------------
A     2011-01-01 01:00:00.000  2011-04-01 04:00:00.000
B     2011-04-01 04:00:00.000  2011-05-01 07:00:00.000
A     2011-05-01 07:00:00.000  NULL
How can I achieve this in T-SQL using a set-based approach?
The sample data is set up as follows:
DECLARE @t TABLE(PersonName VARCHAR(32), [Date] DATETIME)
INSERT INTO @t VALUES('A', '2011-01-01 01:00:00')
INSERT INTO @t VALUES('A', '2011-01-02 02:00:00')
INSERT INTO @t VALUES('A', '2011-01-03 03:00:00')
INSERT INTO @t VALUES('B', '2011-01-04 04:00:00')
INSERT INTO @t VALUES('A', '2011-01-05 07:00:00')
Select * from @t
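The other answer, the one using a CTE, is a good one. Another option is to iterate over the rows. That is not set-based, but it is another way to do it. You would need to iterate either (A) to assign a unique id to each record that corresponds to its transaction, or (B) to actually produce the output. T-SQL is not well suited to iterating over records, especially when you have a lot of them, so I would suggest doing it some other way: a small .NET program, or anything else that is better at iterating.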
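As a rough illustration of the iterative route described above, here is a minimal sketch in Python (chosen here only for illustration; the answer itself mentions .NET or a similar tool). The `rows` list mirrors the `@t` sample data, and `find_ranges` is a hypothetical helper name, not something from the thread. It assumes the rows fit in memory and that a run ends where the next run starts.

```python
from datetime import datetime

# Sample rows mirroring the @t table variable used in this thread.
rows = [
    ("A", datetime(2011, 1, 1, 1)),
    ("A", datetime(2011, 1, 2, 2)),
    ("A", datetime(2011, 1, 3, 3)),
    ("B", datetime(2011, 1, 4, 4)),
    ("A", datetime(2011, 1, 5, 7)),
]


def find_ranges(rows):
    """Return (name, start, end) for each run of consecutive rows with one name.

    A run ends at the start of the next run; the last run has end None.
    """
    ranges = []
    for name, stamp in sorted(rows, key=lambda r: r[1]):
        if ranges and ranges[-1][0] == name:
            continue  # still inside the current run
        if ranges:
            ranges[-1][2] = stamp  # the previous run ends where this one starts
        ranges.append([name, stamp, None])
    return [tuple(r) for r in ranges]


result = find_ranges(rows)
for row in result:
    print(row)
```

A single pass over the sorted rows is enough, which is the main appeal of doing this outside the database when a set-based query is not required.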
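;WITH cte1
     AS (SELECT *,
                -- Overall row number minus per-person row number:
                -- the difference G stays constant within each consecutive run
                ROW_NUMBER() OVER (ORDER BY Date) -
                ROW_NUMBER() OVER (PARTITION BY PersonName
                                   ORDER BY Date) AS G
         FROM   @t),
     cte2
     AS (SELECT PersonName,
                MIN([Date]) StartDate,
                -- Number the runs in chronological order
                ROW_NUMBER() OVER (ORDER BY MIN([Date])) AS rn
         FROM   cte1
         GROUP  BY PersonName,
                   G)
-- Each run ends where the next run starts; the last run gets NULL
SELECT a.PersonName,
       a.StartDate,
       b.StartDate AS EndDate
FROM   cte2 a
       LEFT JOIN cte2 b
         ON a.rn + 1 = b.rn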
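To see why subtracting the two row numbers works, the following Python sketch traces the same steps on the sample rows. It is an illustration under assumptions, not a port of the query: `rows`, `g_values`, `islands`, and `starts` are names made up here, and the data is assumed to fit in memory.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows mirroring the @t table variable.
rows = [
    ("A", datetime(2011, 1, 1, 1)),
    ("A", datetime(2011, 1, 2, 2)),
    ("A", datetime(2011, 1, 3, 3)),
    ("B", datetime(2011, 1, 4, 4)),
    ("A", datetime(2011, 1, 5, 7)),
]

ordered = sorted(rows, key=lambda r: r[1])

# Step 1 (like cte1): G = overall row number - row number within the person.
seen_per_name = defaultdict(int)
g_values = []
islands = {}  # (name, G) -> earliest date of that island
for overall_rn, (name, stamp) in enumerate(ordered, start=1):
    seen_per_name[name] += 1
    g = overall_rn - seen_per_name[name]
    g_values.append(g)
    # Rows arrive in date order, so the first date seen is the minimum.
    islands.setdefault((name, g), stamp)

# Step 2 (like cte2 plus the self join): order islands by start date and
# use the next island's start as this island's end.
starts = sorted(islands.items(), key=lambda kv: kv[1])
result = [
    (name, start, starts[i + 1][1] if i + 1 < len(starts) else None)
    for i, ((name, _g), start) in enumerate(starts)
]

print(g_values)
for row in result:
    print(row)
```

The pair (name, G) identifies an island because G only changes for a person when rows belonging to someone else appear in between.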
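Because the results of CTEs are generally not materialized, however, you may well find that you get better performance if you materialize the intermediate result yourself, as below.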
DECLARE @t2 TABLE (
  rn         INT IDENTITY(1, 1) PRIMARY KEY,
  PersonName VARCHAR(32),
  StartDate  DATETIME );

INSERT INTO @t2
SELECT PersonName,
       MIN([Date]) StartDate
FROM   (SELECT *,
               ROW_NUMBER() OVER (ORDER BY Date) -
               ROW_NUMBER() OVER (PARTITION BY PersonName
                                  ORDER BY Date) AS G
        FROM   @t) t
GROUP  BY PersonName,
          G
ORDER  BY StartDate

SELECT a.PersonName,
       a.StartDate,
       b.StartDate AS EndDate
FROM   @t2 a
       LEFT JOIN @t2 b
         ON a.rn + 1 = b.rn
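Get a row number so that you know where the previous record is. Then take a record and the record right after it. When the state changes, we have a candidate row.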
select
    state,
    min(start_timestamp),
    max(end_timestamp)
from
(
    select
        first.state,
        first.timestamp_ as start_timestamp,
        second.timestamp_ as end_timestamp
    from
    (
        select
            *, row_number() over (order by timestamp_) as id
        from test
    ) as first
    left outer join
    (
        select
            *, row_number() over (order by timestamp_) as id
        from test
    ) as second
    on
        first.id = second.id - 1
        and first.state != second.state
) as agg
group by state
having max(end_timestamp) is not null
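union
-- the last row won't have an ending row
-- PostgreSQL:
-- (select state, timestamp_, null from test order by timestamp_ desc limit 1)
-- I think it is something like this for SQL Server
-- (TOP needs a row count, and the ORDER BY goes inside a derived table)
select state, timestamp_, null
from (select top 1 state, timestamp_ from test order by timestamp_ desc) as last_row
order by 2
;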
Tested with PostgreSQL, but it should also work with SQL Server.
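The core step of that query, pairing each row with the one right after it and keeping the pairs where the state changes, can be sketched in Python as follows. This covers only the candidate-detection step, not the grouping or the union; `rows` and `candidates` are illustrative names, and the sample values are borrowed from the `@t` data used elsewhere in this thread rather than from the `test` table.

```python
from datetime import datetime

# (state, timestamp_) pairs; values borrowed from the @t sample data.
rows = [
    ("A", datetime(2011, 1, 1, 1)),
    ("A", datetime(2011, 1, 2, 2)),
    ("A", datetime(2011, 1, 3, 3)),
    ("B", datetime(2011, 1, 4, 4)),
    ("A", datetime(2011, 1, 5, 7)),
]

ordered = sorted(rows, key=lambda r: r[1])

# Pair every row with its successor (first.id = second.id - 1) and keep
# only the pairs where the state differs (first.state != second.state).
candidates = [
    (state, stamp, next_stamp)
    for (state, stamp), (_next_state, next_stamp) in zip(ordered, ordered[1:])
    if state != _next_state
]

for row in candidates:
    print(row)
```

Each candidate carries the timestamp at which the following state begins, which is what the outer query then uses as the end timestamp.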
SELECT
  PersonName,
  StartDate = MIN(Date),
  EndDate
FROM (
  SELECT
    PersonName,
    Date,
    EndDate = (
      /* get the earliest date after current date
         associated with a different person */
      SELECT MIN(t1.Date)
      FROM @t AS t1
      WHERE t1.Date > t.Date
        AND t1.PersonName <> t.PersonName
    )
  FROM @t AS t
) s
GROUP BY PersonName, EndDate
ORDER BY 2
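The same idea can be traced by hand with a short Python sketch: compute an end date per row, then group on (name, end date). This is only an illustration under the assumption that the data fits in memory; `rows`, `end_date`, and `groups` are hypothetical names introduced here.

```python
from datetime import datetime

# Sample rows mirroring the @t table variable.
rows = [
    ("A", datetime(2011, 1, 1, 1)),
    ("A", datetime(2011, 1, 2, 2)),
    ("A", datetime(2011, 1, 3, 3)),
    ("B", datetime(2011, 1, 4, 4)),
    ("A", datetime(2011, 1, 5, 7)),
]


def end_date(name, stamp):
    """Earliest later date that belongs to a different person, or None."""
    later = [d for n, d in rows if d > stamp and n != name]
    return min(later) if later else None


# Group by (name, end date) and keep the minimum date as the start date.
groups = {}
for name, stamp in rows:
    key = (name, end_date(name, stamp))
    groups[key] = min(stamp, groups.get(key, stamp))

# Sort by start date, like ORDER BY 2 in the query.
result = sorted(
    ((name, start, end) for (name, end), start in groups.items()),
    key=lambda r: r[1],
)

for row in result:
    print(row)
```

Like the correlated subquery, `end_date` scans all rows for every row, so this sketch is quadratic in the number of rows.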
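Basically, for every Date we find the nearest date after it that is associated with a different PersonName. That gives us EndDate, which now lets us tell apart the consecutive groups of dates that belong to the same person.
Now we just need to group the data by PersonName and EndDate and take the minimum Date in each group as StartDate. And yes, sort the data by StartDate, of course.

There is a very quick way to do this, using a bit of gaps-and-islands theory: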
;WITH CTE AS (SELECT PersonName, [Date]
                   , Row_Number() OVER (ORDER BY [Date])
                   - Row_Number() OVER (ORDER BY PersonName, [Date]) AS Island
              FROM @t)
-- MAX([Date]) is the last date inside each island, not the start of the next one
SELECT PersonName, MIN([Date]) AS StartDate, MAX([Date]) AS LastDate
FROM CTE
GROUP BY Island, PersonName
ORDER BY MIN([Date])
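I don't understand how you arrive at the desired output. How do you decide the end date? For example, the desired output contains the record Name = A, StartDate = 2011-01-01 01:00:00.000, EndDate = 2011-04-01 04:00:00.000, but in the input the date 2011-04-01 04:00:00.000 is associated with Name B. How do we determine the end date of a record? What defines a record?

The end date of a record is the start date of the next record. A starts at 2011-01-01 01:00:00.000 and B starts at 2011-04-01 04:00:00.000, so the end date of A is 2011-04-01 04:00:00.000. Similarly, the A that comes after B starts at 2011-05-01 07:00:00.000, and that is the end date of B.

But how do you know which record to pick as the end date for a particular record?

The first difference found in the name. That is, A appears 3 times at the beginning, then B appears in the 4th row, so one transaction is over. In the 5th row A comes again, so from row 4 to row 5 there is a new transaction.

Unfortunately, without some kind of logic that determines how the end date is chosen and how the data is merged into the output, I don't think there is any way to help write SQL that produces the output you want. There has to be some logic to follow to get from input A to output B.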