SQL: find start and end dates when a field changes
I have this data in a table:
FIELD_A FIELD_B FIELD_D
249052903 10/15/2011 N
249052903 11/15/2011 P ------------- VALUE CHANGED
249052903 12/15/2011 P
249052903 1/15/2012 N ------------- VALUE CHANGED
249052903 2/15/2012 N
249052903 3/15/2012 N
249052903 4/15/2012 N
249052903 5/15/2012 N
249052903 6/15/2012 N
249052903 7/15/2012 N
249052903 8/15/2012 N
249052903 9/15/2012 N
Whenever the value in FIELD_D changes, a new group starts, and I need the minimum and maximum date within each group. The query should return:
FIELD_A GROUP_START GROUP_END
249052903 10/15/2011 10/15/2011
249052903 11/15/2011 12/15/2011
249052903 1/15/2012 9/15/2012
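In the examples I have seen so far, the values in FIELD_D are unique. Here the values can repeat: first N, then P, then back to N.
Any help would be greatly appreciated.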
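Thanks

Don't use SQL to solve this problem. A single table scan is not possible in SQL, because the problem requires comparing records with each other: it needs a full table scan plus at least one self-join. Implementing the solution in an imperative language is simple and needs only one table scan.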
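The single-pass approach described above might look like the following minimal sketch. It is only an illustration, not the answerer's code: the function name `find_groups` is hypothetical, the rows are assumed to arrive already sorted by FIELD_A and then by date, and the dates are rewritten in ISO form so that they compare correctly as strings.

```python
from itertools import groupby

# Sample rows from the question as (FIELD_A, FIELD_B, FIELD_D).
# Assumption: rows are already sorted by FIELD_A, then FIELD_B,
# and dates are written as ISO strings (YYYY-MM-DD).
rows = [
    (249052903, "2011-10-15", "N"),
    (249052903, "2011-11-15", "P"),
    (249052903, "2011-12-15", "P"),
    (249052903, "2012-01-15", "N"),
    (249052903, "2012-02-15", "N"),
    (249052903, "2012-03-15", "N"),
]


def find_groups(sorted_rows):
    """Collapse each run of consecutive rows sharing (FIELD_A, FIELD_D)
    into one (FIELD_A, GROUP_START, GROUP_END) tuple, in a single pass."""
    result = []
    # groupby only merges *adjacent* equal keys, so N, P, N yields three runs.
    for (field_a, _field_d), run in groupby(sorted_rows, key=lambda r: (r[0], r[2])):
        dates = [r[1] for r in run]
        result.append((field_a, dates[0], dates[-1]))
    return result


groups = find_groups(rows)
for group in groups:
    print(group)
```

Because `groupby` never looks back at earlier runs, the second run of N is kept separate from the first, which is the behaviour the question asks for.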
Edit: preferably as a stored procedure.

This is easy to express in SQL using a subquery:
select Field_A, Field_D, min(Field_B) as Group_Start, max(Field_B) as Group_End
from (select t.*,
             (select min(field_B)
              from t t2
              where t2.field_A = t.field_A and
                    t2.field_B > t.field_B and
                    t2.Field_D <> t.field_D
             ) as TheGroup
      from t
     ) t
group by Field_A, Field_D, TheGroup
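This uses a correlated subquery to assign a group identifier. The identifier is the first later value of Field_B at which Field_D changes.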
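To see what the group identifier looks like per row, here is a small sketch that runs only the inner query against an in-memory SQLite database. The choice of SQLite, the table name `t`, and the ISO date strings are assumptions made for illustration; the question does not name a database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field_A INTEGER, field_B TEXT, field_D TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (249052903, "2011-10-15", "N"),
        (249052903, "2011-11-15", "P"),
        (249052903, "2011-12-15", "P"),
        (249052903, "2012-01-15", "N"),
        (249052903, "2012-02-15", "N"),
    ],
)

# For each row, TheGroup is the earliest later date whose field_D differs.
# Rows in the same run share that date; the final run gets NULL (None).
inner = conn.execute(
    """
    SELECT t.field_B, t.field_D,
           (SELECT MIN(t2.field_B)
              FROM t t2
             WHERE t2.field_A = t.field_A
               AND t2.field_B > t.field_B
               AND t2.field_D <> t.field_D) AS TheGroup
      FROM t
     ORDER BY t.field_B
    """
).fetchall()

for row in inner:
    print(row)
```

The two P rows share the identifier 2012-01-15, while the trailing N rows share NULL, so grouping by Field_A, Field_D, and TheGroup keeps the first N row apart from the later N run.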
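You didn't mention which database you are using, so this uses standard SQL. If your SQL implementation supports analytic functions (lag, lead, and cumulative sum), you can use them to your advantage.

SQL Fiddle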
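The analytic-function route mentioned above (lag plus a cumulative sum) could be sketched as follows. This is an illustration under stated assumptions rather than code from any of the answers: it uses an in-memory SQLite database, which supports window functions only from version 3.25 onward, and the CTE names `flagged` and `numbered` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field_A INTEGER, field_B TEXT, field_D TEXT)")
dates_and_flags = [
    ("2011-10-15", "N"), ("2011-11-15", "P"), ("2011-12-15", "P"),
    ("2012-01-15", "N"), ("2012-02-15", "N"), ("2012-03-15", "N"),
    ("2012-04-15", "N"), ("2012-05-15", "N"), ("2012-06-15", "N"),
    ("2012-07-15", "N"), ("2012-08-15", "N"), ("2012-09-15", "N"),
]
conn.executemany(
    "INSERT INTO t VALUES (249052903, ?, ?)", dates_and_flags
)

query = """
WITH flagged AS (
    -- is_start = 1 when field_D differs from the previous row
    -- (the first row has no previous row, so it also gets 1)
    SELECT field_A, field_B,
           CASE WHEN field_D = LAG(field_D) OVER (PARTITION BY field_A ORDER BY field_B)
                THEN 0 ELSE 1 END AS is_start
      FROM t
),
numbered AS (
    -- the running sum of is_start gives every run its own number
    SELECT field_A, field_B,
           SUM(is_start) OVER (PARTITION BY field_A ORDER BY field_B) AS grp
      FROM flagged
)
SELECT field_A, MIN(field_B) AS group_start, MAX(field_B) AS group_end
  FROM numbered
 GROUP BY field_A, grp
 ORDER BY field_A, group_start
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The sketch assumes that field_B is unique within each field_A; with duplicate dates the running sum would treat tied rows as peers and the group numbers could blur together.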
I have modified the answer a little, since you have multiple fields. This should always work:
WITH EndsMarked
AS
(
    SELECT
         [Field_A]
        ,[Field_B]
        -- IS_START = 1 on the first row of each Field_A partition,
        -- or when Field_D differs from the previous row
        ,CASE
            WHEN LAG([Field_D],1) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B]) IS NULL
                AND ROW_NUMBER() OVER (PARTITION BY [Field_A] ORDER BY [Field_B]) = 1
                THEN 1
            WHEN LAG([Field_D],1) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B])
                <> LAG([Field_D],0) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B])
                THEN 1
            ELSE 0
         END AS IS_START
        -- IS_END = 1 on the last row of each Field_A partition,
        -- or when Field_D differs from the next row
        ,CASE
            WHEN LEAD([Field_D],1) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B]) IS NULL
                AND ROW_NUMBER() OVER (PARTITION BY [Field_A] ORDER BY [Field_B] DESC) = 1
                THEN 1
            WHEN LEAD([Field_D],0) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B])
                <> LEAD([Field_D],1) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B])
                THEN 1
            ELSE 0
         END AS IS_END
    FROM
    (
        SELECT
             [Field_A]
            ,[Field_B]
            ,[Field_D]
        FROM [T]
    ) F
)
,GroupsNumbered
AS
(
    -- Keep only boundary rows; the running count of starts numbers the groups
    SELECT
         [Field_A]
        ,[Field_B]
        ,IS_START
        ,IS_END
        ,COUNT(CASE
                   WHEN IS_START = 1
                       THEN 1
               END) OVER (ORDER BY [Field_A]
                                  ,[Field_B]) AS GroupNum
    FROM EndsMarked
    WHERE IS_START = 1
        OR IS_END = 1
)
SELECT
     [Field_A]
    ,MIN([Field_B]) AS GROUP_START
    ,MAX([Field_B]) AS GROUP_END
FROM GroupsNumbered
GROUP BY [Field_A], GroupNum
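This query creates only two groups. The grouping needs the additional condition t2.Field_B > t.Field_B. I still think a stored procedure would be a faster and more maintainable solution.

@koriander. Thanks for pointing out that I had left out the condition on Field_B. I disagree with the rest of your comment, though.

I actually like the elegance of your SQL query. But if performance is an issue, I would definitely look into basic sequential file processing. The raw data looks like it comes from a log: naturally ordered, with no index needed.

That statement is completely wrong. Tables in SQL are inherently unordered. To retrieve the data in the correct order you need an ORDER BY clause, which requires reading and writing the data multiple times.

According to the theoretical relational model, relations are unordered. In practice, however, a table can be stored in order using a clustered index. Besides, the SQL that solves this problem would use multiple indexes, and the ORDER BY needs an index. Each index requires reading the table itself only once, not multiple times as you claim. In fact, for this problem alone, storing the table as a single sequential file (rather than in a DBMS) seems to be the best solution.

Thanks for your answer, but it does not produce the correct result, because only one row is returned. EndsMarked creates only one group.

Did you run the SQL Fiddle? The query returns three rows, exactly the three rows you said the query should return.

Thanks for following up. That was my mistake. The query returned the correct data. Thanks for your help.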