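SQL Server: How to identify which rows in a table satisfy a specific condition when that condition is based on data in previous rows? Example provided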

I am working with a table that contains the following data:

ObjectId   EventId   EventDate
1          342       2017-10-27
1          342       2018-01-06
1          343       2018-04-18
1          401       2018-10-15
1          342       2018-11-12
1          342       2018-11-29
1          401       2018-12-10
1          342       2019-02-21
1          343       2019-04-23
1          401       2019-11-04
1          343       2020-02-15
2          342       2018-06-08
2          343       2018-09-18
2          342       2018-10-02
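I need to flag the first record at which all 3 events (identified by the EventId values 342, 343 and 401) have occurred for an object (identified by ObjectId). The process should then start over with the remaining records. I have tried to do this with window functions, but the "starting over" step needed to identify any further occurrences is what stumps me.

The output of this algorithm, run against the dataset above, would be: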

ObjectId   EventId   EventDate    EventsComplete
1          342       2017-10-27   0
1          342       2018-01-06   0
1          343       2018-04-18   0
1          401       2018-10-15   1
1          342       2018-11-12   0
1          342       2018-11-29   0
1          401       2018-12-10   0
1          342       2019-02-21   0
1          343       2019-04-23   1
1          401       2019-11-04   0
1          343       2020-02-15   0
2          342       2018-06-08   0
2          343       2018-09-18   0
2          342       2018-10-02   0
Here is a query that will create the dataset in the example:

select 1 as ObjectId, 342 as EventId, cast('2017-10-27' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2018-01-06' as date) as EventDate
union select 1 as ObjectId, 343 as EventId, cast('2018-04-18' as date) as EventDate
union select 1 as ObjectId, 401 as EventId, cast('2018-10-15' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2018-11-12' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2018-11-29' as date) as EventDate
union select 1 as ObjectId, 401 as EventId, cast('2018-12-10' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2019-02-21' as date) as EventDate
union select 1 as ObjectId, 343 as EventId, cast('2019-04-23' as date) as EventDate
union select 1 as ObjectId, 401 as EventId, cast('2019-11-04' as date) as EventDate
union select 1 as ObjectId, 343 as EventId, cast('2020-02-15' as date) as EventDate
union select 2 as ObjectId, 342 as EventId, cast('2018-06-08' as date) as EventDate
union select 2 as ObjectId, 343 as EventId, cast('2018-09-18' as date) as EventDate
union select 2 as ObjectId, 342 as EventId, cast('2018-10-02' as date) as EventDate

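There may be a way to do this with a CTE or straight SQL, but I could not come up with a working solution using either approach.

The best solution I could come up with is RBAR (row by agonizing row) over a non-cursor resultset of the data to be processed. It is the only way I could see to manage the event state for the current ObjectId.

You can run the following in SSMS: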

-- Declare a table variable for housing the queried data.
DECLARE @Data TABLE ( ObjectId INT, EventId INT, EventDate DATE, EventsComplete BIT DEFAULT (0), pk INT IDENTITY(1,1) );

-- Fetch the queried data into a table variable for processing.
INSERT INTO @Data ( ObjectId, EventId, EventDate ) VALUES
    ( 1, 342, '2017-10-27' ),( 1, 342, '2018-01-06' ),( 1, 343, '2018-04-18' ),( 1, 401, '2018-10-15' ),( 1, 342, '2018-11-12' ),
    ( 1, 342, '2018-11-29' ),( 1, 401, '2018-12-10' ),( 1, 342, '2019-02-21' ),( 1, 343, '2019-04-23' ),( 1, 401, '2019-11-04' ),
    ( 1, 343, '2020-02-15' ),( 2, 342, '2018-06-08' ),( 2, 343, '2018-09-18' ),( 2, 342, '2018-10-02' );

/*
    I'm inserting the sample data you provided, however, in your code you would simply SELECT/INSERT 
    the required data into the temporary table @Data while sorting on your ObjectId and EventDate.
*/

-- Declare some variables for processing.
DECLARE 
    @ObjectId INT, 
    @EventId INT, 
    @PrevObjId INT, 
    -- Start the flags at 0 rather than NULL so the state is explicit.
    @Flag342 BIT = 0,
    @Flag343 BIT = 0,
    @Flag401 BIT = 0;
    
-- For-each row in @Data (non-cursor)...
DECLARE @pk INT = 1;
WHILE @pk <= ( SELECT MAX ( pk ) FROM @Data ) BEGIN

    -- Current row.
    SELECT
        @ObjectId = ObjectId,
        @PrevObjId = ISNULL ( @PrevObjId, ObjectId ),
        @EventId = EventId
    FROM @Data WHERE pk = @pk;

    -- Set the event flags.
    IF @EventId = 342
        SET @Flag342 = 1;

    IF @EventId = 343
        SET @Flag343 = 1;

    IF @EventId = 401
        SET @Flag401 = 1;

    IF @ObjectId = @PrevObjId BEGIN

        -- Check for a completed event.
        IF ( @Flag342 = 1 AND @Flag343 = 1 AND @Flag401 = 1 ) BEGIN

            -- Set the EventsComplete flag.
            UPDATE @Data SET EventsComplete = 1 WHERE pk = @pk;

            -- Reset the event flag values.
            SELECT @Flag342 = 0, @Flag343 = 0, @Flag401 = 0;

        END

    END ELSE BEGIN

        -- New ObjectId, reset the event flag values.
        SELECT 
            @Flag342 = CASE WHEN @EventId = 342 THEN 1 ELSE 0 END, 
            @Flag343 = CASE WHEN @EventId = 343 THEN 1 ELSE 0 END, 
            @Flag401 = CASE WHEN @EventId = 401 THEN 1 ELSE 0 END;

    END

    -- Next row.
    SELECT
        @PrevObjId = @ObjectId,
        @pk = ( @pk + 1 );

END

-- Return the updated resultset.
SELECT
    ObjectId, EventId, EventDate, EventsComplete
FROM @Data ORDER BY pk;

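Here is a set-based solution.

No attempt at optimization has been made other than using a bit field. It works, and I've had enough. I can see some possible points of simplification.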

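I should add that the problem is, in fact, currently underspecified: if two different events can occur on the same date, we have no definition of the order in which they occurred. In those cases the row numbers assigned in the first CTE are arbitrary. No such case occurs in the sample data.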
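One way to make the numbering repeatable is to add a tie-breaker to the ORDER BY of the row number. The sketch below is only an illustration: the three rows are hypothetical, and using EventId as the tie-breaker is an assumption made purely for determinism. It does not say which event really happened first, and it does not separate two identical rows on the same date (for those the order does not matter).

```sql
-- Hypothetical rows: two different events for ObjectId 1 share a date.
declare @ObjectEvents as table ( ObjectId int, EventId int, EventDate date );

insert into @ObjectEvents ( ObjectId, EventId, EventDate ) values
    ( 1, 343, '2018-04-18' ), ( 1, 342, '2018-04-18' ), ( 1, 401, '2018-10-15' );

select ObjectId, EventId, EventDate,
  -- Ordering by EventDate alone would leave the two 2018-04-18 rows in arbitrary order.
  -- Assumption: EventId is used as the tie-breaker purely to make RN repeatable.
  row_number() over ( partition by ObjectId order by EventDate, EventId ) as RN
  from @ObjectEvents
  order by ObjectId, RN;
```

With this ordering, event 342 always receives RN 1 and event 343 RN 2 for the shared date. If the real table has a time component or a sequence column, that would be a better tie-breaker.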

Using string concatenation for the paths: 150 ms

Switched to bits instead of strings; still slower than the cursor (~30 ms)
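As a rough illustration of the bit-field idea, the sketch below gives each event one bit (342 = 1, 343 = 2, 401 = 4) and ORs the bits together as events are seen, so a value of 7 means all three have occurred. The sequence of events is hypothetical and is not taken from the sample data.

```sql
-- Each event id owns one bit: 342 -> 2^0 = 1, 343 -> 2^1 = 2, 401 -> 2^2 = 4.
declare @bits tinyint = 0;

set @bits |= power( 2, 0 );   -- 342 seen: @bits = 1
set @bits |= power( 2, 2 );   -- 401 seen: @bits = 5
set @bits |= power( 2, 0 );   -- 342 seen again: @bits stays 5

-- 343 has not been seen yet, so its bit is still clear.
select @bits as BitsSoFar, @bits & 2 as Bit343;   -- 5, 0

set @bits |= power( 2, 1 );   -- 343 seen: @bits = 7

-- All three bits set means the set of events is complete.
select @bits as BitsSoFar, iif( @bits = 7, 1, 0 ) as EventsComplete;   -- 7, 1
```

Repeated events leave the value unchanged, which is why a bit field can stand in for a concatenated string of distinct event ids.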


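The following code demonstrates another way to solve the problem using CTEs. The first stage adds a column (RN) to order the data for the next step, plus a few flag columns (E342Done, …) to indicate which event the row represents. The second stage uses a recursive CTE to process the rows in the correct order for each ObjectId. Since T-SQL isn't very good at implementing boolean logic, it is sometimes easier to "fake" the logic with arithmetic.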

-- Sample data.
declare @ObjectEvents as Table ( ObjectId Int, EventId Int, EventDate Date );

insert into @ObjectEvents ( ObjectId, EventId, EventDate ) values
    ( 1, 342, '2017-10-27' ),( 1, 342, '2018-01-06' ),( 1, 343, '2018-04-18' ),( 1, 401, '2018-10-15' ),( 1, 342, '2018-11-12' ),
    ( 1, 342, '2018-11-29' ),( 1, 401, '2018-12-10' ),( 1, 342, '2019-02-21' ),( 1, 343, '2019-04-23' ),( 1, 401, '2019-11-04' ),
    ( 1, 343, '2020-02-15' ),( 2, 342, '2018-06-08' ),( 2, 343, '2018-09-18' ),( 2, 342, '2018-10-02' );

select * from @ObjectEvents order by ObjectId, EventDate;

-- Do the deed.
with
  OrderedEventsByObject as (
    -- Number the rows for each   ObjectId   in   EventDate   order and add flags for the events.
    select ObjectId, EventId, EventDate,
      Row_Number() over ( partition by ObjectId order by EventDate ) as RN,
      case when EventId = 342 then 1 else 0 end as E342Done,
      case when EventId = 343 then 1 else 0 end as E343Done,
      case when EventId = 401 then 1 else 0 end as E401Done
      from @ObjectEvents ),
  ProcessedEvents as (
    -- Process the events in order for each   ObjectId .
    -- Start with the first row for the   ObjectId ...
    select ObjectId, EventId, EventDate, RN, E342Done, E343Done, E401Done,
      0 as EventsComplete
      from OrderedEventsByObject
      where RN = 1
    union all
    -- ... then add the next row, if any, for each   ObjectId :
    select OEBO.ObjectId, OEBO.EventId, OEBO.EventDate, OEBO.RN,
      -- Use arithmetic as a shorthand for: ( PE.E342Done or OEBO.E342Done ) and not PH.EventsComplete .
      Sign( ( PE.E342Done + OEBO.E342Done ) * ( 1 - PH.EventsComplete ) ),
      Sign( ( PE.E343Done + OEBO.E343Done ) * ( 1 - PH.EventsComplete ) ),
      Sign( ( PE.E401Done + OEBO.E401Done ) * ( 1 - PH.EventsComplete ) ),
      PH.EventsComplete
      from ProcessedEvents as PE inner join
        OrderedEventsByObject as OEBO on OEBO.ObjectId = PE.ObjectId and OEBO.RN = PE.RN + 1 cross apply
        -- Use   cross apply   to make the   EventsComplete   column available within the recursive part of the CTE.
        -- Arithmetic is used again to check for one of every event type being completed.
        ( select case when Sign( PE.E342Done + OEBO.E342Done ) + Sign( PE.E343Done + OEBO.E343Done ) + Sign( PE.E401Done + OEBO.E401Done ) = 3 then 1 else 0 end as EventsComplete ) as PH
     )
  -- You can uncomment the following   select   statements to see the intermediate results:
  -- select * from OrderedEventsByObject;
  -- select * from ProcessedEvents;
  select ObjectId, EventId, EventDate, EventsComplete
    from ProcessedEvents
    order by ObjectId, RN;
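
One practical caveat, offered as a note rather than as part of the original answer: the recursive CTE above advances one row per recursion level for each ObjectId, and SQL Server stops a recursive CTE after 100 levels by default. An object with more than about 100 events would therefore need a MAXRECURSION hint on the final select. The standalone sketch below uses a hypothetical Numbers CTE only to show the hint's placement.

```sql
-- Hypothetical counter CTE that needs 499 recursion levels,
-- which is more than the default limit of 100.
with Numbers as (
    select 1 as N
    union all
    select N + 1 from Numbers where N < 500
)
select count(*) as RowsGenerated
  from Numbers
  -- 0 removes the limit; a positive value up to 32767 sets an explicit cap.
  option ( maxrecursion 0 );
```

In the query above, the hint would go at the very end of the statement, after order by ObjectId, RN.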

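Do the events have to occur in a particular order? Can multiple events occur on one date?

@HABO The events can occur in any order. Multiple events can occur on one date. The same event may occur twice on the same date.

I haven't played with this one myself yet, but I've bookmarked it to give it a try. However, my first reaction on seeing it was "this might be a good time to use a cursor", both because the logic becomes very simple and because I'd bet the performance is close to that of an arcane set-based solution using things like row numbers and multi-rooted recursion. At first glance, using the same logic in a recursive CTE should be straightforward, if a little clunky.

The recursion got out of hand, especially considering there is no telling how many levels deep it could go. That is why I went with this approach to keep it simple, not to mention maintainable.

The recursive element only needs two levels, because it can use something like: where next.eventid != this.eventid or last.eventid and next.row number > this.row number and next.objectid = this.objectid. Once you have done that twice, you must have 3 distinct EventIDs. But yes, I haven't actually tried to write it, so maybe I should shut up :D

The challenge is tracking/knowing that you have already flagged a completed event for a particular ObjectId, and resetting to zero while still processing the same ObjectId. That is where I got stuck. I'd love to see your CTE if you can get it working.

Nice work. That CTE is brutal to try and explain.

@CriticalError Haha, yes. Using string concatenation instead of a bitmask is easier to understand, but slower. The numbered CTE is fairly clear. The recursive CTE then basically says "for each row in the data, generate a 'path' of the distinct event ids we have seen so far". The condition and p.bits & n.bits = 0 is the logic that keeps them distinct. Then the candidates CTE finds, for each row, the shortest path that contains all 3 event IDs and flags it as a possible solution. The tricky part is that one path may overlap another. The final "not exists" eliminates the overlaps.

Sign( ( PE.E342Done + OEBO.E342Done ) * ( 1 - PH.EventsComplete ) ): I am definitely stealing this pattern as an "iterative state tracker".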
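
To see what that expression does in isolation, here is a small truth table, assuming every input is restricted to 0 or 1 as it is in the CTE. The column names PrevDone, ThisDone, Complete and NextDone are made up for this sketch.

```sql
-- Enumerate every 0/1 combination of the three inputs.
select a.v as PrevDone, b.v as ThisDone, c.v as Complete,
       -- Arithmetic stand-in for: ( PrevDone or ThisDone ) and not Complete
       sign( ( a.v + b.v ) * ( 1 - c.v ) ) as NextDone
  from ( values ( 0 ), ( 1 ) ) as a ( v )
  cross join ( values ( 0 ), ( 1 ) ) as b ( v )
  cross join ( values ( 0 ), ( 1 ) ) as c ( v )
  order by Complete, PrevDone, ThisDone;
```

NextDone is 1 only when Complete is 0 and at least one of PrevDone or ThisDone is 1. Whenever Complete is 1 the product is 0, which is what resets the flag after a completed set of events.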
-- Sample data for the set-based and cursor solutions: creates a permanent table named t.
select 1 as ObjectId, 342 as EventId, cast('2017-10-27' as date) as EventDate
into t
union all select 1, 342, cast('2018-01-06' as date)
union all select 1, 343, cast('2018-04-18' as date)
union all select 1, 401, cast('2018-10-15' as date)
union all select 1, 342, cast('2018-11-12' as date)
union all select 1, 342, cast('2018-11-29' as date)
union all select 1, 401, cast('2018-12-10' as date)
union all select 1, 342, cast('2019-02-21' as date)
union all select 1, 343, cast('2019-04-23' as date)
union all select 1, 401, cast('2019-11-04' as date)
union all select 1, 343, cast('2020-02-15' as date)
union all select 2, 342, cast('2018-06-08' as date)
union all select 2, 343, cast('2018-09-18' as date)
union all select 2, 342, cast('2018-10-02' as date);
go

with numbered as  
-- just adding a row number to make it easier to follow
(
   select   objectid, 
            eventid, 
            eventdate, 
            rn = row_number() over (partition by objectid order by eventdate asc),
            bits = cast(power(2, case eventid when 342 then 0 when 343 then 1 else 2 end) as tinyint)
   from     t
),
paths as  
-- the concatenated paths of distinct eventid for each row, as a bitfield
(
   select      n.objectid, 
               n.eventid, 
               n.eventdate, 
               root = n.rn, 
               n.rn, 
               bits
   from        numbered n
   union all   
   select      n.objectid, 
               n.eventid, 
               n.eventdate, 
               p.root, 
               n.rn, 
               p.bits | n.bits
   from        paths       p
   join        numbered    n  on n.objectid = p.objectid
                                 and n.rn > p.rn
                                 and p.bits & n.bits = 0 
),
candidates as 
-- a row that has a path containing all 3 values (bits = 7)
(
   select   *
   from     (
               select   objectid,
                        root, 
                        rn,
                        candidate = iif
                        (
                           -- rn and root restart at 1 for every objectid, so partition by both
                           rn = min(rn) over (partition by objectid, root), 
                           1, 0
                        )
               from     paths
               where    bits = 7
            ) c            
   where    c.candidate = 1
)
-- get the candidate rows where no earlier candidate in row number order
-- has a root-to-end path which overlaps the path for this candidate
select      distinct 
            n.objectid,
            n.eventid,
            n.eventdate,
            eventscomplete = isnull(c.candidate, 0)
from        numbered   n
left join   candidates c on c.objectid = n.objectid
                            and c.rn = n.rn
                            and not exists 
                            (
                               select * 
                               from candidates prev
                               where prev.objectid = c.objectid
                                     and prev.rn < c.rn
                                     and prev.rn > c.root
                                     and prev.root < c.rn
                            )
order by    n.objectid, 
            n.eventdate, 
            n.eventid

-- Cursor version, for comparison.
declare @triplets table(objectid int, eventid int, eventdate date);
declare c cursor fast_forward for 
select objectid, eventid, eventdate from t order by objectid, eventdate asc;
declare 
   @ob int, @prevob int, @event int, @dt date, 
   @bits tinyint = 0;
open c;
fetch next from c into @ob, @event, @dt;
while @@fetch_status = 0
begin
    -- First row overall, or first row of a new object: start from a clean state.
    if (@prevob is null or @ob <> @prevob)
        select @bits = 0, @prevob = @ob;

    -- Record the current row's event, including the first row of each object.
    if @event = 342 set @bits |= 1;
    else if @event = 343 set @bits |= 2;
    else if @event = 401 set @bits |= 4;

    if (@bits = 7) 
    begin
        insert @triplets values (@ob, @event, @dt);
        set @bits = 0;
    end
    fetch next from c into @ob, @event, @dt;
end
close c;
deallocate c;
select      t.*, iif(tt.objectid is null, 0, 1) as eventscomplete
from        t
left join   @triplets tt    on  t.objectid = tt.objectid 
                                and t.eventid = tt.eventid
                                and t.eventdate = tt.eventdate
order by    t.objectid, t.eventdate;