SQL Server: removing consecutive repeated values from a date-based timeline

Tags: sql-server, tsql, sql-server-2012

I have a table that holds date-based user actions. The table serves as a timeline of events. The example below shows how two people changed job roles over time:

DECLARE @tbl TABLE (
    UserID int,
    ActionID int,
    ActionDesc nvarchar(50),
    ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES 
    -- First person
    (1, 200, 'Promoted',   '2000-01-01'),   
    (1, 200, 'Promoted',   '2001-01-01'),   
    (1, 200, 'Promoted',   '2002-02-01'),   
    (1, 300, 'Moved',      '2004-03-01'),   
    (1, 200, 'Promoted',   '2005-03-01'),   
    (1, 200, 'Promoted',   '2006-03-01'),
    -- Second person
    (2, 200, 'Promoted',   '2006-01-01'),   
    (2, 300, 'Moved',      '2007-01-01'),
    (2, 200, 'Promoted',   '2008-01-01');

SELECT * FROM @tbl ORDER BY UserID, ActionDate DESC;
This lists each user's events, most recent first.

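I need to display the table in reverse date order, but remove any event that directly follows an identical one, matching on [UserID/ActionID]. For example, if a person is promoted and then immediately promoted again, the second promotion is left out of the result, because it counts as a repeat of the previous action.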

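So the desired output keeps only the earliest row of each consecutive run of the same action for a user, listed newest first.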

After some research, I tried using ROW_NUMBER() to identify the duplicates:

SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY UserID, ActionID ORDER BY ActionDate ASC) AS RowNum
FROM
    @tbl
ORDER BY
    UserID, ActionDate DESC;
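...but it doesn't quite work, because the numbering does not reset after each different action. I may be overthinking this, but I'm struggling for inspiration, because searching returns countless questions from people who simply want to remove duplicates from a list.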

DECLARE @tbl TABLE (
    UserID int,
    ActionID int,
    ActionDesc nvarchar(50),
    ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES 
    -- First person
    (1, 200, 'Promoted',   '2000-01-01'),   
    (1, 200, 'Promoted',   '2001-01-01'),   
    (1, 200, 'Promoted',   '2002-02-01'),   
    (1, 300, 'Moved',      '2004-03-01'),   
    (1, 200, 'Promoted',   '2005-03-01'),   
    (1, 200, 'Promoted',   '2006-03-01'),
    -- Second person
    (2, 200, 'Promoted',   '2006-01-01'),   
    (2, 300, 'Moved',      '2007-01-01'), --<<--- here ActionID is 300
    (2, 200, 'Promoted',   '2008-01-01');

select UserID, ActionID, ActionDesc, min(ActionDate) as dt
  from (
         select t.*
              , row_number() over(partition by UserID, ActionID order by ActionDate)
                - row_number() over(partition by UserID order by ActionDate) as grp_id
           from @tbl t
       ) v
 group by grp_id, UserID, ActionID, ActionDesc
 order by UserID, min(ActionDate) desc;
The first (inner) query assigns a row number to every row, ordered by ActionDate within each UserID. I then compute a second row number in the same way, but also partitioned by ActionID. Subtracting one from the other gives a number that stays the same across a run of consecutive rows for one UserID and ActionID. By generating another row number, partitioned by UserID and ActionID within that run, I can then pick row 1, the earliest date.
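The row-number subtraction described above may be easier to follow when each step is a named common table expression. The following is only a sketch under assumptions: the CTE names (numbered, islands) and the columns rn_user, rn_action and rn_in_run are illustrative names that do not appear in the original answers, and a user is assumed never to have two rows with the same ActionDate.

```sql
DECLARE @tbl TABLE (
    UserID int,
    ActionID int,
    ActionDesc nvarchar(50),
    ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES
    (1, 200, 'Promoted', '2000-01-01'),
    (1, 200, 'Promoted', '2001-01-01'),
    (1, 200, 'Promoted', '2002-02-01'),
    (1, 300, 'Moved',    '2004-03-01'),
    (1, 200, 'Promoted', '2005-03-01'),
    (1, 200, 'Promoted', '2006-03-01'),
    (2, 200, 'Promoted', '2006-01-01'),
    (2, 300, 'Moved',    '2007-01-01'),
    (2, 200, 'Promoted', '2008-01-01');

WITH numbered AS (
    SELECT t.*
         -- position of the row in the user's whole timeline
         , ROW_NUMBER() OVER (PARTITION BY t.UserID ORDER BY t.ActionDate) AS rn_user
         -- position of the row among the user's rows with the same action
         , ROW_NUMBER() OVER (PARTITION BY t.UserID, t.ActionID ORDER BY t.ActionDate) AS rn_action
    FROM @tbl AS t
),
islands AS (
    SELECT n.*
         -- constant within a run of consecutive identical actions;
         -- for user 1 the values are 0, 0, 0, 3, 1, 1 in date order
         , n.rn_user - n.rn_action AS grp_id
         -- numbers the rows inside each run, earliest first
         , ROW_NUMBER() OVER (PARTITION BY n.UserID, n.ActionID, n.rn_user - n.rn_action
                              ORDER BY n.ActionDate) AS rn_in_run
    FROM numbered AS n
)
SELECT UserID, ActionID, ActionDesc, ActionDate
FROM islands
WHERE rn_in_run = 1          -- keep only the first event of each run
ORDER BY UserID, ActionDate DESC;
```

Selecting rn_user, rn_action and grp_id from islands without the WHERE clause is a quick way to see how the runs are formed before the duplicates are filtered out.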

I would use this to eliminate the unneeded rows:

USE tempdb;

DECLARE @tbl TABLE (
    UserID int,
    ActionID int,
    ActionDesc nvarchar(50),
    ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES 
    -- First person
    (1, 200, 'Promoted',   '2000-01-01'),   
    (1, 200, 'Promoted',   '2001-01-01'),   
    (1, 200, 'Promoted',   '2002-02-01'),   
    (1, 300, 'Moved',      '2004-03-01'),   
    (1, 200, 'Promoted',   '2005-03-01'),   
    (1, 200, 'Promoted',   '2006-03-01'),
    -- Second person
    (2, 200, 'Promoted',   '2006-01-01'),   
    (2, 300, 'Moved',      '2007-01-01'),
    (2, 200, 'Promoted',   '2008-01-01');

;WITH src AS
(
    SELECT *
        , l = LEAD(t.ActionID) OVER (PARTITION BY t.UserID ORDER BY t.ActionDate DESC)
    FROM @tbl t
)
SELECT src.UserID
    , src.ActionID
    , src.ActionDesc
    , src.ActionDate
FROM src
WHERE src.l <> src.ActionID 
    OR src.l IS NULL
ORDER BY src.UserID, src.ActionDate DESC; -- the question asks for reverse date order
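For tables with a large number of rows, you want to minimize the number of aggregates used in the query; LEAD needs only a single one to get this result.

[Image: execution plan for my version]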


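Argh. That was a typo in my actual test code. Yes, it should be 300. I've updated the OP.
Can you explain what you are doing at each step of the code? It looks like you are subtracting one set of row numbers from another??
Brilliant! Can you explain what is happening in the code? I would never have thought of it.
If you peel off the outer layers, you will see how I built up towards the final goal. Writing it with common table expressions instead of subqueries would probably be easier to follow.
There it is, and it has made my day! So simple, yet so powerful. Every once in a while (like now) I learn something new about TSQL that I can use in plenty of other places to make life easier, but I wonder how I would ever have learned it without posting on SO...!! Thank you very much.
My pleasure, and yes, Stack Overflow changed my life too!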
Output of the LEAD query:

╔════════╦══════════╦════════════╦═════════════════════════╗
║ UserID ║ ActionID ║ ActionDesc ║ ActionDate              ║
╠════════╬══════════╬════════════╬═════════════════════════╣
║ 1      ║ 200      ║ Promoted   ║ 2005-03-01 00:00:00.000 ║
║ 1      ║ 300      ║ Moved      ║ 2004-03-01 00:00:00.000 ║
║ 1      ║ 200      ║ Promoted   ║ 2000-01-01 00:00:00.000 ║
║ 2      ║ 200      ║ Promoted   ║ 2008-01-01 00:00:00.000 ║
║ 2      ║ 300      ║ Moved      ║ 2007-01-01 00:00:00.000 ║
║ 2      ║ 200      ║ Promoted   ║ 2006-01-01 00:00:00.000 ║
╚════════╩══════════╩════════════╩═════════════════════════╝
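The LEAD query looks at the next row in descending date order, which is the chronologically previous event. The same comparison can be sketched with LAG over ascending dates, which some readers find easier to reason about. This is an illustrative variant rather than part of the original answer: the column name PrevActionID is made up here, and the sketch assumes a user never has two rows with the same ActionDate (otherwise the ordering inside the window is not deterministic).

```sql
DECLARE @tbl TABLE (
    UserID int,
    ActionID int,
    ActionDesc nvarchar(50),
    ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES
    (1, 200, 'Promoted', '2000-01-01'),
    (1, 200, 'Promoted', '2001-01-01'),
    (1, 200, 'Promoted', '2002-02-01'),
    (1, 300, 'Moved',    '2004-03-01'),
    (1, 200, 'Promoted', '2005-03-01'),
    (1, 200, 'Promoted', '2006-03-01'),
    (2, 200, 'Promoted', '2006-01-01'),
    (2, 300, 'Moved',    '2007-01-01'),
    (2, 200, 'Promoted', '2008-01-01');

WITH src AS (
    SELECT t.*
         -- ActionID of the user's previous event in time; NULL for the first event
         , LAG(t.ActionID) OVER (PARTITION BY t.UserID ORDER BY t.ActionDate ASC) AS PrevActionID
    FROM @tbl AS t
)
SELECT src.UserID
     , src.ActionID
     , src.ActionDesc
     , src.ActionDate
FROM src
WHERE src.PrevActionID IS NULL            -- first event for the user
   OR src.PrevActionID <> src.ActionID    -- action differs from the one before it
ORDER BY src.UserID, src.ActionDate DESC;
```

Both LEAD and LAG were introduced in SQL Server 2012, so either form fits the version tagged on the question.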