SQL: filtering duplicate records

sql sql-server

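I am trying to identify a list of unique person_id values that share an account_id with another person.

One caveat: the person_id that most recently obtained the overlapping account_id should not be included in that list.
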
id | person_id | account_id
-: | --------: | ---------:
 1 |         1 |         10
 2 |         2 |         10
 3 |         3 |         11

Please note: this is a slightly simplified example and should not be taken too literally.

This is my current query:
SELECT STRING_AGG(person_id, ',')
FROM accounts_map
WHERE created_at > '2001-01-10' -- ignore records smaller than 2001-01-10
GROUP BY account_id -- group by account id
HAVING count(*) > 1 -- any account that have multiple matches

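What I can't figure out:
How to exclude the latest record for each overlapping account_id match.
How to ignore records whose account_id is NULL, since comparing against NULL does not work.

Expected output: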
id
---
1 (not expected if date range cutoff specified) created_at > '2001-01-10'
2
4 (not expected if date range cutoff specified) created_at > '2001-01-10'
5

If an account overlaps, the user most recently granted that account becomes the valid user, and the older users are deprecated.

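In a subquery, you can do a window count over the group of records that share the same account_id, and use row_number() to rank them by date. The outer query then keeps the records whose window count is greater than 1 and returns all of them except the latest one: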
select person_id, account_id, created_at
from (
    select 
        t.*, 
        row_number() over(partition by account_id order by created_at desc) rn,
        count(*) over(partition by account_id) cnt
    from accounts_map t
) t
where cnt > 1 and rn > 1

Query output:
person_id | account_id | created_at         
--------: | ---------: | :------------------
        1 |         10 | 10/01/2001 00:00:00
        1 |         11 | 10/01/2001 00:00:00
        1 |         12 | 10/01/2001 00:00:00
        5 |         20 | 14/01/2019 00:00:00
        2 |         20 | 11/01/2019 00:00:00
        5 |         21 | 14/01/2019 00:00:00
        2 |         21 | 11/01/2019 00:00:00
        5 |         22 | 14/01/2019 00:00:00
        2 |         22 | 11/01/2019 00:00:00
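
The query above returns one row per older mapping, while the question asks for a list of unique person ids. A minimal sketch of that last step, assuming SQLite (through Python's sqlite3 module) as a stand-in for SQL Server so that it can be run anywhere, and assuming made-up created_at values for the three rows of the question, which does not give any:

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts_map "
    "(id INTEGER, person_id INTEGER, account_id INTEGER, created_at TEXT)"
)

# The three rows from the question; the created_at values are hypothetical.
conn.executemany(
    "INSERT INTO accounts_map VALUES (?, ?, ?, ?)",
    [
        (1, 1, 10, "2019-01-11"),
        (2, 2, 10, "2019-01-14"),  # latest holder of account 10
        (3, 3, 11, "2019-01-12"),  # account 11 is not shared
    ],
)

# Same window logic as the answer, reduced to distinct person ids.
QUERY = """
SELECT DISTINCT person_id
FROM (
    SELECT
        m.*,
        ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY created_at DESC) AS rn,
        COUNT(*)     OVER (PARTITION BY account_id) AS cnt
    FROM accounts_map AS m
) AS ranked
WHERE cnt > 1   -- the account is shared
  AND rn > 1    -- skip the most recent holder
ORDER BY person_id
"""

older_holders = [row[0] for row in conn.execute(QUERY)]
print(older_holders)  # [1]
```

Person 2 holds account 10 most recently and person 3 shares nothing, so only person 1 is returned. Window functions need SQLite 3.25 or later; the SQL itself uses only row_number() and count(*) over a partition, as in the answer.
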
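Comments:
Note: you provided sample data but unfortunately not the associated expected results, so we cannot validate the output of the query.
Oops, sorry, let me update! Thank you so much for this, by the way! I'm going through it now and trying to understand it.
Note: how can I exclude records where account_id is NULL? If I add a comparison of account_id against NULL, all records disappear.
@user391986: there are no such records in your sample data... But anyway, you can simply add WHERE account_id IS NOT NULL in the subquery. To check for nulls you need the special syntax IS [NOT] NULL, rather than equality/inequality.
That's true, apologies! I was trying to simplify a fairly complex example into something simpler. I really appreciate your help, this is awesome, I didn't know about partitioning!
Welcome @user391986! Window functions are very useful in SQL, and are worth learning about...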
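
The NULL handling discussed in the comments can be sketched as follows. This is only an illustration under assumptions: SQLite (through Python's sqlite3 module) stands in for SQL Server, the rows are hypothetical, and the helper names scalar and older_holders are invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts_map "
    "(id INTEGER, person_id INTEGER, account_id INTEGER, created_at TEXT)"
)

# Hypothetical rows: two people share account 10, two have no account (NULL).
conn.executemany(
    "INSERT INTO accounts_map VALUES (?, ?, ?, ?)",
    [
        (1, 1, 10, "2019-01-11"),
        (2, 2, 10, "2019-01-14"),
        (3, 3, None, "2019-01-12"),
        (4, 4, None, "2019-01-13"),
    ],
)

def scalar(sql):
    """Run a query that returns a single value (hypothetical helper)."""
    return conn.execute(sql).fetchone()[0]

# "<> NULL" evaluates to unknown for every row, so nothing passes the filter.
rows_with_inequality = scalar(
    "SELECT COUNT(*) FROM accounts_map WHERE account_id <> NULL"
)
# IS NOT NULL is the predicate that actually tests for nulls.
rows_with_is_not_null = scalar(
    "SELECT COUNT(*) FROM accounts_map WHERE account_id IS NOT NULL"
)

WINDOW_QUERY = """
SELECT DISTINCT person_id
FROM (
    SELECT
        m.*,
        ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY created_at DESC) AS rn,
        COUNT(*)     OVER (PARTITION BY account_id) AS cnt
    FROM accounts_map AS m
    {where_clause}
) AS ranked
WHERE cnt > 1 AND rn > 1
ORDER BY person_id
"""

def older_holders(where_clause):
    """Return person ids that are not the latest holder of a shared account."""
    sql = WINDOW_QUERY.format(where_clause=where_clause)
    return [row[0] for row in conn.execute(sql)]

# Without the filter, the NULL rows form a partition of their own.
unfiltered = older_holders("")
# With the filter placed inside the subquery, they are ignored.
filtered = older_holders("WHERE account_id IS NOT NULL")

print(rows_with_inequality, rows_with_is_not_null)  # 0 2
print(unfiltered, filtered)                         # [1, 3] [1]
```

The filter goes inside the subquery so that NULL rows are dropped before the window count and row numbers are computed. Otherwise they are partitioned together and person 3 is reported as sharing an "account" with person 4.
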