SQL Server: finding duplicates and their variations in MSSQL

I have a table with a lot of data. One of its columns, theid, is not unique, so there may be duplicates. I found them with the following query:

SELECT theid FROM thetable
GROUP BY theid
HAVING COUNT(*) > 1

The table also has columns such as street1, street2, city1 and city2.

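For the ids returned by that first query, I need to check whether street1, street2, city1 and city2 differ between the duplicate rows of a given id. Does that make sense?

Say we have two rows with the same id: I need to check whether street1 is the same or different across all rows with that id.

Any hints and pointers on how to do this would be welcome. I have been staring myself blind at this and can't seem to find the right answer.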

Thanks a lot!

Using a CTE would help:

;WITH CTE AS
(
  SELECT theID,
         Street1,
         Street2,
         Street3,
         City,
         State,
         Zip,
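         -- ORDER BY theID is arbitrary inside a partition; order by a date
         -- column instead, if one exists, for a true oldest/newest ordering
         rn = ROW_NUMBER() OVER(PARTITION BY theID ORDER BY theID)
  FROM thetable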
  -- add joins if necessary
)
SELECT oldestID = c1.theID,
       oldestStreet1 = c1.Street1,
       newestStreet1 = c2.Street1,
       newestID = c2.theID
FROM CTE c1
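INNER JOIN CTE c2
        ON c2.theID = c1.theID   -- only pair rows that share the same id
       AND c2.rn = c1.rn + 1     -- each row with the next row of that id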
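
You can also add CASE expressions to show matches versus non-matches. This helps when checking by eye for typos, such as two slightly different spellings of 1337 Test Street: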

;WITH CTE AS
(
  SELECT theID,
         Street1,
         Street2,
         Street3,
         City,
         State,
         Zip,
         rn = ROW_NUMBER() OVER(PARTITION BY theID ORDER BY theID)
  FROM thetable
  -- add joins if necessary
)
SELECT oldestID = c1.theID,
       oldestStreet1 = CASE WHEN c1.Street1 = c2.Street1 THEN 'Match' ELSE c1.Street1 END,
       newestStreet1 = CASE WHEN c1.Street1 = c2.Street1 THEN 'Match' ELSE c2.Street1 END,
       newestID = c2.theID
FROM CTE c1
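INNER JOIN CTE c2
        ON c2.theID = c1.theID   -- only pair rows that share the same id
       AND c2.rn = c1.rn + 1     -- each row with the next row of that id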

Alternatively, you can return only the non-matching rows by adding the mismatch condition to the INNER JOIN clause:
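A minimal sketch of that variant, reusing the same CTE idea and assuming the table is named thetable as in the question. Putting the comparison into the join means only consecutive pairs whose Street1 values differ come back:

```sql
;WITH CTE AS
(
  SELECT theID,
         Street1,
         rn = ROW_NUMBER() OVER(PARTITION BY theID ORDER BY theID)
  FROM thetable  -- assumed table name, taken from the question
)
SELECT oldestID = c1.theID,
       oldestStreet1 = c1.Street1,
       newestStreet1 = c2.Street1,
       newestID = c2.theID
FROM CTE c1
INNER JOIN CTE c2
        ON c2.theID = c1.theID        -- compare rows of the same id only
       AND c2.rn = c1.rn + 1          -- pair each row with the next one
       AND c2.Street1 <> c1.Street1;  -- keep only pairs that differ
```

Note that `<>` never evaluates to true when either side is NULL, so a pair where one Street1 is NULL is not reported by this sketch.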


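Keep in mind that these are exact matches. You could add some simple fuzzy logic, for example LEFT(Zip, 5), to match on only the first five digits of the ZIP code in case some values are stored as ZIP+4 and others are not.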
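As a rough illustration of the LEFT(Zip, 5) idea, assuming Zip is stored as a character column and the table is named thetable as in the question, the comparison might look like this:

```sql
;WITH CTE AS
(
  SELECT theID,
         Zip,
         rn = ROW_NUMBER() OVER(PARTITION BY theID ORDER BY theID)
  FROM thetable  -- assumed table name, taken from the question
)
SELECT c1.theID,
       oldestZip = c1.Zip,
       newestZip = c2.Zip,
       -- compare only the first five characters, so a plain ZIP
       -- and its ZIP+4 form are treated as the same value
       zipCheck = CASE WHEN LEFT(c1.Zip, 5) = LEFT(c2.Zip, 5)
                       THEN 'Match' ELSE 'Differs' END
FROM CTE c1
INNER JOIN CTE c2
        ON c2.theID = c1.theID
       AND c2.rn = c1.rn + 1;
```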

You can also analyze it like this:

;WITH CTE AS
(
  SELECT theID,
         Street1,
         Street2,
         Street3,
         City,
         State,
         Zip,
         rn = ROW_NUMBER() OVER(PARTITION BY theID ORDER BY theID)
  FROM thetable
  -- add joins if necessary
),
CTE1 AS
(
  -- rn2 numbers the rows that share the same id and the same address values;
  -- note that rn > 2 keeps only the third and later rows of each id
  SELECT *,
         rn2 = ROW_NUMBER() OVER(PARTITION BY theID, Street1, Street2, City, State, Zip
                                 ORDER BY theID)
  FROM CTE
  WHERE rn > 2
)
SELECT * FROM CTE1

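Hey, that won't work, will it? If I have, say, 19 duplicates of one id, I would need to do 19 joins.

That depends on the key identifier you use. If you have 19 duplicates on your ID, you can use the CTE to show all of them, not just the first two. Or, if possible, attach a unique identifier inside the CTE. Or use a temp table.

Can you give us sample data and the expected result?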
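For ids with many duplicates, one way to avoid chaining joins is to aggregate per id. The following is a sketch rather than something taken from the answers; it assumes the column names from the question (theid, street1, street2, city1, city2) and reports, for each duplicated id, how many distinct values each column takes:

```sql
SELECT theid,
       rowsForId       = COUNT(*),
       distinctStreet1 = COUNT(DISTINCT street1),
       distinctStreet2 = COUNT(DISTINCT street2),
       distinctCity1   = COUNT(DISTINCT city1),
       distinctCity2   = COUNT(DISTINCT city2)
FROM thetable
GROUP BY theid
HAVING COUNT(*) > 1                     -- duplicated ids only
   AND (   COUNT(DISTINCT street1) > 1  -- at least one column varies
        OR COUNT(DISTINCT street2) > 1
        OR COUNT(DISTINCT city1)   > 1
        OR COUNT(DISTINCT city2)   > 1);
```

COUNT(DISTINCT ...) ignores NULLs, so an id whose rows hold one real value and otherwise NULL in a column counts as having a single distinct value there.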