How to find duplicate values in SQL Server

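I am using SQL Server 2008. I have a table

Customers

customer_number int

field1 varchar

field2 varchar

field3 varchar

field4 varchar
... and more columns that don't matter for my query.

The column customer_number is the primary key. I am trying to find duplicate values and some of the differences between them.

Please help me find all rows that have the same

1) field1, field2, field3, field4

2) only 3 columns equal and one of them not equal (excluding rows from list 1)

3) only 2 columns equal and two of them not equal (excluding rows from lists 1 and 2)

In the end I will have 3 tables containing these results plus an additional groupId, which is the same for each group of similar rows (for example, in the 3-equal-columns case, the rows that share the same 3 columns form a separate group).


Thank you.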

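The simplest way is probably to write a stored procedure that iterates over the duplicate customers in each group and inserts the matching customers under their group number.

However, I have thought about it, and you can probably do this with subqueries. Hopefully I haven't made it more complicated than it should be, but this should give you the contents of the first duplicates table (all four fields). Note that this is untested, so it may need some tweaking.

Basically, it takes each set of field values that occurs more than once, gives each set a group number, then fetches all customers that have those field values and assigns them the same group number.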

INSERT INTO FourFieldsDuplicates(group_no, customer_no)
SELECT Groups.group_no, custs.customer_number
FROM (SELECT ROW_NUMBER() OVER(ORDER BY c.field1) AS group_no,
             c.field1, c.field2, c.field3, c.field4
      FROM Customers c
      GROUP BY c.field1, c.field2, c.field3, c.field4
      HAVING COUNT(*) > 1) Groups
INNER JOIN Customers custs ON custs.field1 = Groups.field1
                           AND custs.field2 = Groups.field2
                           AND custs.field3 = Groups.field3
                           AND custs.field4 = Groups.field4
The others, however, are a bit more complicated, because you need to expand the possibilities. The three-field groups would then be:

INSERT INTO ThreeFieldsDuplicates(group_no, customer_no)
SELECT Groups.group_no, custs.customer_number
FROM (-- Number each three-field pattern; NULL marks the field that is ignored
      SELECT ROW_NUMBER() OVER(ORDER BY GroupsInner.field1) AS group_no,
             GroupsInner.field1, GroupsInner.field2,
             GroupsInner.field3, GroupsInner.field4
      FROM (SELECT c.field1, c.field2, c.field3, NULL AS field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            GROUP BY c.field1, c.field2, c.field3
            HAVING COUNT(*) > 1   -- keep only patterns shared by several rows
            UNION ALL
            SELECT c.field1, c.field2, NULL AS field3, c.field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            GROUP BY c.field1, c.field2, c.field4
            HAVING COUNT(*) > 1
            UNION ALL
            SELECT c.field1, NULL AS field2, c.field3, c.field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            GROUP BY c.field1, c.field3, c.field4
            HAVING COUNT(*) > 1
            UNION ALL
            SELECT NULL AS field1, c.field2, c.field3, c.field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            GROUP BY c.field2, c.field3, c.field4
            HAVING COUNT(*) > 1) GroupsInner) Groups
INNER JOIN Customers custs ON (Groups.field1 IS NULL OR custs.field1 = Groups.field1)
                           AND (Groups.field2 IS NULL OR custs.field2 = Groups.field2)
                           AND (Groups.field3 IS NULL OR custs.field3 = Groups.field3)
                           AND (Groups.field4 IS NULL OR custs.field4 = Groups.field4)
-- Skip customers already reported as four-field duplicates
WHERE NOT EXISTS(SELECT d.customer_no
                 FROM FourFieldsDuplicates d
                 WHERE d.customer_no = custs.customer_number)

Hopefully this produces the correct results. I'll leave the last one as an exercise :-D

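I'm not sure whether you need equality checks across different fields (such as field1 = field2).
If not, this should be enough.

Edit

Feel free to adjust the test data to give us an input that produces output that is wrong according to your specification.

Test data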

DECLARE @Customers TABLE (
  customer_number INTEGER IDENTITY(1, 1)
  , field1 INTEGER
  , field2 INTEGER
  , field3 INTEGER
  , field4 INTEGER)

INSERT INTO @Customers
          SELECT 1, 1, 1, 1
UNION ALL SELECT 1, 1, 1, 1
UNION ALL SELECT 1, 1, 1, NULL
UNION ALL SELECT 1, 1, 1, 2
UNION ALL SELECT 1, 1, 1, 3
UNION ALL SELECT 2, 1, 1, 1
All equal

SELECT  ROW_NUMBER() OVER (ORDER BY c1.customer_number)
        , c1.field1
        , c1.field2
        , c1.field3
        , c1.field4
FROM    @Customers c1 
        INNER JOIN @Customers c2 ON c2.customer_number > c1.customer_number  
                                    AND ISNULL(c2.field1, 0) = ISNULL(c1.field1, 0) 
                                    AND ISNULL(c2.field2, 0) = ISNULL(c1.field2, 0)
                                    AND ISNULL(c2.field3, 0) = ISNULL(c1.field3, 0)
                                    AND ISNULL(c2.field4, 0) = ISNULL(c1.field4, 0)
One field different

SELECT  ROW_NUMBER() OVER (ORDER BY field1, field2, field3, field4)
        , field1
        , field2
        , field3
        , field4
FROM    (
          SELECT  DISTINCT c1.field1
                  , c1.field2
                  , c1.field3
                  , field4 = NULL
          FROM    @Customers c1 
                  INNER JOIN @Customers c2 ON c2.customer_number > c1.customer_number  
                                             AND c2.field1 = c1.field1 
                                             AND c2.field2 = c1.field2 
                                             AND c2.field3 = c1.field3 
                                             AND ISNULL(c2.field4, 0) <> ISNULL(c1.field4, 0) 
          UNION ALL
          SELECT  DISTINCT c1.field1
                  , c1.field2
                  , NULL
                  , c1.field4
          FROM    @Customers c1 
                  INNER JOIN @Customers c2 ON c2.customer_number > c1.customer_number  
                                             AND c2.field1 = c1.field1 
                                             AND c2.field2 = c1.field2 
                                             AND ISNULL(c2.field3, 0) <> ISNULL(c1.field3, 0) 
                                             AND c2.field4 = c1.field4 
          UNION ALL
          SELECT  DISTINCT c1.field1
                  , NULL
                  , c1.field3
                  , c1.field4
          FROM    @Customers c1 
                  INNER JOIN @Customers c2 ON c2.customer_number > c1.customer_number  
                                             AND c2.field1 = c1.field1 
                                             AND ISNULL(c2.field2, 0) <> ISNULL(c1.field2, 0) 
                                             AND c2.field3 = c1.field3 
                                             AND c2.field4 = c1.field4 
          UNION ALL
          SELECT  DISTINCT NULL
                  , c1.field2
                  , c1.field3
                  , c1.field4
          FROM    @Customers c1 
                  INNER JOIN @Customers c2 ON c2.customer_number > c1.customer_number  
                                             AND ISNULL(c2.field1, 0) <> ISNULL(c1.field1, 0)
                                             AND c2.field2 = c1.field2 
                                             AND c2.field3 = c1.field3 
                                             AND c2.field4 = c1.field4 
      ) c

Here is a handy query for finding duplicates in a table. Suppose you want to find all email addresses that occur more than once in a table:

SELECT email, COUNT(email) AS NumOccurrences
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )
You can also use this technique to find the values that occur exactly once:

SELECT email
FROM users
GROUP BY email
HAVING ( COUNT(email) = 1 )
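
To list the full rows rather than only the duplicated values, a windowed count is one option. This is a minimal sketch, assuming the same illustrative users table and email column as above; the derived-table alias u is made up for the example. Unlike COUNT(email), COUNT(*) also counts rows where email is NULL, so such rows are returned as one group if there are several of them.

```sql
-- Sketch: return every row whose email occurs more than once.
-- Assumes the illustrative users table and email column used above.
SELECT u.*
FROM (SELECT users.*,
             COUNT(*) OVER (PARTITION BY email) AS NumOccurrences
      FROM users) u
WHERE u.NumOccurrences > 1;
```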

You can simply write something like this to count duplicate entries. I think it should work:

use *DATABASE_NAME*
go
SELECT     *YOUR_FIELD*, COUNT(*) AS dupes  
FROM         *YOUR_TABLE_NAME*
GROUP BY *YOUR_FIELD* 
HAVING      (COUNT(*) > 1)

Enjoy

There is a clean way to aggregate over all possible combinations of columns using CUBE():

SELECT
  field1,field2,field3,field4
 ,duplicate_row_count = COUNT(*)
 ,grp_id = GROUPING_ID(field1,field2,field3,field4)
INTO #duplicate_rows
FROM table_name
GROUP BY CUBE(field1,field2,field3,field4)
HAVING COUNT(*) > 1
  AND GROUPING_ID(field1,field2,field3,field4) IN (0,1,2,4,8,3,5,6,9,10,12)
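The numbers (0,1,2,4,8,3,5,6,9,10,12) are just the bitmasks (0000, 0001, 0010, 0100, …, 1010, 1100) of the grouping sets we care about: those with 4, 3, or 2 matches.

Then join it back to the original table, using a technique that treats the NULLs in #duplicate_rows as wildcards: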

SELECT a.*
FROM table_name a
INNER JOIN #duplicate_rows b
  ON  NULLIF(b.field1,a.field1) IS NULL
  AND NULLIF(b.field2,a.field2) IS NULL
  AND NULLIF(b.field3,a.field3) IS NULL
  AND NULLIF(b.field4,a.field4) IS NULL
--WHERE grp_id IN (0)             --Use this for 4 matches
--WHERE grp_id IN (1,2,4,8)       --Use this for 3 matches
--WHERE grp_id IN (3,5,6,9,10,12) --Use this for 2 matches
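
To see why those grp_id values correspond to 4, 3, or 2 matches, the bitmask can be decoded directly. In GROUPING_ID(field1,field2,field3,field4) the first argument maps to the highest bit (8) and the last to the lowest bit (1), and a set bit means the column was aggregated away rather than grouped on. The following is a small sketch; the aliases v, ignored_bits and matched_columns are illustrative only.

```sql
-- Sketch: decode each grp_id into the number of columns that still match.
-- A set bit means the column is ignored (aggregated) for that grouping set.
SELECT v.grp_id,
       ignored_bits    = SIGN(v.grp_id & 8) + SIGN(v.grp_id & 4)
                       + SIGN(v.grp_id & 2) + SIGN(v.grp_id & 1),
       matched_columns = 4 - ( SIGN(v.grp_id & 8) + SIGN(v.grp_id & 4)
                             + SIGN(v.grp_id & 2) + SIGN(v.grp_id & 1) )
FROM (VALUES (0),(1),(2),(4),(8),(3),(5),(6),(9),(10),(12)) AS v(grp_id)
ORDER BY matched_columns DESC, v.grp_id;
```

Values with no bits set (0) keep all four columns, values with one bit set (1, 2, 4, 8) keep three, and values with two bits set (3, 5, 6, 9, 10, 12) keep two.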

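@Ic Is it correct to write "c.field1 AS group_no"? group_no is an int and field1 is a varchar. Maybe I should use a temporary table?

@hgulyan It is actually ROW_NUMBER() that is aliased as group_no.

Actually, it doesn't work. The group ID is unique for every row because it only orders by field1. I think it is applied before the GROUP BY, which is why it just adds a row number to every row, whereas I want the same ID for each group of duplicate rows. Also, will there be any problem if one of the fields is smalldatetime?

@hgulyan If ROW_NUMBER() isn't working correctly, try wrapping it in a new sub-select (SELECT ROW_NUMBER(), … FROM (SELECT … GROUP BY …)). As for smalldatetime, I don't think so.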