How can I optimize this duplicate-removal query?

sql sql-server

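I'm using SQL Server 2000. I've reached the point where I can remove all the unwanted duplicates based on a complex set of conditions, but the query now takes hours to complete, whereas it used to take only about 3.5 minutes to return the data with the duplicates included.

To clarify: I can have a duplicate rpt.Name field as long as the rpt.HostName or rpt.SystemSerialNumber field also differs. In addition, I have to decide which entry to keep based on timestamps from four different columns, because some of those columns are missing timestamps.

Any help is greatly appreciated.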

SELECT 
rpt.[Name],
rpt.LastAgentExecution,
rpt.GroupName,
rpt.PackageName,
rpt.PackageVersion,
rpt.ProcedureName,
rpt.HostName,
rpt.SystemSerialNumber,
rpt.JobCreationTime,
rpt.JobActivationTime,
rpt.[Job Completion Time]
FROM DSM_StandardGroupMembersProcedureActivityViewExt rpt
WHERE
(
  (
      rpt.GroupName = 'Adobe Acrobat 7 Deploy'
   OR rpt.GroupName = 'Adobe Acrobat 8 Deploy'
  )
  AND
  (
      (rpt.PackageName = 'Adobe Acrobat 7' AND rpt.PackageVersion = '-1.0')
   OR (rpt.PackageName = 'Adobe Acrobat 8' AND rpt.PackageVersion = '-3.0')
  )
)
AND NOT EXISTS
(
  SELECT *
  FROM   DSM_StandardGroupMembersProcedureActivityViewExt rpt_dupe
  WHERE
  (
    (
     rpt.GroupName = 'Adobe Acrobat 7 Deploy'
      OR rpt.GroupName = 'Adobe Acrobat 8 Deploy'
    )
    AND
    (
     (rpt.PackageName = 'Adobe Acrobat 7' AND rpt.PackageVersion = '-1.0')
      OR (rpt.PackageName = 'Adobe Acrobat 8' AND rpt.PackageVersion = '-3.0')
    )
    AND
    (
      (rpt_dupe.[Name] = rpt.[Name])
      AND
      (
       (rpt_dupe.SystemSerialNumber = rpt.SystemSerialNumber)
    OR (rpt_dupe.HostName = rpt.HostName)
      )
      AND
      (
       (rpt_dupe.LastAgentExecution    < rpt.LastAgentExecution)
    OR (rpt_dupe.JobActivationTime     < rpt.JobActivationTime)
    OR (rpt_dupe.JobCreationTime       < rpt.JobCreationTime)
    OR (rpt_dupe.[Job Completion Time] < rpt.[Job Completion Time])
      )
    )
  )
)

Try something along these lines:

SELECT t_main.columns
FROM table as t_main
LEFT JOIN 
(
SELECT name, MAX(lastAgentExecution)..... FROM table GROUP BY name,serialnumber, hostname
)
as t_joinSerial
ON t_main.name=t_joinSerial.name,lastAgentExecution etc.
where (t_main.AdobeStuff and t_joinSerial is NULL)

The cause is the NOT EXISTS clause.

One suggestion is to rewrite it as a left outer join:

 from <big query> left outer join
      <dups query>
      on <all the fields that constitute a match>
 where <dups query>.<some field> is null
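
A concrete version of the left-outer-join pattern might look like the sketch below. It assumes the filtered rows have been copied into a hypothetical temp table `#rpt` (the table, its column sizes, and the sample rows are illustrative and not from the original post) and ranks duplicates by `LastAgentExecution` only. The comparison direction, which keeps the newest row, is also an assumption.

```sql
-- Hypothetical staging table standing in for the filtered view rows.
CREATE TABLE #rpt (
    [Name]             varchar(100) NOT NULL,
    HostName           varchar(100) NULL,
    SystemSerialNumber varchar(100) NULL,
    LastAgentExecution datetime     NULL
);

INSERT INTO #rpt VALUES ('PC1', 'hostA', 'SN1', '20080101');  -- older duplicate
INSERT INTO #rpt VALUES ('PC1', 'hostA', 'SN1', '20080201');  -- newer duplicate
INSERT INTO #rpt VALUES ('PC1', 'hostB', 'SN2', '20080115');  -- same name, different machine

-- Anti-join: a row survives only when no newer row matches it
-- on Name and on either SystemSerialNumber or HostName.
SELECT r.*
FROM   #rpt r
LEFT OUTER JOIN #rpt d
       ON  d.[Name] = r.[Name]
       AND (d.SystemSerialNumber = r.SystemSerialNumber OR d.HostName = r.HostName)
       AND d.LastAgentExecution > r.LastAgentExecution
WHERE  d.[Name] IS NULL;

DROP TABLE #rpt;
```

With the three sample rows, the older `hostA` row should drop out and the other two should remain. Rows that tie on the timestamp would both survive, so a tie-breaker would be needed in practice. Flip the comparison if the older row should win.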

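That is, aggregate the table by the columns that must be distinct, and pick one row from each group. Here I assume there is an id field that uniquely identifies each row. You may have to use a combination of fields instead, such as name and date. Without an id, this is more challenging. In newer versions of SQL Server, you can use ROW_NUMBER().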
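
For readers on a newer version, the ROW_NUMBER() idea can be sketched as follows. This requires SQL Server 2005 or later, so it does not apply to SQL Server 2000 itself. The temp table `#rpt` and its sample rows are hypothetical, and partitioning by all three identifying columns is a simplifying assumption that does not fully capture the "same serial number OR same host name" rule from the question.

```sql
-- Requires SQL Server 2005 or later; ROW_NUMBER() does not exist in SQL Server 2000.
CREATE TABLE #rpt (
    [Name]             varchar(100) NOT NULL,
    HostName           varchar(100) NULL,
    SystemSerialNumber varchar(100) NULL,
    LastAgentExecution datetime     NULL
);

INSERT INTO #rpt VALUES ('PC1', 'hostA', 'SN1', '20080101');
INSERT INTO #rpt VALUES ('PC1', 'hostA', 'SN1', '20080201');
INSERT INTO #rpt VALUES ('PC1', 'hostB', 'SN2', '20080115');

-- Number the rows inside each duplicate group, newest first, and keep row 1.
SELECT [Name], HostName, SystemSerialNumber, LastAgentExecution
FROM (
    SELECT [Name], HostName, SystemSerialNumber, LastAgentExecution,
           ROW_NUMBER() OVER (
               PARTITION BY [Name], HostName, SystemSerialNumber
               ORDER BY LastAgentExecution DESC
           ) AS rn
    FROM #rpt
) ranked
WHERE rn = 1;

DROP TABLE #rpt;
```

Because exactly one row per partition gets `rn = 1`, ties are broken arbitrarily unless more columns are added to the ORDER BY.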

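Please post a screenshot of the actual execution plan, preferably not the estimated one.

Possibly a silly comment, but have you tried using the DISTINCT keyword to remove the duplicates instead of the NOT EXISTS clause?

Dan, you're assuming his definition of a duplicate is based entirely on the set of fields the query returns. That is a reasonable assumption, but it isn't necessarily true.

I can have a duplicate Name field as long as HostName or SystemSerialNumber also differs. In addition, I have to decide which entry to keep based on timestamps from four different columns, because some of those columns are missing timestamps.

So your SQL is wrong: you select everything that is not in the second select, which says: select everything with the same name and the same serial number or the same host. That also returns same_name / same_serial / different_host rows, which are not in the first select.

So... if MAX(lastAgentExecution) returns two names, do I then have to fall back, in order of priority, on MAX(JobActivationTime) or MAX(JobCreationTime) or MAX(JobCompletionTime)?

In the inner select you don't get one specific record; you get each field's maximum, possibly taken from different records. If one record has the highest lastAgentExecution and another has the highest job activation time, it won't work. This approach complicates things. Is this a question you have to ask often? Would it be an option to solve the problem in several steps, using other fields of that table? By the way, do you have an ID field?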
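
One way to avoid mixing per-column maxima from different records is to collapse the four timestamps into a single value per row first. The sketch below does that with COALESCE, which is available in SQL Server 2000. The priority order of the columns, the hypothetical temp table `#rpt`, its sample rows, and the grouping on all three identifying columns are assumptions made for illustration.

```sql
-- Hypothetical staging table standing in for the filtered view rows.
CREATE TABLE #rpt (
    [Name]                varchar(100) NOT NULL,
    HostName              varchar(100) NULL,
    SystemSerialNumber    varchar(100) NULL,
    LastAgentExecution    datetime NULL,
    JobActivationTime     datetime NULL,
    JobCreationTime       datetime NULL,
    [Job Completion Time] datetime NULL
);

INSERT INTO #rpt VALUES ('PC1', 'hostA', 'SN1', NULL,       '20080105', '20080101', NULL);
INSERT INTO #rpt VALUES ('PC1', 'hostA', 'SN1', '20080201', NULL,       NULL,       NULL);
INSERT INTO #rpt VALUES ('PC1', 'hostB', 'SN2', NULL,       NULL,       '20080115', NULL);

-- Each row gets one "effective" timestamp: the first non-NULL column in
-- the assumed priority order. The derived table finds the latest effective
-- timestamp per group, and the join keeps the row that carries it.
SELECT r.*
FROM   #rpt r
JOIN  (SELECT [Name], HostName, SystemSerialNumber,
              MAX(COALESCE(LastAgentExecution, JobActivationTime,
                           JobCreationTime, [Job Completion Time])) AS EffectiveTime
       FROM   #rpt
       GROUP BY [Name], HostName, SystemSerialNumber) latest
  ON   latest.[Name] = r.[Name]
  AND  latest.HostName = r.HostName
  AND  latest.SystemSerialNumber = r.SystemSerialNumber
  AND  latest.EffectiveTime = COALESCE(r.LastAgentExecution, r.JobActivationTime,
                                       r.JobCreationTime, r.[Job Completion Time]);

DROP TABLE #rpt;
```

With the sample rows, the second `hostA` row and the `hostB` row should be returned. Rows whose HostName or SystemSerialNumber is NULL, or whose four timestamps are all NULL, fall out of the join and would need separate handling.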

with t as (
    SELECT rpt.[Name], rpt.LastAgentExecution, rpt.GroupName, rpt.PackageName,
           rpt.PackageVersion, rpt.ProcedureName, rpt.HostName, rpt.SystemSerialNumber, 
            rpt.JobCreationTime, rpt.JobActivationTime, rpt.[Job Completion Time]
    FROM DSM_StandardGroupMembersProcedureActivityViewExt rpt
    WHERE rpt.GroupName in ('Adobe Acrobat 7 Deploy', 'Adobe Acrobat 8 Deploy') AND
          ((rpt.PackageName = 'Adobe Acrobat 7' AND rpt.PackageVersion = '-1.0') OR
           (rpt.PackageName = 'Adobe Acrobat 8' AND rpt.PackageVersion = '-3.0')
          )
 )
 select t.*
 from t join
      (select name, ..., max(id)
       from t
       group by name, ...
      ) tsum
      on t.id = tsum.id
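
Note that `WITH` (a common table expression) was introduced in SQL Server 2005, so the query above will not run on SQL Server 2000 as written. A possible SQL Server 2000 variant, sketched below, materializes the filtered view rows once into a temp table and uses the IDENTITY() function to supply the `id` column the query assumes. The temp table name `#t` and the choice of grouping columns are assumptions; the original leaves the GROUP BY list elided.

```sql
-- Materialize the filtered view rows once, so the view is evaluated a single time.
-- IDENTITY() in SELECT ... INTO generates a surrogate id for each row.
SELECT IDENTITY(int, 1, 1) AS id,
       rpt.[Name], rpt.LastAgentExecution, rpt.GroupName, rpt.PackageName,
       rpt.PackageVersion, rpt.ProcedureName, rpt.HostName, rpt.SystemSerialNumber,
       rpt.JobCreationTime, rpt.JobActivationTime, rpt.[Job Completion Time]
INTO   #t
FROM   DSM_StandardGroupMembersProcedureActivityViewExt rpt
WHERE  rpt.GroupName IN ('Adobe Acrobat 7 Deploy', 'Adobe Acrobat 8 Deploy')
  AND ((rpt.PackageName = 'Adobe Acrobat 7' AND rpt.PackageVersion = '-1.0') OR
       (rpt.PackageName = 'Adobe Acrobat 8' AND rpt.PackageVersion = '-3.0'));

-- Keep one row per group (grouping columns are an assumption).
SELECT t.*
FROM   #t t
JOIN  (SELECT MAX(id) AS id
       FROM   #t
       GROUP BY [Name], HostName, SystemSerialNumber) tsum
  ON   t.id = tsum.id;

DROP TABLE #t;
```

`MAX(id)` picks an arbitrary representative of each group, since the generated ids carry no meaning. If a specific row must win, the timestamp rule from the question still has to be applied when choosing it.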