SQL Server: UNION vs. SELECT DISTINCT with UNION ALL performance


Is there any performance difference between these two queries?

-- eliminate duplicates using UNION
SELECT col1,col2,col3 FROM Table1 
UNION SELECT col1,col2,col3 FROM Table2 
UNION SELECT col1,col2,col3 FROM Table3 
UNION SELECT col1,col2,col3 FROM Table4 
UNION SELECT col1,col2,col3 FROM Table5       
UNION SELECT col1,col2,col3 FROM Table6       
UNION SELECT col1,col2,col3 FROM Table7       
UNION SELECT col1,col2,col3 FROM Table8       

-- eliminate duplicates using DISTINCT    
SELECT DISTINCT * FROM
(     
    SELECT col1,col2,col3 FROM Table1 
    UNION ALL SELECT col1,col2,col3 FROM Table2 
    UNION ALL SELECT col1,col2,col3 FROM Table3 
    UNION ALL SELECT col1,col2,col3 FROM Table4 
    UNION ALL SELECT col1,col2,col3 FROM Table5       
    UNION ALL SELECT col1,col2,col3 FROM Table6       
    UNION ALL SELECT col1,col2,col3 FROM Table7       
    UNION ALL SELECT col1,col2,col3 FROM Table8       
) x   

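The difference between UNION and UNION ALL is that UNION ALL does not eliminate duplicate rows. Instead, it simply pulls every row from every table that matches the query and combines them into one result set.

A UNION statement effectively performs a SELECT DISTINCT on the result set.

If you select DISTINCT from a UNION ALL result set, the output is the same as the UNION result set.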
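One way to check this equivalence on your own data is to compare the two results with EXCEPT in both directions. The following is a minimal sketch; the table variables `@a` and `@b` and their contents are made up for illustration and are not part of the original question.

```sql
-- Hypothetical sample data (names and values are assumptions for this sketch)
DECLARE @a TABLE (v int);
DECLARE @b TABLE (v int);
INSERT INTO @a VALUES (1),(1),(2),(3);
INSERT INTO @b VALUES (2),(3),(3),(4);

-- Rows produced by UNION that DISTINCT-over-UNION ALL does not produce
(SELECT v FROM @a UNION SELECT v FROM @b)
EXCEPT
(SELECT DISTINCT v FROM (SELECT v FROM @a UNION ALL SELECT v FROM @b) x);

-- Rows produced by DISTINCT-over-UNION ALL that UNION does not produce
SELECT DISTINCT v FROM (SELECT v FROM @a UNION ALL SELECT v FROM @b) x
EXCEPT
(SELECT v FROM @a UNION SELECT v FROM @b);
```

If the two forms are equivalent, both statements should return no rows.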

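Edit:

Performance in terms of CPU cost:

Let me explain with an example:

I have two queries. One uses UNION ALL with DISTINCT, the other uses UNION.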

SET STATISTICS TIME ON
GO
 
select distinct * from (select * from dbo.user_LogTime
union all
select * from dbo.user_LogTime) X 
GO

SET STATISTICS TIME OFF

SET STATISTICS TIME ON
GO
 
select * from dbo.user_LogTime
union
select * from dbo.user_LogTime
GO

SET STATISTICS TIME OFF

I ran both in the same query window in SSMS. Let's look at the execution plan in SSMS:

What happens is that the query using UNION ALL with DISTINCT takes more CPU cost than the query using UNION.
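If the graphical plan is not at hand, the estimated plans can also be printed as text. This is a sketch that assumes the same `dbo.user_LogTime` table used in the example; `SET SHOWPLAN_TEXT` must be the only statement in its batch, and while it is ON the queries are compiled but not executed.

```sql
SET SHOWPLAN_TEXT ON;
GO

-- Plan for DISTINCT over UNION ALL
select distinct * from (select * from dbo.user_LogTime
union all
select * from dbo.user_LogTime) X;
GO

-- Plan for UNION
select * from dbo.user_LogTime
union
select * from dbo.user_LogTime;
GO

SET SHOWPLAN_TEXT OFF;
GO
```

Comparing the operators that appear in each output shows whether the optimizer chose the same duplicate-removal strategy for both forms.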

Performance in terms of time:

UNION ALL

(1172 row(s) affected)

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 39 ms.

UNION

(1172 row(s) affected)

SQL Server Execution Times:
   CPU time = 10 ms,  elapsed time = 25 ms.

So, going by elapsed time in this run (25 ms versus 39 ms), UNION performs better than UNION ALL with DISTINCT.
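A single run of a few milliseconds is easily dominated by caching and other noise, so it can help to repeat each query and compare totals. The loop below is only a sketch: the repetition count of 20 is arbitrary, and wrapping each query in `COUNT(*)` is an assumption made here to avoid sending result rows to the client, which may itself change the plan.

```sql
DECLARE @i int = 0, @n int, @t0 datetime2(7);
DECLARE @union_ms int = 0, @distinct_ms int = 0;

WHILE @i < 20   -- arbitrary number of repetitions
BEGIN
    -- Time the UNION form
    SET @t0 = SYSDATETIME();
    SELECT @n = COUNT(*) FROM (
        SELECT * FROM dbo.user_LogTime
        UNION
        SELECT * FROM dbo.user_LogTime
    ) u;
    SET @union_ms += DATEDIFF(MILLISECOND, @t0, SYSDATETIME());

    -- Time the DISTINCT-over-UNION ALL form
    SET @t0 = SYSDATETIME();
    SELECT @n = COUNT(*) FROM (
        SELECT DISTINCT * FROM (
            SELECT * FROM dbo.user_LogTime
            UNION ALL
            SELECT * FROM dbo.user_LogTime
        ) x
    ) d;
    SET @distinct_ms += DATEDIFF(MILLISECOND, @t0, SYSDATETIME());

    SET @i += 1;
END;

SELECT @union_ms AS union_total_ms,
       @distinct_ms AS distinct_union_all_total_ms;
```

Totals over many runs are less sensitive to whichever query happened to run first against a cold cache.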

Another to-the-point example that illustrates the four possible cases:

/* with each case we should expect a return set:
(1) DISTINCT UNION {1,2,3,4,5} - is redundant with case (2)
(2) UNION {1,2,3,4,5} - more efficient?
(3) DISTINCT UNION ALL {1,2,2,3,3,4,4,5} 
(4) UNION ALL {1,1,2,2,2,3,3,4,4,5} 
*/

declare @t1 table (c1 varchar(15));
declare @t2 table (c2 varchar(15));

insert into @t1 values ('1'),('1'),('2'),('3'),('4');

insert into @t2 values ('2'),('2'),('3'),('4'),('5');


select DISTINCT * from @t1 --case (1)
UNION
select DISTINCT * from @t2 order by c1

select * from @t1 --case (2)    
UNION
select * from @t2 order by c1

select DISTINCT * from @t1 --case (3)
UNION ALL
select DISTINCT * from @t2 order by c1

select * from @t1 --case (4)   
UNION ALL
select * from @t2 order by c1
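
A comparison between UNION DISTINCT and UNION ALL. This query is used to create an extended employee table with additional alternate IDs for a downstream system. The example comes from a MySQL 8.0.20 environment.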

For the data and query shown below, testing produced a significant difference:

UNION ALL       8.983 sec  
UNION DISTINCT 15.344 sec

To show the scale and complexity of this example, the table sizes and query code are given below.

hqsource 600K records
accountingemppos 180K 
accountingposld 200K
emp_no_accountingnumeric 20
First UNION block is approx 550K records, second approx 50K

               SELECT a.`emp_no_imported` AS `emp_no`,
              a.`supervisor_emp_no`,
              a.`first name`,
              a.`middle name`,
              a.`last name`,
              a.`jobtitle`,
              a.`status`,
              CASE WHEN rida.`accounting_emp_no` IS NOT NULL THEN
                    rida.`accounting_emp_no`
              ELSE 
                    a.`emp_no_imported`
              END AS `accounting_id`,
              CASE WHEN epfp.`emp_no` IS NOT NULL THEN
                        CASE WHEN `sridf`.`emp_no` IS NOT NULL THEN
                                `sridf`.`accounting_emp_no`
                            ELSE
                                epfp.`emp_no`
                            END
                    ELSE
                        CASE WHEN epp.`emp_no` IS NOT NULL THEN
                            CASE WHEN `srids`.`emp_no` IS NOT NULL THEN
                                `srids`.`accounting_emp_no`
                            ELSE                                        
                                epp.`emp_no`
                            END
                    ELSE
                        CASE WHEN `srida`.`emp_no` IS NOT NULL THEN
                            `srida`.`accounting_emp_no`
                        ELSE
                            a.`supervisor_emp_no`
                        END
                    END
                END AS `accounting_s_emp_no`,
                ep.`emp_no` AS `traas_emp_no`,
                epp.`emp_no` AS `traas_parent_emp_no`
       FROM `hqsource`.hq_people a
       LEFT OUTER JOIN `hqsource`.`emp_no_accountingnumeric` `rida` ON `rida`.emp_no = a.`emp_no_imported`
       LEFT OUTER JOIN `hqsource`.`emp_no_accountingnumeric` `srida` ON `srida`.emp_no = a.`supervisor_emp_no`

       LEFT OUTER JOIN `traas`.`accountingemppos_data_extract` ep  ON ep.`emp_no` = a.`emp_no_imported` AND ep.`End` = '2899-12-31' AND ep.`Primary` = 'Y'
       LEFT OUTER JOIN `epe`.`accountingposld_data_extract` p ON p.`RangeGID` = ep.`GID`
       LEFT OUTER JOIN `traas`.`accountingemppos_data_extract` epp  ON epp.`GID` = p.`ParentGID` AND epp.`End` = '2899-12-31' AND epp.`Primary` = 'Y'
       LEFT OUTER JOIN `hqsource`.`emp_no_accountingnumeric` `rids` ON `rids`.emp_no = ep.`emp_no`
       LEFT OUTER JOIN `hqsource`.`emp_no_accountingnumeric` `srids` ON `srids`.emp_no = epp.`emp_no` AND epp.`End` = '2899-12-31' AND epp.`Primary` = 'Y'

       LEFT OUTER JOIN `epe`.`accountingemppos_data_extract_filtered` epf  ON epf.`emp_no` = a.`emp_no_imported` AND epf.`End` = '2899-12-31' AND epf.`Primary` = 'Y'
       LEFT OUTER JOIN `epe`.`accountingposld_data_extract` pf ON pf.`RangeGID` = epf.`GID` 
       LEFT OUTER JOIN `epe`.`accountingemppos_data_extract_filtered` epfp  ON epfp.`GID` = pf.`ParentGID` AND epfp.`End` = '2899-12-31' AND epfp.`Primary` = 'Y'
       LEFT OUTER JOIN `hqsource`.`emp_no_accountingnumeric` `ridf` ON `ridf`.emp_no = epf.`emp_no` AND epf.`End` = '2899-12-31' AND epf.`Primary` = 'Y'
       LEFT OUTER JOIN `hqsource`.`emp_no_accountingnumeric` `sridf` ON `sridf`.emp_no = epfp.`emp_no` AND epfp.`End` = '2899-12-31' AND epfp.`Primary` = 'Y'
       WHERE a.`emp_no_imported` REGEXP ('^[a-z]{2}\\d{5}.$') 

UNION ALL
-- UNION DISTINCT

    SELECT a.`emp_no_imported` AS `emp_no`, a.`supervisor_emp_no` AS `s_emp_no`, u.`First_Name`, 'ƒ' AS `MI`, u.`Last_Name`, u.`Job_Title`, NULL AS `status`, 
          CASE WHEN rid.`accounting_emp_no` IS NULL THEN
            ep.`emp_no`
          ELSE 
            rid.`accounting_emp_no`
          END AS `accounting_emp_no`,
          CASE WHEN `srid`.`accounting_emp_no` IS NULL THEN
            epp.`emp_no`
          ELSE
            `srid`.`accounting_emp_no`
          END AS `accounting_s_emp_no`,
          ep.`emp_no` AS `traas_emp_no`,
          epp.`emp_no` AS `traas_parent_emp_no`
    FROM `epe`.`accountingemppos_data_extract_filtered`  ep

    LEFT OUTER JOIN  `hqsource`.`hq_people` a ON a.`emp_no_imported` = ep.`emp_no`
    LEFT OUTER JOIN `epe`.`accountingposld_data_extract` p ON p.`RangeGID` =  ep.`GID`
    LEFT OUTER JOIN `epe`.`accountingemppos_data_extract_filtered` epp ON epp.`GID` = p.`ParentGID`
    LEFT OUTER JOIN `siebel`.`users_all_output` u ON u.`LOGIN` = ep.`emp_no`
    LEFT OUTER JOIN `hqsource`.`emp_no_accountingnumeric` `rid` ON `rid`.emp_no = ep.`emp_no`
    LEFT OUTER JOIN `hqsource`.`emp_no_accountingnumeric` `srid` ON `srid`.emp_no = epp.`emp_no`
    WHERE 
        ep.`End` = '2899-12-31' AND
        epp.`End` = '2899-12-31' AND
        p.`End` = '2899-12-31' AND
        ep.emp_no REGEXP ('^F\\d{8}$|^V[0-3]\\d{5}$')
        
ORDER BY LENGTH(accounting_emp_no) ASC, accounting_emp_no ASC


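The WHERE clauses in the two UNION blocks guarantee that the results are unique. (This query has existed for years and runs every day. I wish I had tried this sooner.) Field names have been obfuscated.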
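To see where the extra time for UNION DISTINCT might go in MySQL, the two forms can be compared with EXPLAIN. This is a sketch on two hypothetical tables, `t1` and `t2`, which are not part of the query above.

```sql
-- Hypothetical tables, for illustration only (MySQL 8.0 syntax)
CREATE TABLE t1 (id INT PRIMARY KEY, name VARCHAR(20));
CREATE TABLE t2 (id INT PRIMARY KEY, name VARCHAR(20));

-- Plan without duplicate removal
EXPLAIN SELECT id, name FROM t1 UNION ALL SELECT id, name FROM t2;

-- Plan with duplicate removal
EXPLAIN SELECT id, name FROM t1 UNION DISTINCT SELECT id, name FROM t2;
```

The UNION DISTINCT plan typically includes a UNION RESULT row marked "Using temporary", which reflects the temporary table used for de-duplication. Note that a trailing ORDER BY on the whole union, as in the query above, can require a temporary table for either form.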

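Wrapping everything in a SELECT DISTINCT creates a (somewhat) expensive temp table. Other than that, I don't see why DISTINCT ... UNION ALL would be faster than (DISTINCT) UNION.

I would expect the query optimizer to work out that the UNION ALL result is not unique.

There are many factors that affect the query plan, including the columns returned and the indexes used. The best approach is to check the query plan case by case.

I know the output of both queries is the same. My question is about the performance difference. In some cases the two show different execution plans, but in other cases the plans are the same, which is very confusing.

What happens when around 6 to 8 tables are involved?

I used 11 UNION / UNION ALL queries and, surprisingly, found that UNION ALL ran faster than UNION. With fewer tables, UNION seems to be faster. Can you confirm this?

Are you using the same table or different tables? If you are using the same table, you will get the same cost, or Union will be faster than Union All with Distinct. I checked with 5 different tables, and Union was still faster than Union All with Distinct.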