SAS PROC SQL inner join query

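I am learning SAS PROC SQL. I noticed that although the following two methods produce the same result, their real time and CPU time differ. I would like to know why there is a difference.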

data data1;
    input name1 $ choice $;
    datalines;
John A
Mary B
Peter C
;
run;

data data2;
    input name2 $ choice2 $;
    datalines;
John B
Mary C
Peter B
;
run;

Method 1:

proc sql;
    select a.*, b.*
    from data1 as a, data2 as b 
    where a.name1= data2.name2
    ;
quit;

Method 2:

proc sql;
    select a.* , b.*
    from data1 as a inner join data2 as b
        on a.name1 = b.name2
    ;
quit;

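For the sake of argument, ignoring the unexplained html file and any random fluctuation in CPU and execution time, the short answer might simply be that SAS processes different joins in different ways by default. For files as small as the examples here this is unlikely to make much difference, but it is worth knowing about.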

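A longer answer is that this may depend to some extent on the exact version of SAS you are using. In SAS 9.4 with the example datasets, if you leave proc sql to its own devices, the query plan generated is the same for both joins (sqxjhsh, a hash join, in each case):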

 52         /* Method 1: */
 53         
 54         proc sql _method;
 55             select a.*, b.*
 56             from data1 as a, data2 as b
 57             where a.name1= data2.name2
 58             ;

 NOTE: SQL execution methods chosen are:

       sqxslct
           sqxjhsh
               sqxsrc( WORK.DATA1(alias = A) )
               sqxsrc( WORK.DATA2(alias = B) )
 59         quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.01 seconds
       user cpu time       0.01 seconds
       system cpu time     0.00 seconds
       memory              5469.21k
       OS Memory           32668.00k
       Timestamp           09/08/2015 06:43:09 PM
       Step Count                        457  Switch Count  50
       Page Faults                       0
       Page Reclaims                     87
       Page Swaps                        0
       Voluntary Context Switches        156
       Involuntary Context Switches      14
       Block Input Operations            0
       Block Output Operations           16


 60         /* Method 2: */
 61         
 62         proc sql _method;
 63             select a.* , b.*
 64             from data1 as a inner join data2 as b
 65                 on a.name1 = b.name2
 66             ;

 NOTE: SQL execution methods chosen are:

       sqxslct
           sqxjhsh
               sqxsrc( WORK.DATA1(alias = A) )
               sqxsrc( WORK.DATA2(alias = B) )
 67         quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.01 seconds
       user cpu time       0.01 seconds
       system cpu time     0.00 seconds
       memory              5467.81k
       OS Memory           32924.00k
       Timestamp           09/08/2015 06:43:09 PM
       Step Count                        458  Switch Count  50
       Page Faults                       0
       Page Reclaims                     26
       Page Swaps                        0
       Voluntary Context Switches        167
       Involuntary Context Switches      11
       Block Input Operations            0
       Block Output Operations           8
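
You can also confirm this with the _tree option, which produces a more detailed version of the query plan. For more details on the output of the _method and _tree options, see [link not preserved].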
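
As a minimal sketch, both options can be requested on the same proc sql statement, reusing the data1 and data2 tables from the question. The plan and tree are written to the log; the exact layout may differ between SAS releases, so treat the comments below as a rough guide rather than a specification.

```sas
/* Print both the chosen execution methods (_method) and the  */
/* more detailed query tree (_tree) to the log for one query. */
proc sql _method _tree;
    select a.*, b.*
    from data1 as a inner join data2 as b
        on a.name1 = b.name2
    ;
quit;
```

Running the same step once with the comma-and-where form and once with the inner join form lets you compare the two log sections side by side.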

However, if you steer the query planner towards different join algorithms, some differences do appear:

 52         /* Method 1: */
 53         
 54         proc sql _method magic=101;
 55             select a.*, b.*
 56             from data1 as a, data2 as b
 57             where a.name1= data2.name2
 58             ;
 NOTE: PROC SQL planner chooses sequential loop join.

 NOTE: SQL execution methods chosen are:

       sqxslct
           sqxjsl
               sqxsrc( WORK.DATA1(alias = A) )
               sqxsrc( WORK.DATA2(alias = B) )
 59         quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.01 seconds
       user cpu time       0.01 seconds
       system cpu time     0.01 seconds
       memory              5468.53k
       OS Memory           32668.00k
       Timestamp           09/08/2015 06:41:54 PM
       Step Count                        451  Switch Count  52
       Page Faults                       0
       Page Reclaims                     101
       Page Swaps                        0
       Voluntary Context Switches        182
       Involuntary Context Switches      14
       Block Input Operations            0
       Block Output Operations           8


 60         /* Method 2: */
 61         
 62         proc sql _method magic=102;
 63             select a.* , b.*
 64             from data1 as a inner join data2 as b
 65                 on a.name1 = b.name2
 66             ;
 NOTE: PROC SQL planner chooses merge join.

 NOTE: SQL execution methods chosen are:

       sqxslct
           sqxjm
               sqxsort
                   sqxsrc( WORK.DATA1(alias = A) )
               sqxsort
                   sqxsrc( WORK.DATA2(alias = B) )
 67         quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.01 seconds
       user cpu time       0.01 seconds
       system cpu time     0.00 seconds
       memory              5467.12k
       OS Memory           32924.00k
       Timestamp           09/08/2015 06:41:54 PM
       Step Count                        452  Switch Count  60
       Page Faults                       0
       Page Reclaims                     69
       Page Swaps                        0
       Voluntary Context Switches        197
       Involuntary Context Switches      13
       Block Input Operations            0
       Block Output Operations           16
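
For more details on the magic= option, see [link not preserved]. I would not recommend using it in any kind of production environment, but it is sometimes useful for this sort of investigation.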
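
For completeness, here is a sketch of a third variant. It assumes that magic=103 requests a hash join in the same way that 101 and 102 request the sequential loop and merge joins in the logs above. That value does not appear in the logs above, so check the planner note in your own log before relying on it.

```sas
/* Assumption: magic=103 asks the planner for a hash join, which */
/* would show up in the _method output as sqxjhsh.               */
proc sql _method magic=103;
    select a.*, b.*
    from data1 as a inner join data2 as b
        on a.name1 = b.name2
    ;
quit;
```

If the assumption holds, the plan should match the one SAS picked by default for these tables.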


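Given how tiny the differences in CPU time are for files this small, even when SAS is forced to use different join methods, I strongly suspect that something else is responsible; possibly the mysterious html file.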
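
One way to separate real differences from noise is to run both forms several times and compare the timings across runs. The following is only a sketch: the macro name repeat_joins, its parameter n, and the output tables work.m1 and work.m2 are hypothetical names introduced for illustration. Writing to tables instead of printing also keeps any html output rendering out of the timings.

```sas
options fullstimer;   /* detailed timing and memory notes in the log */

/* Hypothetical helper: run both join forms n times so that the */
/* per-step timing notes in the log can be compared.            */
%macro repeat_joins(n=5);
    %local i;
    %do i = 1 %to &n;
        /* Method 1: comma join filtered by WHERE */
        proc sql;
            create table work.m1 as
            select a.*, b.*
            from data1 as a, data2 as b
            where a.name1 = b.name2;
        quit;

        /* Method 2: explicit INNER JOIN ... ON */
        proc sql;
            create table work.m2 as
            select a.*, b.*
            from data1 as a inner join data2 as b
                on a.name1 = b.name2;
        quit;
    %end;
%mend repeat_joins;

%repeat_joins(n=5)
```

With inputs of three rows each, expect the reported times to sit at or near the 0.01 second resolution of the log, so larger test tables would be needed to see any systematic difference.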

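If you don't specify a join type, you are logically doing a cross join, and the where clause is applied to the result. If you specify inner join ... on, the results are filtered as part of the join. I'm sure someone else will explain this better, so I won't post it as an answer. In my opinion you should always specify your join type explicitly. For more on the order of SQL processing in SAS, see [link not preserved].

@user, please add the timing details to the question. That would be helpful.

Please run the same code several times and cross-check, because other system processes can affect the timings.

Revised. You can see that proc sql runs much faster on one machine. Is there any reason for that?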