SAS PROC SQL inner join query
I am learning SAS PROC SQL statements. I noticed that although the following two methods give the same results, their real time and CPU time differ. I would like to know why there is such a difference.
data data1;
input name1 $ choice $;
datalines;
John A
Mary B
Peter C
;
run;
data data2;
input name2 $ choice2 $;
datalines;
John B
Mary C
Peter B
;
run;
Method 1:
proc sql;
select a.*, b.*
from data1 as a, data2 as b
where a.name1= data2.name2
;
quit;
Method 2:
proc sql;
select a.* , b.*
from data1 as a inner join data2 as b
on a.name1 = b.name2
;
quit;
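For the sake of argument, ignoring the unexplained html file and any random fluctuations in CPU and execution time, the short answer may simply be that SAS by default processes different joins in different ways. For files as small as the ones in this example that probably makes little difference, but it is worth knowing about.

A longer answer is that this may depend to some extent on the exact version of SAS you are using. In SAS 9.4 with the sample datasets, if you leave proc sql to its own devices, I see that the query plans generated for the two joins are identical: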
52 /* Method 1: */
53
54 proc sql _method;
55 select a.*, b.*
56 from data1 as a, data2 as b
57 where a.name1= data2.name2
58 ;
NOTE: SQL execution methods chosen are:
sqxslct
sqxjhsh
sqxsrc( WORK.DATA1(alias = A) )
sqxsrc( WORK.DATA2(alias = B) )
59 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 5469.21k
OS Memory 32668.00k
Timestamp 09/08/2015 06:43:09 PM
Step Count 457 Switch Count 50
Page Faults 0
Page Reclaims 87
Page Swaps 0
Voluntary Context Switches 156
Involuntary Context Switches 14
Block Input Operations 0
Block Output Operations 16
60 /* Method 2: */
61
62 proc sql _method;
63 select a.* , b.*
64 from data1 as a inner join data2 as b
65 on a.name1 = b.name2
66 ;
NOTE: SQL execution methods chosen are:
sqxslct
sqxjhsh
sqxsrc( WORK.DATA1(alias = A) )
sqxsrc( WORK.DATA2(alias = B) )
67 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 5467.81k
OS Memory 32924.00k
Timestamp 09/08/2015 06:43:09 PM
Step Count 458 Switch Count 50
Page Faults 0
Page Reclaims 26
Page Swaps 0
Voluntary Context Switches 167
Involuntary Context Switches 11
Block Input Operations 0
Block Output Operations 8
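You can also confirm this with the _tree option, which produces a more detailed version of the query plan. More details on the output of the _method and _tree options are available in the reference linked from the original answer.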
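As a minimal sketch of the point above, both options can be requested on the same PROC SQL statement. This assumes the data1 and data2 tables from the question already exist in WORK; the exact layout of the plan written to the log may differ between SAS releases.

```sas
/* Sketch: print the execution-method summary (_method) and the      */
/* more detailed plan tree (_tree) for the explicit inner join.      */
proc sql _method _tree;
    select a.*, b.*
    from data1 as a inner join data2 as b
    on a.name1 = b.name2;
quit;
```

Running the same step with the comma-join form of the query gives a second tree to compare against.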
However, if you steer the query planner toward different join algorithms, some differences do appear:
52 /* Method 1: */
53
54 proc sql _method magic=101;
55 select a.*, b.*
56 from data1 as a, data2 as b
57 where a.name1= data2.name2
58 ;
NOTE: PROC SQL planner chooses sequential loop join.
NOTE: SQL execution methods chosen are:
sqxslct
sqxjsl
sqxsrc( WORK.DATA1(alias = A) )
sqxsrc( WORK.DATA2(alias = B) )
59 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.01 seconds
memory 5468.53k
OS Memory 32668.00k
Timestamp 09/08/2015 06:41:54 PM
Step Count 451 Switch Count 52
Page Faults 0
Page Reclaims 101
Page Swaps 0
Voluntary Context Switches 182
Involuntary Context Switches 14
Block Input Operations 0
Block Output Operations 8
60 /* Method 2: */
61
62 proc sql _method magic=102;
63 select a.* , b.*
64 from data1 as a inner join data2 as b
65 on a.name1 = b.name2
66 ;
NOTE: PROC SQL planner chooses merge join.
NOTE: SQL execution methods chosen are:
sqxslct
sqxjm
sqxsort
sqxsrc( WORK.DATA1(alias = A) )
sqxsort
sqxsrc( WORK.DATA2(alias = B) )
67 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 5467.12k
OS Memory 32924.00k
Timestamp 09/08/2015 06:41:54 PM
Step Count 452 Switch Count 60
Page Faults 0
Page Reclaims 69
Page Swaps 0
Voluntary Context Switches 197
Involuntary Context Switches 13
Block Input Operations 0
Block Output Operations 16
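More details on the magic= option are available in the reference linked from the original answer. I would not recommend using it in any sort of production environment, but it is sometimes useful for this kind of thing.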
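One way to make the comparison above like-for-like is to request the same join algorithm for both syntaxes, so that any remaining difference comes from the syntax rather than from the algorithm. The sketch below reuses magic=102 from the log above for both methods. It assumes data1 and data2 exist in WORK, and it is meant for experimentation only.

```sas
/* Method 1: comma join filtered by WHERE, with the merge-join hint. */
proc sql _method magic=102;
    select a.*, b.*
    from data1 as a, data2 as b
    where a.name1 = b.name2;
quit;

/* Method 2: explicit INNER JOIN, with the same hint. */
proc sql _method magic=102;
    select a.*, b.*
    from data1 as a inner join data2 as b
    on a.name1 = b.name2;
quit;
```

Comparing the "SQL execution methods chosen" notes in the two logs shows whether the planner builds the same plan for both forms.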
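Given how tiny the differences in CPU time are for files this small, even when SAS is forced to use different join methods, I strongly suspect that something else is responsible; possibly the mysterious html file.

If you don't specify the join type, you are using a natural join, and the where clause is applied to the result. If you specify inner join with on, the results are filtered during the join. I'm sure someone else can explain this better, so I won't post it as an answer. In my opinion, you should always specify your join type explicitly. More information on the order of SQL processing in SAS is available at the link given in the original comment.

@user, please add the timing details to the question. That would be helpful. Please also run the same code several times and cross-check, because other system processes may affect the timings.

Revised. As you can see, one of the PROC SQL runs is much faster. Is there a reason for that?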
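Following the suggestion in the comments to run the same code several times, here is a rough sketch of a repeatable timing harness. The macro name time_joins, its reps parameter, and the output tables work.m1 and work.m2 are hypothetical names chosen for illustration. It assumes data1 and data2 exist in WORK.

```sas
/* Write full timing statistics to the log for every step. */
options fullstimer;

/* Hypothetical helper: runs both join forms &reps times.            */
/* Results go to WORK tables instead of ODS output, so rendering     */
/* a report (for example HTML) is not included in the timings.       */
%macro time_joins(reps=5);
    %local i;
    %do i = 1 %to &reps;
        /* Method 1: comma join filtered by WHERE */
        proc sql;
            create table work.m1 as
            select a.*, b.*
            from data1 as a, data2 as b
            where a.name1 = b.name2;
        quit;

        /* Method 2: explicit INNER JOIN */
        proc sql;
            create table work.m2 as
            select a.*, b.*
            from data1 as a inner join data2 as b
            on a.name1 = b.name2;
        quit;
    %end;
%mend time_joins;

%time_joins(reps=5)
```

With tables this small, timings near 0.01 seconds sit at the resolution of the log, so differences between individual runs are likely to be noise rather than a property of either syntax.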