Loops SAS循环:根据条件垂直求和观测值
我有一个数据集,看起来像: 邮政编码车辆总数Loops SAS循环:根据条件垂直求和观测值,loops,sas,Loops,Sas,我有一个数据集,看起来像: 邮政编码车辆总数 11111 3 11111 4 23232 1 443310 44331 10 18860 6 18860 6 18860 6 188608 有300多万行像这样,有不同的拉链。我需要对每个邮政编码的总汽车数求和,这样生成的表如下所示 邮政编码车辆总数 11111 7 23232 1 44331 10 18860 26 . . 考虑到数据集的大小,手动将ZIP输入到代码中不是一个选项。想法 要求和的变量是“ZipCodes”,因此将进入“C
- 11111 3
- 11111 4
- 23232 1
- 443310
- 44331 10
- 18860 6
- 18860 6
- 18860 6
- 188608
- 11111 7
- 23232 1
- 44331 10
- 18860 26 . .
proc summary data=Input_table;
class ZipCodes;
var Total_cars;
output out=Output_table
sum()=;
run;
proc summary data=Input_table;
class ZipCodes;
var Total_cars;
output out=Output_table
sum()=;
run;
您可以使用procsql。这是一个非常简单的步骤
proc sql;
create table new as
select Zipcodes, sum(Total Cars) as total_cars from table_have group by Zipcodes
;
退出 您可以使用proc-sql。这是一个非常简单的步骤
proc sql;
create table new as
select Zipcodes, sum(Total Cars) as total_cars from table_have group by Zipcodes
;
退出 到目前为止,两种答案都是可以的,但下面是两种可能方法的更详细解释: 过程SQL方法
PROC SQL;
CREATE TABLE output_table AS
SELECT ZipCodes,
SUM(Total_Cars) as Total_Cars
FROM input_table
GROUP BY ZipCodes;
QUIT;
PROC SUMMARY DATA=input_table NWAY;
CLASS ZipCodes;
VAR Total_Cars;
OUTPUT OUT=output_table (DROP=_TYPE_ _FREQ_) SUM()=;
RUN;
PROC SORT DATA=input_table (RENAME=(Total_Cars = tc)) OUT=_temp;
BY ZipCodes;
RUN;
DATA output_table (DROP=TC);
SET _temp;
BY ZipCodes;
IF first.ZipCodes THEN Total_Cars = 0;
Total_Cars+tc;
IF last.ZipCodes THEN OUTPUT;
RUN;
proc tabulate data=testin out=testout
/*drop extra created vars and rename as needed*/
(drop=_type_ _page_ _table_ rename=(Zip='Zip Codes'n Cars_Sum='Total Cars'n));
/*grouping variable, also used to sort output in ascending order*/
class Zip;
/* variable to be analyzed*/
var Cars;
/*sum cars by zip code*/
table Zip, Cars*(sum);
run;
groupby
子句也可以写入groupby 1
,省略ZipCodes
,因为这是指SELECT
子句中的第一列
过程汇总方法
PROC SQL;
CREATE TABLE output_table AS
SELECT ZipCodes,
SUM(Total_Cars) as Total_Cars
FROM input_table
GROUP BY ZipCodes;
QUIT;
PROC SUMMARY DATA=input_table NWAY;
CLASS ZipCodes;
VAR Total_Cars;
OUTPUT OUT=output_table (DROP=_TYPE_ _FREQ_) SUM()=;
RUN;
PROC SORT DATA=input_table (RENAME=(Total_Cars = tc)) OUT=_temp;
BY ZipCodes;
RUN;
DATA output_table (DROP=TC);
SET _temp;
BY ZipCodes;
IF first.ZipCodes THEN Total_Cars = 0;
Total_Cars+tc;
IF last.ZipCodes THEN OUTPUT;
RUN;
proc tabulate data=testin out=testout
/*drop extra created vars and rename as needed*/
(drop=_type_ _page_ _table_ rename=(Zip='Zip Codes'n Cars_Sum='Total Cars'n));
/*grouping variable, also used to sort output in ascending order*/
class Zip;
/* variable to be analyzed*/
var Cars;
/*sum cars by zip code*/
table Zip, Cars*(sum);
run;
这个方法类似于这个问题的另一个答案,但我补充道:
-只给出了最大的汇总级别,这里没有那么重要,因为您只有一个NWAY
变量,这意味着只有一个汇总级别。但是,如果不使用类
,您将获得一个额外的行,显示整个数据集中NWAY
的总值,这不是您在问题中要求的total_Cars
-这将删除自动变量:DROP=\u TYPE\u\u FREQ\u
-显示汇总级别(见上文注释),该列仅包含值\u TYPE \
1
-给出了\u FREQ\u
的频率计数,虽然它很有用,但在您的问题中并不是您想要的ZipCodes
PROC SQL;
CREATE TABLE output_table AS
SELECT ZipCodes,
SUM(Total_Cars) as Total_Cars
FROM input_table
GROUP BY ZipCodes;
QUIT;
PROC SUMMARY DATA=input_table NWAY;
CLASS ZipCodes;
VAR Total_Cars;
OUTPUT OUT=output_table (DROP=_TYPE_ _FREQ_) SUM()=;
RUN;
PROC SORT DATA=input_table (RENAME=(Total_Cars = tc)) OUT=_temp;
BY ZipCodes;
RUN;
DATA output_table (DROP=TC);
SET _temp;
BY ZipCodes;
IF first.ZipCodes THEN Total_Cars = 0;
Total_Cars+tc;
IF last.ZipCodes THEN OUTPUT;
RUN;
proc tabulate data=testin out=testout
/*drop extra created vars and rename as needed*/
(drop=_type_ _page_ _table_ rename=(Zip='Zip Codes'n Cars_Sum='Total Cars'n));
/*grouping variable, also used to sort output in ascending order*/
class Zip;
/* variable to be analyzed*/
var Cars;
/*sum cars by zip code*/
table Zip, Cars*(sum);
run;
这只是出于完整性考虑,并不像预排序那样有效。到目前为止,两种答案都可以,但下面是两种可能方法的更详细说明: 过程SQL方法
PROC SQL;
CREATE TABLE output_table AS
SELECT ZipCodes,
SUM(Total_Cars) as Total_Cars
FROM input_table
GROUP BY ZipCodes;
QUIT;
PROC SUMMARY DATA=input_table NWAY;
CLASS ZipCodes;
VAR Total_Cars;
OUTPUT OUT=output_table (DROP=_TYPE_ _FREQ_) SUM()=;
RUN;
PROC SORT DATA=input_table (RENAME=(Total_Cars = tc)) OUT=_temp;
BY ZipCodes;
RUN;
DATA output_table (DROP=TC);
SET _temp;
BY ZipCodes;
IF first.ZipCodes THEN Total_Cars = 0;
Total_Cars+tc;
IF last.ZipCodes THEN OUTPUT;
RUN;
proc tabulate data=testin out=testout
/*drop extra created vars and rename as needed*/
(drop=_type_ _page_ _table_ rename=(Zip='Zip Codes'n Cars_Sum='Total Cars'n));
/*grouping variable, also used to sort output in ascending order*/
class Zip;
/* variable to be analyzed*/
var Cars;
/*sum cars by zip code*/
table Zip, Cars*(sum);
run;
groupby
子句也可以写入groupby 1
,省略ZipCodes
,因为这是指SELECT
子句中的第一列
过程汇总方法
PROC SQL;
CREATE TABLE output_table AS
SELECT ZipCodes,
SUM(Total_Cars) as Total_Cars
FROM input_table
GROUP BY ZipCodes;
QUIT;
PROC SUMMARY DATA=input_table NWAY;
CLASS ZipCodes;
VAR Total_Cars;
OUTPUT OUT=output_table (DROP=_TYPE_ _FREQ_) SUM()=;
RUN;
PROC SORT DATA=input_table (RENAME=(Total_Cars = tc)) OUT=_temp;
BY ZipCodes;
RUN;
DATA output_table (DROP=TC);
SET _temp;
BY ZipCodes;
IF first.ZipCodes THEN Total_Cars = 0;
Total_Cars+tc;
IF last.ZipCodes THEN OUTPUT;
RUN;
proc tabulate data=testin out=testout
/*drop extra created vars and rename as needed*/
(drop=_type_ _page_ _table_ rename=(Zip='Zip Codes'n Cars_Sum='Total Cars'n));
/*grouping variable, also used to sort output in ascending order*/
class Zip;
/* variable to be analyzed*/
var Cars;
/*sum cars by zip code*/
table Zip, Cars*(sum);
run;
这个方法类似于这个问题的另一个答案,但我补充道:
-只给出了最大的汇总级别,这里没有那么重要,因为您只有一个NWAY
变量,这意味着只有一个汇总级别。但是,如果不使用类
,您将获得一个额外的行,显示整个数据集中NWAY
的总值,这不是您在问题中要求的total_Cars
-这将删除自动变量:DROP=\u TYPE\u\u FREQ\u
-显示汇总级别(见上文注释),该列仅包含值\u TYPE \
1
-给出了\u FREQ\u
的频率计数,虽然它很有用,但在您的问题中并不是您想要的ZipCodes
PROC SQL;
CREATE TABLE output_table AS
SELECT ZipCodes,
SUM(Total_Cars) as Total_Cars
FROM input_table
GROUP BY ZipCodes;
QUIT;
PROC SUMMARY DATA=input_table NWAY;
CLASS ZipCodes;
VAR Total_Cars;
OUTPUT OUT=output_table (DROP=_TYPE_ _FREQ_) SUM()=;
RUN;
PROC SORT DATA=input_table (RENAME=(Total_Cars = tc)) OUT=_temp;
BY ZipCodes;
RUN;
DATA output_table (DROP=TC);
SET _temp;
BY ZipCodes;
IF first.ZipCodes THEN Total_Cars = 0;
Total_Cars+tc;
IF last.ZipCodes THEN OUTPUT;
RUN;
proc tabulate data=testin out=testout
/*drop extra created vars and rename as needed*/
(drop=_type_ _page_ _table_ rename=(Zip='Zip Codes'n Cars_Sum='Total Cars'n));
/*grouping variable, also used to sort output in ascending order*/
class Zip;
/* variable to be analyzed*/
var Cars;
/*sum cars by zip code*/
table Zip, Cars*(sum);
run;
这只是为了完整性而包括在内,它的效率不如需要预排序的效率。为了补充@mjsqu的答案,为了(更多)完整性:
data testin;
input Zip Cars;
datalines;
11111 3
11111 4
23232 1
44331 0
44331 10
18860 6
18860 6
18860 6
18860 8
;
过程制表方法
PROC SQL;
CREATE TABLE output_table AS
SELECT ZipCodes,
SUM(Total_Cars) as Total_Cars
FROM input_table
GROUP BY ZipCodes;
QUIT;
PROC SUMMARY DATA=input_table NWAY;
CLASS ZipCodes;
VAR Total_Cars;
OUTPUT OUT=output_table (DROP=_TYPE_ _FREQ_) SUM()=;
RUN;
PROC SORT DATA=input_table (RENAME=(Total_Cars = tc)) OUT=_temp;
BY ZipCodes;
RUN;
DATA output_table (DROP=TC);
SET _temp;
BY ZipCodes;
IF first.ZipCodes THEN Total_Cars = 0;
Total_Cars+tc;
IF last.ZipCodes THEN OUTPUT;
RUN;
proc tabulate data=testin out=testout
/*drop extra created vars and rename as needed*/
(drop=_type_ _page_ _table_ rename=(Zip='Zip Codes'n Cars_Sum='Total Cars'n));
/*grouping variable, also used to sort output in ascending order*/
class Zip;
/* variable to be analyzed*/
var Cars;
/*sum cars by zip code*/
table Zip, Cars*(sum);
run;
如果使用企业指南,将生成一个数据集和一个结果表。要抑制结果并仅输出数据集,请在“proc TABLATE”之前包含此行:
在“运行”之后:
为了补充@mjsqu的答案,为了(更多)完整性:
data testin;
input Zip Cars;
datalines;
11111 3
11111 4
23232 1
44331 0
44331 10
18860 6
18860 6
18860 6
18860 8
;
过程制表方法
PROC SQL;
CREATE TABLE output_table AS
SELECT ZipCodes,
SUM(Total_Cars) as Total_Cars
FROM input_table
GROUP BY ZipCodes;
QUIT;
PROC SUMMARY DATA=input_table NWAY;
CLASS ZipCodes;
VAR Total_Cars;
OUTPUT OUT=output_table (DROP=_TYPE_ _FREQ_) SUM()=;
RUN;
PROC SORT DATA=input_table (RENAME=(Total_Cars = tc)) OUT=_temp;
BY ZipCodes;
RUN;
DATA output_table (DROP=TC);
SET _temp;
BY ZipCodes;
IF first.ZipCodes THEN Total_Cars = 0;
Total_Cars+tc;
IF last.ZipCodes THEN OUTPUT;
RUN;
proc tabulate data=testin out=testout
/*drop extra created vars and rename as needed*/
(drop=_type_ _page_ _table_ rename=(Zip='Zip Codes'n Cars_Sum='Total Cars'n));
/*grouping variable, also used to sort output in ascending order*/
class Zip;
/* variable to be analyzed*/
var Cars;
/*sum cars by zip code*/
table Zip, Cars*(sum);
run;
如果使用企业指南,将生成一个数据集和一个结果表。要抑制结果并仅输出数据集,请在“proc TABLATE”之前包含此行:
在“运行”之后:
试着在回答中更具体一些,更好地解释代码段在做什么。试着在回答中更具体一些,更好地解释代码段在做什么。你能突出显示并按Ctrl+K设置答案中的代码格式吗?嗨,mjsqu-我已经更新了格式。。。希望它现在更容易理解。你能通过突出显示并按Ctrl+K设置答案中的代码格式吗?嗨,mjsqu-我已经更新了格式。。。希望现在更容易理解,因为这是一个新概念,我建议您研究一下如何编写SQL教程,比如下面的教程:。然后,您将能够详细介绍@mjsqu在下面提供的SQL答案。由于对这一点还不熟悉,我建议您研究一下如何编写SQL教程,例如下面的教程:。然后,您将能够详细说明@mjsqu在下面提供的SQL答案。