Flattening multiple observations in SAS
I have a dataset where patients can have multiple (and an unknown number of) values for certain variables, which ends up looking like this:
ID Var1 Var2 Var3 Var4
1 Blue Female 17 908
1 Blue Female 17 909
1 Red Female 17 910
1 Red Female 17 911
...
99 Blue Female 14 908
100 Red Male 28 911
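I would like to collapse this data so that each ID has only one entry, with indicators showing whether a given value was present in any of its original entries. For example, something like this: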
ID YesBlue Var2 Var3 Yes911
1 1 Female 17 1
99 1 Female 14 0
100 0 Male 28 1
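Is there an easy way to do this in SAS? Otherwise I would have to do it in Access (where the data comes from), and I really don't know how to use it.
If your dataset is named PATIENTS1, something like this might work: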
/* Flag each row, then take the per-ID maximum of each flag */
proc sql noprint;
  create table patients2 as
  select *
       , case(var1)
           when "Blue" then 1
           else 0
         end as ablue
       , case(var4)
           when 911 then 1
           else 0
         end as a911
       , max(calculated ablue) as yesblue
       , max(calculated a911) as yes911
  from patients1
  group by id
  order by id;
quit;

/* Keep one row per ID and drop the row-level columns */
proc sort data=patients2 out=patients3(drop=var1 var4 ablue a911) nodupkey;
  by id;
run;
Here is a data step solution. I am assuming that the values of Var2 and Var3 are always the same for a given ID.
data have;
input ID Var1 $ Var2 $ Var3 Var4;
cards;
1 Blue Female 17 908
1 Blue Female 17 909
1 Red Female 17 910
1 Red Female 17 911
99 Blue Female 14 908
100 Red Male 28 911
;
run;
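data want (drop=Var1 Var4 _:);
  set have;
  by ID;
  /* Reset the counters at the start of each ID */
  if first.ID then do;
    _blue=0;
    _911=0;
  end;
  /* Sum statements keep their running totals across rows */
  _blue+(Var1='Blue');
  _911+(Var4=911);
  /* Write one row per ID with 0/1 indicators */
  if last.ID then do;
    YesBlue=(_blue>0);
    Yes911=(_911>0);
    output;
  end;
run;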
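The `by ID` statement in the step above expects the input to arrive in ID order, which the sample data already is. If the real data is not ordered that way, a sort beforehand would be needed. A minimal sketch, where the output name `have_sorted` is made up for illustration:

```sas
/* Assumption: HAVE is the dataset from the example above;
   HAVE_SORTED is a hypothetical name for the sorted copy. */
proc sort data=have out=have_sorted;
  by ID;
run;
```

The data step would then read `set have_sorted;` instead of `set have;`.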
This should do it (edit: it looks like the same thing Keith suggested, just written differently):
data test;
input id Var1 $ Var2 $ Var3 Var4;
datalines;
1 Blue Female 17 908
1 Blue Female 17 909
1 Red Female 17 910
1 Red Female 17 911
99 Blue Female 14 908
100 Red Male 28 911
;
run;
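data flatten(drop=Var1 Var4);
  set test;
  /* Keep the indicator values from row to row */
  retain YesBlue;
  retain Yes911;
  by id;
  /* Reset the indicators at the start of each ID */
  if first.id then do;
    YesBlue = 0;
    Yes911 = 0;
  end;
  if Var1 eq "Blue" then YesBlue = 1;
  if Var4 eq 911 then Yes911 = 1;
  /* Write one row per ID */
  if last.id then output;
run;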
PROC SQL is well suited to this. This is similar to DavB's answer, but it eliminates the extra sort:
data have;
input ID Var1 $ Var2 $ Var3 Var4;
cards;
1 Blue Female 17 908
1 Blue Female 17 909
1 Red Female 17 910
1 Red Female 17 911
99 Blue Female 14 908
100 Red Male 28 911
;
run;
proc sql;
  create table want as
  select ID
       , max(case(var1)
               when 'Blue' then 1
               else 0
             end) as YesBlue
       , max(var2) as Var2
       , max(var3) as Var3
       , max(case(var4)
               when 911 then 1
               else 0
             end) as Yes911
  from have
  group by id
  order by id;
quit;
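It also safely reduces the original data to one row per ID, but it could introduce errors if the source is not exactly as you describe (for example, if Var2 or Var3 varies within an ID, `max()` will silently pick one of the values).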
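One way to check that assumption before collapsing is to list the IDs whose Var2 or Var3 takes more than one distinct value. This is only a sketch: it assumes the dataset is named `have` as in the example, and the table name `inconsistent` is made up for illustration.

```sas
proc sql;
  /* Assumption: Var2 and Var3 are the columns expected to be constant per ID.
     INCONSISTENT is a hypothetical table name. */
  create table inconsistent as
  select ID
       , count(distinct Var2) as n_var2
       , count(distinct Var3) as n_var3
  from have
  group by ID
  /* Keep only IDs with more than one distinct value in either column */
  having count(distinct Var2) > 1
      or count(distinct Var3) > 1;
quit;
```

With the sample data this table should come out empty, since every ID has a single Var2 and Var3 value. Note that `count(distinct ...)` ignores missing values, so an ID that mixes a missing and a non-missing value would not be flagged by this check.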