If statement: creating a variable from two datasets in SAS
I'm a beginner in SAS, and I have two datasets, as shown below:
A                                    B
ID  Column1  Column2  Column4        ID  Column2  Column3
1   A        carrot   food           1   carrot   veggie
1   B        carrot   drink          2   pear     fruit
2   A        pear     food           3   apple    fruit
2   B        pear     drink
I want to create a new variable named "Column3" in A, using the following logic:
If Column1 = A in dataset A, then set to Column3 from dataset B, otherwise set to Column4 in dataset A.
A
ID  Column1  Column2  Column4  Column3
1   A        carrot   food     veggie
1   B        carrot   drink    drink
2   A        pear     food     fruit
2   B        pear     drink    drink
I thought code along these lines would work:
DATA A;
SET DF.A;
if (Column1 = A) then Column3 = [Column3 from B which I may have to merge];
else Column3 = Column4;
RUN;
PROC PRINT DATA = A;
TITLE 'OUTPUT DATASET';
RUN;
You can use a simple inner join, as shown below:
data have1;
input ID Column1 $ Column2 $ Column4 $;
datalines;
1 A carrot food
1 B carrot drink
2 A pear food
2 B pear drink
;
data have2;
input ID Column2 $ Column3 $;
datalines;
1 carrot veggie
2 pear fruit
3 apple fruit
;
proc sql;
    create table want as
    select a.id,
           a.column1,
           a.column2,
           a.column4,
           /* take column3 from have2 when column1 is "A", otherwise fall back to column4 */
           case when trim(a.column1) = "A"
                then b.column3
                else a.column4
           end as column3
    from have1 a
    inner join have2 b
        on a.id = b.id;
quit;
If you need to do it in a data step, use merge with an if statement:
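proc sort data=have1;
    by id;
run;
proc sort data=have2;
    by id;
run;
data want;
    merge have1(in=a) have2(in=b drop=column2);
    by id;
    /* keep only ids present in both datasets (inner join) */
    if a = 1 and b = 1;
    /* column3 already holds the value from have2; replace it only when column1 is not "A" */
    /* caution: column3 is overwritten in place, so within one id an "A" row */
    /* that follows a non-"A" row would see the overwritten value */
    if trim(column1) ne "A" then column3 = column4;
run;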
Or you can use a hash object:
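data want;
    if _n_ = 1 then do;
        /* never executes; only defines the variables of have1 and have2 at compile time */
        if 0 then set have1 have2;
        dcl hash h(multidata:"Y", dataset:"have2(drop=column2)");
        h.definekey("id");
        h.definedata("column3");
        h.definedone();
    end;
    set have1;
    /* look up id; with multidata, step through to the last matching entry */
    if h.find() = 0 then do until (h.find_next() ne 0);
    end;
    /* column3 now holds the looked-up value; replace it only when column1 is not "A" */
    if trim(column1) ne "A" then column3 = column4;
run;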
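So first merge the two datasets so that you have access to the values from the other dataset. Then you can use conditional logic to choose which variable populates the new variable. You need to use a new variable rather than reusing the one pulled from B: if multiple observations in A match a single observation in B, SAS does not change the value of Column3 until it needs to read a new observation from B, so anything you assign to Column3 carries over to the next matching observation.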
data want ;
merge A B (keep=column2 column3);
by column2 ;
if column1 = 'A' then new_column3=column3;
else new_column3=column4;
run;
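Note that both datasets need to be sorted by the BY variable (column2 here) before they can be merged.
I added the KEEP= dataset option because you listed both datasets as having the common non-key variable ID. Leaving ID out of the list of variables read from B prevents its value from overwriting the identically named variable in dataset A.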
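For completeness, here is a minimal end-to-end sketch of the sort-then-merge approach. It assumes the have1 and have2 datasets created in the first answer stand in for A and B; the names a_sorted, b_sorted and in_a are made up for illustration, and the subsetting `if in_a;` is an extra assumption (it drops rows that exist only in B, such as apple) that the original merge does not include.

```sas
/* Sort both inputs by the merge key; OUT= leaves the original datasets untouched. */
proc sort data=have1 out=a_sorted;
    by column2;
run;

proc sort data=have2 out=b_sorted;
    by column2;
run;

data want;
    /* Read only column2 and column3 from B so that B's id cannot overwrite A's id. */
    merge a_sorted(in=in_a) b_sorted(keep=column2 column3);
    by column2;
    /* Assumption: keep only observations that come from A. */
    if in_a;
    /* Write to a new variable so the value read from B is never overwritten. */
    if column1 = 'A' then new_column3 = column3;
    else new_column3 = column4;
run;

proc print data=want;
run;
```

Because new_column3 is assigned fresh on every observation and column3 is left as read from B, the result should not depend on the order of the A and B rows within each column2 group.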