If statement: creating a variable from two datasets in SAS

I'm a SAS beginner and I have two datasets, as shown below:

   A                                        B 
ID  Column1  Column2   Column4           ID  Column2   Column3
1      A     carrot    food              1   carrot    veggie
1      B     carrot    drink             2   pear      fruit
2      A     pear      food              3   apple     fruit 
2      B     pear      drink
I want to create a new variable named "Column3" in A, using the following logic:

If Column1 = A in dataset A, then set to Column3 from dataset B, otherwise set to Column4 in dataset A. 

   A                                         
ID  Column1  Column2   Column4    Column3
1      A     carrot    food       veggie         
1      B     carrot    drink      drink       
2      A     pear      food       fruit        
2      B     pear      drink      drink
I think code along these lines would work:

DATA A;
    SET DF.A; 
    if (Column1 = A) then Column3 = [Column3 from B which I may have to merge];
else Column3 = Column4;
RUN;
PROC PRINT DATA = A; 
    TITLE 'OUTPUT DATASET'; 
RUN; 

You can use a simple inner join, as shown below:

data have1;
  input ID Column1 $ Column2 $ Column4 $;
  datalines;
1      A     carrot    food
1      B     carrot    drink
2      A     pear      food
2      B     pear      drink
;

data have2;
  input ID Column2 $ Column3 $;
  datalines;
1   carrot    veggie
2   pear      fruit
3   apple     fruit
;

proc sql;
  create table want as
  select a.id,
         a.column1,
         a.column2,
         a.column4,
         /* take Column3 from have2 only when Column1 is "A" */
         case when trim(a.column1) = "A"
              then b.column3
              else a.column4
         end as column3
  from have1 a
  inner join have2 b
    on a.id = b.id;
quit;
If you need to use MERGE and an IF statement in a data step:

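proc sort data=have1;
  by id;
run;

proc sort data=have2;
  by id;
run;

data want;
  /* rename have2's column3 (b_column3 is just a temporary name) so the
     value retained across the BY group is never overwritten */
  merge have1(in=a) have2(in=b keep=id column3 rename=(column3=b_column3));
  by id;
  if a and b;   /* keep only IDs present in both datasets */
  if column1 = "A" then column3 = b_column3;
  else column3 = column4;
  drop b_column3;
run;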
Or you can use a hash object:

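data want;
  if _n_ = 1 then do;
    /* never executes; only defines the variables at compile time */
    if _n_ = 0 then set have1 have2;
    dcl hash h(multidata:"Y", dataset:"have2(drop=column2)");
    h.definekey("id");
    h.definedata("column3");
    h.definedone();
  end;
  set have1;
  /* no match in have2: drop the row, as the inner join does */
  if h.find() ne 0 then delete;
  /* with duplicate keys in have2, step through to the last match */
  do while (h.find_next() = 0);
  end;
  /* column3 already holds the value from have2; replace it otherwise */
  if column1 ne "A" then column3 = column4;
run;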

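So first merge the two datasets so that you have access to the values from the other dataset. Then you can use conditional logic to choose which variable populates the new variable. You need to create a new variable rather than reuse the one read from B: if several observations in A match a single observation in B, SAS does not change the value of Column3 until it needs to read a new observation from B, so anything you assign to Column3 would carry over to the next matching row.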

data want;
  merge A (in=inA) B (keep=column2 column3);
  by column2;
  if inA;   /* drop rows of B that have no match in A */
  if column1 = 'A' then new_column3 = column3;
  else new_column3 = column4;
run;
Note that both datasets must be sorted by the BY variable before they can be merged.
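
The sort itself is not shown above. A minimal sketch, assuming the two datasets sit in the WORK library under the names A and B (the question reads A from DF.A, so adjust the library reference as needed):

```sas
/* Sketch only: sort both inputs by the merge key before the MERGE step. */
/* Assumes WORK.A and WORK.B exist; both are sorted in place.            */
proc sort data=A;
  by column2;
run;

proc sort data=B;
  by column2;
run;
```

Sorting in place changes the row order of A; add an OUT= option to PROC SORT if the original order has to be preserved.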

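I added the KEEP= dataset option because you listed both datasets as having a common non-key variable, ID. Leaving ID out of the list of variables read from B prevents its value from overwriting the same-named variable in dataset A.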