Merge two SAS files based on two columns in file A

File A:

C1(name)   C2(other name)
Apple      Fruit_1 Fruit_2
Orange     Fruit_1 Fruit_2
Carrot     Vegetable_1 Vegetable_2
Potato     Vegetable_1 Vegetable_2

File B:

C1(name)     C2(last used)
Apple        2014
Fruit_1      2011
Carrot       2010
Vegetable_2  2018

Expected result:

C1(name)   C2(other name)            C3(last used)
Apple      Fruit_1 Fruit_2           2014
Orange     Fruit_1 Fruit_2           2011
Carrot     Vegetable_1 Vegetable_2   2018
Potato     Vegetable_1 Vegetable_2   2018
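
Basically, I want to merge my files based on two columns of the first file: if the name from file B is found in column C1 or C2 of file A, add the date. Note that Carrot gets the same date as Potato. This is because Vegetable_2 has the most recent date, and Vegetable_2 can refer to either Carrot or Potato.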
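The following minimal Python sketch (not SAS) applies that rule directly. The names `file_a`, `file_b` and `last_used` are hypothetical. For each row it collects the names in C1 and C2, looks each one up in file B, and keeps the most recent year. It reproduces the expected result shown above.

```python
# Python sketch (not SAS) of the matching rule: a year from file B applies to a row
# of file A if its name equals C1 or is one of the names in C2; the latest year wins.
file_a = [
    ("Apple", "Fruit_1 Fruit_2"),
    ("Orange", "Fruit_1 Fruit_2"),
    ("Carrot", "Vegetable_1 Vegetable_2"),
    ("Potato", "Vegetable_1 Vegetable_2"),
]
# File B as a lookup: name -> last-used year.
file_b = {"Apple": 2014, "Fruit_1": 2011, "Carrot": 2010, "Vegetable_2": 2018}


def last_used(name, other_names, dates):
    """Return the most recent year for C1 or any blank-separated name in C2, else None."""
    candidates = [name] + other_names.split()
    years = [dates[k] for k in candidates if k in dates]
    return max(years) if years else None


result = [(c1, c2, last_used(c1, c2, file_b)) for c1, c2 in file_a]
for row in result:
    print(row)
```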

In my tests with MERGE, I couldn't get it to check the second column, so I only get data for items that appear in C1 of both files.

So my current result is:

C1(name)   C2(other name)            C3(last used)
Apple      Fruit_1 Fruit_2           2014
Orange     Fruit_1 Fruit_2
Carrot     Vegetable_1 Vegetable_2   2010
Potato     Vegetable_1 Vegetable_2

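Do you know which SAS procedure I could use to get the result I want? FYI, I'm using SAS on the mainframe. I'm not sure whether that changes anything, since I haven't used SAS outside the mainframe environment.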

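You just need to left join file B to A and use the CONTAINS operator in the ON clause.

In other words, a row is joined when table1.c1 = table2.c1, or when table2.c1 is a substring of table1.c2 (file A).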

Dummy data:

data file_a;
length c1 $ 8 c2 $ 30 ;
input c1 $ c2 $ ;
datalines;
Apple      Fruit_1,Fruit_2
Orange     Fruit_1,Fruit_2
Carrot     Vegetable_1,Vegetable_2
Potato     Vegetable_1,Vegetable_2
;
data file_b;
length c1 $ 12 c2 $ 4 ;
input c1 $ c2 $ ;
datalines;
Apple        2014  
Fruit_1      2011 
Carrot       2010
Vegetable_2  2018
;
Code:

Output:

c1=Apple c2=Fruit_1,Fruit_2 last_used=2014 cc=Apple 
c1=Carrot c2=Vegetable_1,Vegetable_2 last_used=2010 cc=Carrot 
c1=Carrot c2=Vegetable_1,Vegetable_2 last_used=2018 cc=Vegetable_2 
c1=Potato c2=Vegetable_1,Vegetable_2 last_used=2018 cc=Vegetable_2 
c1=Orange c2=Fruit_1,Fruit_2 last_used=  cc=  
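The following Python sketch (not SAS) implements the same join condition. The names `left_join`, `file_a` and `file_b` are illustrative, and C2 is split on commas to match the dummy data. Because it tests whole names, Fruit_1 also matches the Apple and Orange rows. The output above shows no Fruit_1 match, but the question's expected result needs Orange to get 2011.

```python
# Python sketch (not SAS) of a left join of file B onto file A: a row of B matches
# when its name equals C1 or is one of the comma-separated names in C2.
file_a = [
    ("Apple", "Fruit_1,Fruit_2"),
    ("Orange", "Fruit_1,Fruit_2"),
    ("Carrot", "Vegetable_1,Vegetable_2"),
    ("Potato", "Vegetable_1,Vegetable_2"),
]
file_b = [("Apple", "2014"), ("Fruit_1", "2011"), ("Carrot", "2010"), ("Vegetable_2", "2018")]


def left_join(a_rows, b_rows):
    """Yield (c1, c2, last_used, cc); unmatched rows of A get None, as in a left join."""
    for c1, c2 in a_rows:
        other_names = c2.split(",")
        matches = [(name, year) for name, year in b_rows if name == c1 or name in other_names]
        if not matches:
            yield (c1, c2, None, None)
        for name, year in matches:
            yield (c1, c2, year, name)


joined = list(left_join(file_a, file_b))
for row in joined:
    print(row)
```

As in the SQL output, one row of file A can match several rows of file B (Apple and Carrot here). A further step is still needed to keep only the most recent year per row.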

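A merge-only approach needs to:

  • pivot the first table so that each row yields one row per key value
  • sort both tables by key to prepare for the merge
  • merge
  • sort to restore the original row order, with years descending
  • select the most recent year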
Sample data

data foods;
length key1 $20 key2s $50;
input key1 key2s &; datalines;
Apple      Fruit_1 Fruit_2
Orange     Fruit_1 Fruit_2
Carrot     Vegetable_1 Vegetable_2
Potato     Vegetable_1 Vegetable_2
Knuckle    Sandwich_1 Sandwich_2
;
run;

data dates;
length key $20 year 8;
input key year; datalines;
Apple        2014  
Fruit_1      2011 
Carrot       2010
Vegetable_2  2018
Grain_1      2009
;
run;
Pivot each row to get one row per key value

data food_single_keyed;
  length key $20;
  set foods;

  rowid = _n_;                   /* original row number, used later to restore order */

  key = key1; output;            /* one row for the name itself */
  do i = 1 to countw(key2s);     /* one row per word in key2s   */
    key = scan(key2s, i);
    output;
  end;
  drop i;
run;
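The following Python sketch (not SAS) shows this pivot step. `single_keyed` is a hypothetical name. Each row yields one record for `key1` and one for each word of `key2s`. Every record is tagged with a 1-based `rowid`, which plays the role of `_n_`.

```python
# Python sketch (not SAS) of the pivot: one output row per key value, tagged with the
# original row number so the original order can be restored after the merge.
foods = [
    ("Apple", "Fruit_1 Fruit_2"),
    ("Orange", "Fruit_1 Fruit_2"),
    ("Carrot", "Vegetable_1 Vegetable_2"),
    ("Potato", "Vegetable_1 Vegetable_2"),
    ("Knuckle", "Sandwich_1 Sandwich_2"),
]


def single_keyed(rows):
    """Return (rowid, key, key1, key2s): key1 itself, then each blank-separated word of key2s."""
    out = []
    for rowid, (key1, key2s) in enumerate(rows, start=1):  # rowid mirrors _n_
        for key in [key1] + key2s.split():
            out.append((rowid, key, key1, key2s))
    return out


pivoted = single_keyed(foods)
for row in pivoted:
    print(row)
```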
Sort by key to prepare for the merge by key

proc sort data=food_single_keyed;
  by key;
run;

proc sort data=dates;
  by key;
run;
Merge by key

data foods_dated;
  merge food_single_keyed dates;
  by key;
run;
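The following Python sketch (not SAS) shows what this match-merge does, using a cut-down version of the sample data. The names are hypothetical, and `None` stands in for a SAS missing value. Each year in `dates` is attached to every pivoted row with that key. The MERGE has no `IN=` filter, so keys that appear in only one input (Grain_1, Knuckle) are kept with missing values. That is why the final step needs `if rowid;`.

```python
# Python sketch (not SAS) of MERGE ... BY key here: a many-to-one match on key that,
# with no IN= filtering, keeps keys found in only one input (like a full outer join).
food_single_keyed = [(1, "Apple"), (1, "Fruit_1"), (1, "Fruit_2"), (5, "Knuckle")]
dates = {"Apple": 2014, "Fruit_1": 2011, "Grain_1": 2009}


def merge_by_key(left, right):
    """Return (rowid, key, year) rows in key order; None marks a missing value."""
    out = []
    for key in sorted({k for _, k in left} | set(right)):
        rowids = [rowid for rowid, k in left if k == key]
        year = right.get(key)  # None when the key has no date
        if rowids:
            out.extend((rowid, key, year) for rowid in rowids)
        else:
            out.append((None, key, year))  # key only in dates, e.g. Grain_1
    return out


foods_dated = merge_by_key(food_single_keyed, dates)
for row in foods_dated:
    print(row)
```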
Sort to prepare for the final selection

proc sort data=foods_dated;
  by rowid descending year ;
run;
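Select only the first (most recent) associated year for each row. You could also keep the
key column to see which value the year was selected from.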

data want (keep=key1 key2s year);
  set foods_dated;
  by rowid;
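  if rowid;         /* drop date-only keys such as Grain_1 (missing rowid) */
  if first.rowid;   /* keep the most recent year for each original row     */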
run;
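The following Python sketch (not SAS) covers the last two steps. It runs on the rows that the pivot and merge above would produce for the sample data. The names are hypothetical, and `None` means missing. SAS sorts a missing year lowest, so with `descending year` a missing year comes last within each `rowid`. As a result, `first.rowid` picks a real year whenever one exists.

```python
# Python sketch (not SAS): drop rows without a rowid (like `if rowid;`), order by rowid and
# descending year with missing years last, then keep the first row per rowid (first.rowid).
foods_dated = [
    (None, "Grain_1", 2009),
    (1, "Apple", 2014), (1, "Fruit_1", 2011), (1, "Fruit_2", None),
    (2, "Orange", None), (2, "Fruit_1", 2011), (2, "Fruit_2", None),
    (3, "Carrot", 2010), (3, "Vegetable_1", None), (3, "Vegetable_2", 2018),
    (4, "Potato", None), (4, "Vegetable_1", None), (4, "Vegetable_2", 2018),
    (5, "Knuckle", None), (5, "Sandwich_1", None), (5, "Sandwich_2", None),
]


def most_recent_per_row(rows):
    """Return {rowid: (key, year)}, keeping the most recent year for each original row."""
    kept = [row for row in rows if row[0] is not None]
    # rowid ascending; real years before missing ones; larger years first.
    kept.sort(key=lambda row: (row[0], row[2] is None, -(row[2] or 0)))
    want = {}
    for rowid, key, year in kept:
        want.setdefault(rowid, (key, year))  # first row of each rowid group wins
    return want


result = most_recent_per_row(foods_dated)
for rowid, (key, year) in result.items():
    print(rowid, key, year)
```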

This calls for a SQL or hash solution; SQL is probably easier. Please post whatever you have tried so far to solve this.