Merging two SAS files based on two columns in file A

File A:

C1 (name)  C2 (other name)
Apple      Fruit_1 Fruit_2
Orange     Fruit_1 Fruit_2
Carrot     Vegetable_1 Vegetable_2
Potato     Vegetable_1 Vegetable_2

File B:

C1 (name)    C2 (last used)
Apple        2014
Fruit_1      2011
Carrot       2010
Vegetable_2  2018

Expected result:

C1 (name)  C2 (other name)          C3 (last used)
Apple      Fruit_1 Fruit_2          2014
Orange     Fruit_1 Fruit_2          2011
Carrot     Vegetable_1 Vegetable_2  2018
Potato     Vegetable_1 Vegetable_2  2018

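Basically, I want to merge my files based on two columns in the first file. If a name from file B is found in either C1 or C2 of file A, the date should be added. Note that Carrot gets the same date as Potato: Vegetable_2 has the most recent date, and Vegetable_2 can refer to either Carrot or Potato.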

In my tests with MERGE, I could not get MERGE to check the second column, so I only get data for items that exist in C1 of both files.

So my current result is:

C1 (name)  C2 (other name)          C3 (last used)
Apple      Fruit_1 Fruit_2          2014
Orange     Fruit_1 Fruit_2
Carrot     Vegetable_1 Vegetable_2  2010
Potato     Vegetable_1 Vegetable_2

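Do you know what kind of SAS procedure could be used to get the result I want? FYI, I am using SAS on a mainframe. I am not sure whether that changes anything, since I have never used SAS outside a mainframe environment.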

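You can simply left join file A to file B and use the CONTAINS operator in the ON clause. In other words, a row joins if table1.c1 = table2.c1, or if table2.c1 is a substring of table1.c2 (file A).

Dummy data: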
data file_a;
length c1 $ 8 c2 $ 30 ;
input c1 $ c2 $ ;
datalines;
Apple      Fruit_1,Fruit_2
Orange     Fruit_1,Fruit_2
Carrot     Vegetable_1,Vegetable_2
Potato     Vegetable_1,Vegetable_2
;
data file_b;
length c1 $ 12 c2 $ 4 ;
input c1 $ c2 $ ;
datalines;
Apple        2014  
Fruit_1      2011 
Carrot       2010
Vegetable_2  2018
;

Code:
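A minimal sketch of the join described above, assuming the `file_a` and `file_b` data sets from the dummy data. The output table name `want` and the aliases `last_used` and `cc` are assumptions chosen to match the columns in the output listing, so this is not necessarily the answer's exact query.

```sas
proc sql;
  create table want as
  select a.c1,
         a.c2,
         b.c2 as last_used,     /* year taken from file_b */
         b.c1 as cc             /* the file_b name that produced the match */
  from file_a as a
       left join file_b as b
       on a.c1 = b.c1           /* the name matches directly */
       or a.c2 contains b.c1;   /* or it appears inside the "other name" list */
quit;

/* Write each row to the log as name=value pairs (assumed reporting step) */
data _null_;
  set want;
  put c1= c2= last_used= cc=;
run;
```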
Output:
c1=Apple c2=Fruit_1,Fruit_2 last_used=2014 cc=Apple 
c1=Carrot c2=Vegetable_1,Vegetable_2 last_used=2010 cc=Carrot 
c1=Carrot c2=Vegetable_1,Vegetable_2 last_used=2018 cc=Vegetable_2 
c1=Potato c2=Vegetable_1,Vegetable_2 last_used=2018 cc=Vegetable_2 
c1=Orange c2=Fruit_1,Fruit_2 last_used=  cc=  
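This listing still has two rows for Carrot, and Orange is unmatched even though Fruit_1 appears in its c2. One plausible cause is that `file_b.c1` is padded with trailing blanks to its declared length of 12, so CONTAINS searches for the padded value. The following is a sketch under that assumption, using the same `file_a` and `file_b`: it trims the search value and keeps only the latest year per row. The table name `want_latest` is hypothetical.

```sas
proc sql;
  create table want_latest as
  select a.c1,
         a.c2,
         /* latest year among all matching names; missing if none match */
         max(input(b.c2, 4.)) as last_used
  from file_a as a
       left join file_b as b
       on a.c1 = b.c1
       or a.c2 contains trim(b.c1)   /* trim so padding blanks do not block the match */
  group by a.c1, a.c2;
quit;
```

CONTAINS is a substring test, so a name such as Fruit_1 would also match a longer name such as Fruit_10; if names like that can occur, a word-based comparison is safer.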

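A MERGE-only approach requires:

Pivoting the first table into rows, one row per key value
Sorting by key in preparation for the merge
Merging
Sorting to restore the original row order, with year descending
Selecting the most recent year

Sample data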
data foods;
length key1 $20 key2s $50;
input key1 key2s &; datalines;
Apple      Fruit_1 Fruit_2
Orange     Fruit_1 Fruit_2
Carrot     Vegetable_1 Vegetable_2
Potato     Vegetable_1 Vegetable_2
Knuckle    Sandwich_1 Sandwich_2
run;

data dates;
length key $20 year 8;
input key year; datalines;
Apple        2014  
Fruit_1      2011 
Carrot       2010
Vegetable_2  2018
Grain_1      2009
run;

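Pivot each row into several rows, so that there is one row per key value
data food_single_keyed;
  length key $20;
  set foods;

  rowid = _n_;                      /* remember the original row order */

  key = key1; output;               /* one row for the primary name */
  do i = 1 to countw(key2s, ' ');   /* one row per blank-separated alternate name */
    key = scan(key2s, i, ' ');
    output;
  end;
  drop i;
run;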

Sort by key in preparation for the merge by key
proc sort data=food_single_keyed;
  by key;
run;

proc sort data=dates;
  by key;
run;

Merge by key
data foods_dated;
  merge food_single_keyed dates;
  by key;
run;

Sort in preparation for the final selection
proc sort data=foods_dated;
  by rowid descending year ;
run;

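Select only the first, most recent associated year for each row. You could also keep the key column to see which value was used to select the year.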
data want (keep=key1 key2s year);
  set foods_dated;
  by rowid;
  if rowid;
  if first.rowid;
run;

This needs a SQL or hash solution; SQL is probably easier. Please post whatever you have tried so far to solve this.
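For the hash route, here is a minimal sketch assuming the `foods` and `dates` data sets from the sample data above. The names `want_hash` and `last_used` are hypothetical. Neither input needs to be sorted.

```sas
data want_hash (keep=key1 key2s last_used);
  length key $20 year 8;
  if _n_ = 1 then do;
    declare hash h(dataset: 'dates');   /* load the lookup table once */
    h.defineKey('key');
    h.defineData('year');
    h.defineDone();
    call missing(key, year);            /* avoid uninitialized-variable notes */
  end;

  set foods;
  last_used = .;

  /* look up the primary name first */
  key = key1;
  if h.find() = 0 then last_used = year;

  /* then each blank-separated alternate name, keeping the latest year;
     MAX ignores missing values, so the first hit always wins over "." */
  do i = 1 to countw(key2s, ' ');
    key = scan(key2s, i, ' ');
    if h.find() = 0 then last_used = max(last_used, year);
  end;
run;
```

Rows with no match in `dates` keep a missing `last_used`, which mirrors what a left join would produce.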