Merge two SAS files based on two columns in file A
File A:
C1(name) C2(other name)
Apple Fruit_1 Fruit_2
Orange Fruit_1 Fruit_2
Carrot Vegetable_1 Vegetable_2
Potato Vegetable_1 Vegetable_2
File B:
C1(name) C2 (last used)
Apple 2014
Fruit_1 2011
Carrot 2010
Vegetable_2 2018
Expected result:
C1(name) C2(other name) C3(last used)
Apple Fruit_1 Fruit_2 2014
Orange Fruit_1 Fruit_2 2011
Carrot Vegetable_1 Vegetable_2 2018
Potato Vegetable_1 Vegetable_2 2018
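Basically, I want to merge my files based on two columns in the first file. If a "name" from file B is found in either column C1 or column C2 of file A, add the date. Note that Carrot gets the same date as Potato: Vegetable_2 has the most recent date, and Vegetable_2 can refer to either Carrot or Potato.
In my tests with MERGE, I could not get MERGE to check the second column, so I only get data for items that exist in C1 of both files.
So my current result is: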
C1(name) C2(other name) C3(last used)
Apple Fruit_1 Fruit_2 2014
Orange Fruit_1 Fruit_2
Carrot Vegetable_1 Vegetable_2 2010
Potato Vegetable_1 Vegetable_2
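Do you know what kind of SAS procedure I could use to get the result I want? FYI, I am using SAS on a mainframe. I am not sure whether that changes anything, since I have not used SAS outside a mainframe environment.

You just need to left join file B to file A and use the CONTAINS operator in the ON clause. That is, the join condition holds when table1.c1 = table2.c1 or when table2.c1 is a substring of table1.c2 (table1 being file A).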
Dummy data:
data file_a;
length c1 $ 8 c2 $ 30 ;
input c1 $ c2 $ ;
datalines;
Apple Fruit_1,Fruit_2
Orange Fruit_1,Fruit_2
Carrot Vegetable_1,Vegetable_2
Potato Vegetable_1,Vegetable_2
;
data file_b;
length c1 $ 12 c2 $ 4 ;
input c1 $ c2 $ ;
datalines;
Apple 2014
Fruit_1 2011
Carrot 2010
Vegetable_2 2018
;
Code:
Output:
c1=Apple c2=Fruit_1,Fruit_2 last_used=2014 cc=Apple
c1=Carrot c2=Vegetable_1,Vegetable_2 last_used=2010 cc=Carrot
c1=Carrot c2=Vegetable_1,Vegetable_2 last_used=2018 cc=Vegetable_2
c1=Potato c2=Vegetable_1,Vegetable_2 last_used=2018 cc=Vegetable_2
c1=Orange c2=Fruit_1,Fruit_2 last_used= cc=
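The query that belongs under "Code:" is missing from this copy of the answer. As a rough illustration only, and not the original author's code, a left join of this kind might look like the sketch below. It assumes the `file_a` and `file_b` datasets from the dummy data; the output table names `joined` and `want` are hypothetical. The sketch wraps the lookup key in `trim()`, because SAS character values are padded with trailing blanks, so its matches may differ from the listing above.

```sas
/* Hypothetical sketch; not the original answer's query. */
proc sql;
  /* One row per (file_a row, matching file_b row); unmatched file_a rows are kept. */
  create table joined as
  select a.c1,
         a.c2,
         b.c2 as last_used,
         b.c1 as cc                       /* the file_b key that matched */
  from file_a as a
  left join file_b as b
    on a.c1 = b.c1                        /* exact match on the name ...        */
    or a.c2 contains trim(b.c1);          /* ... or the key appears inside c2   */

  /* Collapse to one row per file_a row, keeping the latest year.               */
  /* last_used is a 4-character year here, so MAX compares the strings.         */
  create table want as
  select a.c1,
         a.c2,
         max(b.c2) as last_used
  from file_a as a
  left join file_b as b
    on a.c1 = b.c1
    or a.c2 contains trim(b.c1)
  group by a.c1, a.c2;
quit;
```

One caveat with a substring test: a key such as Fruit_1 would also match a longer entry such as Fruit_10, so the rotate-and-merge approach may be the safer choice when keys can be prefixes of each other.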
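A MERGE-only approach requires:
- pivoting the first table into rows, one row per key
- sorting by key to prepare for the merge
- merging
- sorting to restore the original row order, with year descending
- selecting the most recent year
Sample data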
data foods;
length key1 $20 key2s $50;
input key1 key2s &; datalines;
Apple Fruit_1 Fruit_2
Orange Fruit_1 Fruit_2
Carrot Vegetable_1 Vegetable_2
Potato Vegetable_1 Vegetable_2
Knuckle Sandwich_1 Sandwich_2
run;
data dates;
length key $20 year 8;
input key year; datalines;
Apple 2014
Fruit_1 2011
Carrot 2010
Vegetable_2 2018
Grain_1 2009
run;
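Pivot each row into multiple rows, so that there is one row per key value
data food_single_keyed;
length key $20;
set foods;
rowid = _n_; /* remember the original row order */
key = key1; output; /* one row for the primary key */
do i = 1 by 1; /* then one row per word in key2s */
key = scan(key2s, i, ' ');
if missing(key) then leave; /* no more words */
output;
if i > 10 then leave; /* safety cap; LEAVE exits only the loop, whereas STOP would end the whole step */
end;
drop i;
run;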
Sort by key to prepare for the merge by key
proc sort data=food_single_keyed;
by key;
run;
proc sort data=dates;
by key;
run;
Merge by key
data foods_dated;
merge food_single_keyed dates;
by key;
run;
Sort to prepare for the final selection
proc sort data=foods_dated;
by rowid descending year ;
run;
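For each original row, select only the first row, which carries the most recent associated year. You can also keep the key column to see which value was used to select the year.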
data want (keep=key1 key2s year);
set foods_dated;
by rowid;
if rowid;
if first.rowid;
run;
This calls for an SQL or a hash solution; SQL is probably easier. Please post whatever you have tried so far to solve this.
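For the hash route mentioned in the comment above, a minimal sketch could look like the following. It is an assumption-laden illustration and does not come from the thread. It reuses the `foods` and `dates` sample datasets, and the names `want_hash` and `last_used` are hypothetical. It also assumes a SAS release with the DATA step hash object and the `countw` function.

```sas
/* Hypothetical sketch of a hash lookup; dataset and variable names are illustrative. */
data want_hash;
  if _n_ = 1 then do;
    if 0 then set dates;                 /* put key and year into the PDV    */
    declare hash h(dataset: 'dates');    /* load the lookup table once       */
    h.defineKey('key');
    h.defineData('year');
    h.defineDone();
  end;

  set foods;
  last_used = .;

  /* Look up the primary key first. */
  key = key1;
  if h.find() = 0 then last_used = max(last_used, year);

  /* Then look up every blank-separated word in key2s. */
  do i = 1 to countw(key2s, ' ');
    key = scan(key2s, i, ' ');
    if h.find() = 0 then last_used = max(last_used, year);
  end;

  keep key1 key2s last_used;
run;
```

This avoids both sorts and preserves the original row order, at the cost of holding `dates` in memory. Rows with no matching key keep a missing `last_used`.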