Complex join in SAS
I am trying to join the following two datasets:
data testA;
input categorical $3. value;
*order = _n_;
datalines;
Dog.
M 7
F 5
Cat.
M 4
F 2
;
run;
data testA;
set testA;
order=_n_;
run;
data testB;
input categorical $2. value;
datalines;
Dog.
F 3
Cat.
M 1
F 2
;
run;
proc sql;
create table final as
select a.*,b.* from testA a left join testB b on
a.categorical=b.categorical
order by order;
quit;
My desired output is as follows:
data testA;
input categorical $ value value2;
datalines;
Dog . .
M 7 .
F 5 3
Cat . .
M 4 1
F 2 2
;
run;
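The problems I am running into are: 1) the "categorical" ids are not in alphabetical order, and I do not want to change their order; 2) because M and F each appear twice, I do not know how to join without renaming M and F to make them unique; 3) it might be an inner join, since what is in value may not be in value2.

If your data has the category value as interspersed rows, you need to create a third column that holds the value discovered while passing through the dataset. For discussion, name this new column group. It will also be categorical, and it sits hierarchically "above" the other category column. It is the "synthetic" category needed to perform the complex join, and it is dropped from the final result.

The want join will be a simple "black box" involving grouping, coalescing, sneaky math, and group sums of row sums.

The sample code creates a table, fulljoin_peek, that is not needed for the result but gives insight into the data flowing through the black box. The code also handles the "real data" case in which a category is repeated within a group.

Sample data: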
data testA;
* the : modifier reads categorical with list input, so rows such as M 7 parse correctly;
input categorical :$3. value;
datalines;
Dog . * missing means categorical is really group
M 7
F 5
Cat .
M 4
F 2
Rat . * B does not have rat
T 5
Bat . * Bat has two M (repeated category) need to be summed
M 7
M 3
Fly .
M 5
F 6
;
run;
data testB;
* the : modifier reads categorical with list input, so rows such as F 3 parse correctly;
input categorical :$3. value;
datalines;
Dog . * only one category
F 3
Cat .
M 1
F 2
Cow . * A does not have cow
X 7
Bat . * Bat has two F (repeated category) need to be summed
F 7
F 13
Fly . * F M order different than A
F 16
M 20
;
run;
Augment the data with a group column and information about the original order:
data A2;
set testA;
if value = . then do;
* presume missing is the 'discovery' of when the
* group value has to be assigned;
group = categorical; retain group;
group_order + 1;
value_order = 0;
end;
value_order + 1;
format group_order value_order 4.;
run;
data B2;
set testB;
if value = . then do;
* presume missing is the 'discovery' of when the
* group value has to be assigned;
group = categorical; retain group;
group_order + 1;
value_order = 0;
end;
value_order + 1;
format group_order value_order 4.;
run;
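The two DATA steps above differ only in their input and output names, so they could be folded into one macro. The following is a minimal sketch, not part of the original answer: the macro name add_group and its parameters in= and out= are hypothetical, and the explicit length of $3 for group is an assumption that matches the sample data.

```sas
/* Hypothetical helper: add group, group_order and value_order to a dataset. */
/* Assumes a row with a missing value marks the start of a new group.        */
%macro add_group(in=, out=);
  data &out;
    set &in;
    length group $3;          /* assumed width, same as categorical in the sample */
    retain group;
    if value = . then do;
      group = categorical;    /* a header row names the group */
      group_order + 1;        /* sum statement: retained counter of groups */
      value_order = 0;        /* restart the within-group counter */
    end;
    value_order + 1;          /* header row gets 1, members get 2, 3, ... */
    format group_order value_order 4.;
  run;
%mend add_group;

%add_group(in=testA, out=A2)
%add_group(in=testB, out=B2)
```

Calling the macro twice should produce the same A2 and B2 tables as the two hand-written steps.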
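Comments:
Remove Dog…F from the test data. How do you know whether you should output in M F or F M order? You need another source of information to set the ordering.
Are you still trying to join the summaries for each variable in a demographic profile? If so, there is another option: modify it so that it has all the values, even when one is not present.
@Reeza, yes, I am still trying to join the summaries in the demographic profile. Would I modify the macro to fill the values in when they are not present?
I will take a look this weekend and get back to you. I have another place where this is readily available, but it is buried in a clinical trial program, so I need to find it.
@Reeza, thank you very much! I would appreciate it. I was able to do the merge, but it was a more manual process than I had expected.
Join operation (data peek)
* this full join shows how data matches up for the answer
* the answer will use grouping, coalescing, summing and adding;
proc sql;
create table fulljoin_peek as
select
coalesce (A.categorical, B.categorical) as want_categorical
, sum(A.value,B.value) as want_value format=4.
, A.group as A_group
, B.group as B_group
, A.group_order as A_group_order
, B.group_order as B_group_order
, A.categorical as A_cat
, B.categorical as B_cat
, A.value as A_value
, B.value as B_value
, A.value_order as A_value_order
, B.value_order as B_value_order
from
A2 as A
full join
B2 as B
on
A.group = B.group
and A.categorical = B.categorical
;
quit;
Want join (answer)
proc sql;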
create table
want (drop=group_order value_order) as
select
coalesce (A.categorical, B.categorical) as want_categorical
, min (coalesce (A.group_order-1e6,B.group_order)) as group_order
, min (coalesce (A.value_order-1e6,B.value_order)) as value_order /* -1e6 forces A order to have precedence */
, sum ( sum (A.value,B.value) ) as value
from
A2 as A
full join
B2 as B
on
A.group = B.group
and A.categorical = B.categorical
group by
coalesce (A.group, B.group), want_categorical /* coalesce keeps rows found only in B inside their own group */
order by
group_order, value_order
;
quit;
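The want join above adds the A and B values into a single column, while the question asks for value and value2 side by side. A variation along the following lines might get closer to that layout. It is only a sketch under stated assumptions: the table name want2 is hypothetical, A2 and B2 are the augmented tables built earlier, and a category is assumed not to be repeated on both sides of the same group (otherwise the join becomes many-to-many and the sums would be inflated).

```sas
proc sql;
  create table want2 (drop=group_order value_order) as
  select
      coalesce(A.categorical, B.categorical) as want_categorical
    , sum(A.value) as value     /* one argument: aggregate over the group, A side only */
    , sum(B.value) as value2    /* one argument: aggregate over the group, B side only */
    , min(coalesce(A.group_order - 1e6, B.group_order)) as group_order
    , min(coalesce(A.value_order - 1e6, B.value_order)) as value_order
  from
    A2 as A
    full join
    B2 as B
    on  A.group = B.group
    and A.categorical = B.categorical
  group by
    coalesce(A.group, B.group), want_categorical
  order by
    group_order, value_order
  ;
quit;
```

As in the original query, the order keys come from A whenever A has the row, so rows found only in B sort after the rows that came from A. Group header rows keep missing values in both columns because every value they aggregate is missing.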