SAS: sum observations not in a group, by multiple groups
This post follows on from an earlier one. Regrettably, my minimal example there was a bit too minimal, and I was unable to apply the answers to my data. Here is a complete example. What I have is:
data have;
input group1 group2 $ group3 $ value;
datalines;
1 A X 2
1 A X 4
1 A Y 1
1 A Y 3
1 B Z 2
1 B Z 1
1 C Y 1
1 C Y 6
1 C Z 7
2 A Z 3
2 A Z 9
2 A Y 2
2 B X 8
2 B X 5
2 B X 5
2 B Z 7
2 C Y 2
2 C X 1
;
run;
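For each observation, I need a new variable "sum" that contains the sum of all values in the same subgroup (group1 and group2), except for the group (group3) the observation is in: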
data want;
input group1 group2 $ group3 $ value sum;
datalines;
1 A X 2 8
1 A X 4 6
1 A Y 1 9
1 A Y 3 7
1 B Z 2 1
1 B Z 1 2
1 C Y 1 13
1 C Y 6 8
1 C Z 7 7
2 A Z 3 11
2 A Z 9 5
2 A Y 2 12
2 B X 8 17
2 B X 5 20
2 B X 5 20
2 B Z 7 18
2 C Y 2 1
2 C X 1 2
;
run;
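My goal is to use DATA steps or PROC SQL. I am doing this on around 30 million observations, and PROC MEANS and similar procedures seemed slower than those on previous similar computations.
My problem with the solutions provided in the linked post is that they use the total of the whole column, and I do not know how to change them to use the total within the subgroup.
Any ideas?
The SQL solution joins all the data to an aggregating select: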
proc sql;
  create table want as
  select have.group1, have.group2, have.group3, have.value
       , aggregate.sum - value as sum
  from
    have
  join
    (select group1, group2, sum(value) as sum
     from have
     group by group1, group2
    ) aggregate
  on
    aggregate.group1 = have.group1
    & aggregate.group2 = have.group2
  ;
quit;
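Note that the query above subtracts only the row's own value from the (group1, group2) total. If the intent is to exclude the row's whole group3 level, as the question describes, one possible variant joins a second aggregate computed at the three-key level. This is only a sketch: the table name `want_sql`, the aliases `h`, `t2`, `t3`, and the column names `sum2` and `sum3` are hypothetical names introduced for illustration.

```sas
proc sql;
  create table want_sql as
  select h.group1, h.group2, h.group3, h.value
       , t2.sum2 - t3.sum3 as sum   /* tier 2 total minus this row's tier 3 total */
  from have as h
  join
    (select group1, group2, sum(value) as sum2
     from have
     group by group1, group2
    ) as t2
    on  t2.group1 = h.group1
    and t2.group2 = h.group2
  join
    (select group1, group2, group3, sum(value) as sum3
     from have
     group by group1, group2, group3
    ) as t3
    on  t3.group1 = h.group1
    and t3.group2 = h.group2
    and t3.group3 = h.group3
  ;
quit;
```

PROC SQL does not guarantee that the output rows keep the order of `have`, so add an `order by` clause if row order matters.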
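SQL can be slower than a hash solution, but SQL code is understood by more people than SAS DATA step code involving hashes (which can be faster than SQL).
The SAS documentation covers the hash features used here and has examples, but they do not deal with the concept this question needs:
- for each row, compute the tier 2 sum that excludes the tier 3 group this row is in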
data want2;
  if 0 then set have; * prep pdv;

  declare hash T2 (suminc:'value'); * hash for two (T)iers;
  T2.defineKey('group1', 'group2'); * one hash record per combination of group1, group2;
  T2.defineDone();

  declare hash T3 (suminc:'value'); * hash for three (T)iers;
  T3.defineKey('group1', 'group2', 'group3'); * one hash record per combination of group1, group2, group3;
  T3.defineDone();

  do while (not hash_loaded);
    set have end=hash_loaded;
    T2.ref(); * adds value to internal sum of hash data record;
    T3.ref();
  end;

  T2_cardinality = T2.num_items;
  T3_cardinality = T3.num_items;
  put 'NOTE: |T2| = ' T2_cardinality;
  put 'NOTE: |T3| = ' T3_cardinality;

  do while (not last_have);
    set have end=last_have;
    T2.sum(sum:t2_sum);
    T3.sum(sum:t3_sum);
    sum = t2_sum - t3_sum;
    output;
  end;

  stop;
  drop t2_: t3:;
run;
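As a sanity check on the small sample, the result can be compared with values worked out by hand. For example, in group1=1, group2=A the tier 2 total is 2+4+1+3 = 10 and the tier 3 totals are 6 for X and 4 for Y, so the X rows should get 10-6 = 4 and the Y rows 10-4 = 6. The sketch below assumes `have` was read with group2 and group3 as character variables. `expected` is a hypothetical dataset name, and its sum column was computed by hand from the sample data.

```sas
/* Hand-computed result: (group1, group2) total minus the row's group3 total */
data expected;
  input group1 group2 $ group3 $ value sum;
  datalines;
1 A X 2 4
1 A X 4 4
1 A Y 1 6
1 A Y 3 6
1 B Z 2 0
1 B Z 1 0
1 C Y 1 7
1 C Y 6 7
1 C Z 7 7
2 A Z 3 2
2 A Z 9 2
2 A Y 2 12
2 B X 8 7
2 B X 5 7
2 B X 5 7
2 B Z 7 18
2 C Y 2 1
2 C X 1 2
;
run;

/* Row-by-row comparison; want2 keeps the row order of have */
proc compare base=expected compare=want2;
run;
```

If PROC COMPARE reports no unequal values, the hash step reproduces the hand calculation on this sample.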
Thank you. My "want" dataset was a mistake on my part; the second solution you proposed is exactly what I needed. I was having a painful time with this and you solved it, thanks a lot!