SQL: How do I sum two frequency datasets that have the same variables?
Hi, I want to create two frequency datasets of tennis players who have won a certain number of matches. Both datasets have exactly the same variables, in the same order. How I create the datasets:
PROC FREQ data=projet.matchs;
TABLES player1 / out = table1;
run;
player1 Fréquence Pourcentage Fréquence cumulée Pourcentage cumulé
Adrian Mannarino 3 1.18 3 1.18
Agnieszka Radwanska 2 0.79 5 1.97
Ajla Tomljanovic 1 0.39 6 2.36
Albert Ramos 1 0.39 7 2.76
Second dataset, table2:
PROC FREQ data=projet.matchs;
TABLES player2 / out= table2;
run;
player2 Fréquence Pourcentage Fréquence cumulée Pourcentage cumulé
Adrian Mannarino 1 0.39 1 0.39
Alex Bolt 1 0.39 2 0.79
Alex De Minaur 1 0.39 3 1.18
Alexander Zverev 3 1.18 6 2.36
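What I want is to create a new dataset that is the sum of table1 and table2.
My datasets are much larger; I have only shown the first four observations of each.
Any help would be greatly appreciated! Thanks.
How about this? Does it work for you?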
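* Stack both PROC FREQ outputs and rename player2 so the names line up;
data combined / view=combined;
set table1 table2(rename=(player2=player1));
run;
* The OUT= dataset from PROC FREQ holds its values in COUNT and PERCENT;
* A VAR list is separated by spaces, not commas;
proc means data=combined nway sum;
class player1;
var count percent;
run;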
You can use PROC SQL with the SUM function to join the two tables.
Have1 dataset:
+---------------------+-----------+-------------+------------------+-------------------+
| player1 | Frequence | Pourcentage | Frequencecumulee | Pourcentagecumule |
+---------------------+-----------+-------------+------------------+-------------------+
| Adrian Mannarino | 3 | 1.18 | 3 | 1.18 |
| Agnieszka Radwanska | 2 | 0.79 | 5 | 1.97 |
| Ajla Tomljanovic | 1 | 0.39 | 6 | 2.36 |
| Albert Ramos | 1 | 0.39 | 7 | 2.76 |
+---------------------+-----------+-------------+------------------+-------------------+
Have2 dataset:
+------------------+-----------+-------------+------------------+-------------------+
| player2 | Frequence | Pourcentage | Frequencecumulee | Pourcentagecumule |
+------------------+-----------+-------------+------------------+-------------------+
| Adrian Mannarino | 1 | 0.39 | 1 | 0.39 |
| Alex Bolt | 1 | 0.39 | 2 | 0.79 |
| Alex De Minaur | 1 | 0.39 | 3 | 1.18 |
| Alexander Zverev | 3 | 1.18 | 6 | 2.36 |
+------------------+-----------+-------------+------------------+-------------------+
Solution:
proc sql noprint;
create table want1 as
select
coalesce(player1,player2) as player,
sum(t1.Frequence,t2.Frequence) as Frequence,
sum(t1.Pourcentage,t2.Pourcentage) as Pourcentage,
sum(t1.Frequencecumulee,t2.Frequencecumulee) as Frequencecumulee,
sum(t1.Pourcentagecumule,t2.Pourcentagecumule) as Pourcentagecumule
from
have1 t1
full join
have2 t2
on
strip(player1)=strip(player2);
quit;
Output:
+---------------------+-----------+-------------+------------------+-------------------+
| player | Frequence | Pourcentage | Frequencecumulee | Pourcentagecumule |
+---------------------+-----------+-------------+------------------+-------------------+
| Adrian Mannarino | 4 | 1.57 | 4 | 1.57 |
| Agnieszka Radwanska | 2 | 0.79 | 5 | 1.97 |
| Ajla Tomljanovic | 1 | 0.39 | 6 | 2.36 |
| Albert Ramos | 1 | 0.39 | 7 | 2.76 |
| Alex Bolt | 1 | 0.39 | 2 | 0.79 |
| Alex De Minaur | 1 | 0.39 | 3 | 1.18 |
| Alexander Zverev | 3 | 1.18 | 6 | 2.36 |
+---------------------+-----------+-------------+------------------+-------------------+
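Note that summing Pourcentage from two tables that each total 100 gives a column that totals 200, and the summed cumulative columns no longer accumulate anything. A minimal sketch of one way around this, assuming the want1 table from the solution above: keep only the summed counts and recompute the percentage from them. The table name want_pct is hypothetical.

```sas
/* Sketch: rebuild the percentage from the summed counts.       */
/* want1 comes from the PROC SQL step above; want_pct is a      */
/* made-up name for the result.                                 */
proc sql;
  create table want_pct as
  select player,
         Frequence,
         /* SUM() with a single argument is an aggregate, so    */
         /* PROC SQL remerges the grand total onto every row.   */
         100 * Frequence / sum(Frequence) as Pourcentage format=6.2
  from want1
  order by player;
quit;
```

PROC SQL writes a note to the log about remerging summary statistics with the original data; that is expected here.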
Or you can try a DATA step plus PROC SUMMARY:
data want2;
set have2(rename=(player2=player)) have1(rename=(player1=player));
run;
proc summary data=want2 nway;
var Frequence Pourcentage Frequencecumulee Pourcentagecumule;
class player;
output out=want2 (drop=_:) sum=;
run;
Output:
+---------------------+-----------+-------------+------------------+-------------------+
| player | Frequence | Pourcentage | Frequencecumulee | Pourcentagecumule |
+---------------------+-----------+-------------+------------------+-------------------+
| Adrian Mannarino | 4 | 1.57 | 4 | 1.57 |
| Agnieszka Radwanska | 2 | 0.79 | 5 | 1.97 |
| Ajla Tomljanovic | 1 | 0.39 | 6 | 2.36 |
| Albert Ramos | 1 | 0.39 | 7 | 2.76 |
| Alex Bolt | 1 | 0.39 | 2 | 0.79 |
| Alex De Minaur | 1 | 0.39 | 3 | 1.18 |
| Alexander Zverev | 3 | 1.18 | 6 | 2.36 |
+---------------------+-----------+-------------+------------------+-------------------+
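Sure, use the ODS table output. That gives you a clean version. The table named temp is the output from PROC FREQ, which I then clean up into a displayable table named want. This is very generic, so change the dataset name and the variable names in the first step and everything else should just work.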
*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
table sex age;
run;
*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
keep variable variable_value frequency percent cum:;
label variable='Variable'
variable_value='Variable Value';
run;
*Display;
proc print data=want(obs=20) label;
run;
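If I understand correctly: in a match, a player can be either player1 or player2, and you want statistics for all of their appearances regardless of which one they were? Why not transpose the data before computing the frequencies? Then PROC FREQ will produce the appropriate statistics for you.
If you add the percentage variables together, their total goes from 100% to 200%. Adding the cumulative statistics together is meaningless.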
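The reshape-first suggestion could be sketched as follows. This is only an illustration under stated assumptions: it stacks the two columns with a DATA step rather than PROC TRANSPOSE, the names players_long and want_freq are made up, and the $50 length is a guess that should be replaced by the real length of the player variables.

```sas
/* Sketch: put player1 and player2 into a single column, then   */
/* let PROC FREQ do the counting. players_long and want_freq    */
/* are hypothetical names; $50 is an assumed length.            */
data players_long;
  length player $50;
  set projet.matchs(keep=player1 rename=(player1=player))
      projet.matchs(keep=player2 rename=(player2=player));
run;

proc freq data=players_long;
  /* OUT= stores the counts in COUNT and PERCENT;               */
  /* OUTCUM adds the cumulative columns as well.                */
  tables player / out=want_freq outcum;
run;
```

Because the frequencies are computed once over all appearances, the percentages total 100 and the cumulative columns are meaningful again.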