How do I sum two frequency datasets that have the same variables?

sql database sas

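Hi, I want to sum two frequency datasets of tennis players who have won a certain number of matches. Both datasets have the same variables in exactly the same order.

This is how I create the first dataset, table1: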

PROC FREQ data=projet.matchs;
    TABLES player1 / out = table1;
run;

player1              Fréquence  Pourcentage  Fréquence cumulée  Pourcentage cumulé
Adrian Mannarino             3         1.18                  3                1.18
Agnieszka Radwanska          2         0.79                  5                1.97
Ajla Tomljanovic             1         0.39                  6                2.36
Albert Ramos                 1         0.39                  7                2.76

The second dataset, table2:

PROC FREQ data=projet.matchs;
    TABLES player2 / out= table2;
run;

player2              Fréquence  Pourcentage Fréquence cumulée   Pourcentage cumulé
Adrian Mannarino       1          0.39              1                 0.39
Alex Bolt              1          0.39              2                 0.79
Alex De Minaur         1          0.39              3                 1.18
Alexander Zverev       3          1.18              6                 2.36

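What I want is a new dataset that is the sum of table1 and table2. My datasets are much larger; I have only shown the first four observations of each.

Any help would be greatly appreciated! Thanks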

How about this? Does it work for you?

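data combined / view=combined;
    /* rename player2 so both inputs share one key variable */
    set table1 table2(rename=(player2=player1));
run;

/* PROC FREQ OUT= datasets store the values in COUNT and PERCENT; */
/* "Fréquence" and "Pourcentage" are only the printed labels */
proc means data=combined nway sum;
    class player1;
    var count percent;
run;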
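
One thing worth checking before summing: the column headers shown in the question are the printed labels, while the OUT= dataset from PROC FREQ normally stores the values in variables named COUNT and PERCENT. The cumulative columns are only written to the dataset if the OUTCUM option is added, in which case they appear as CUM_FREQ and CUM_PCT. A minimal sketch to confirm the names, reusing the dataset and variable from the question:

```sas
/* Write counts, percentages and cumulative values to table1 */
proc freq data=projet.matchs;
    tables player1 / out=table1 outcum;
run;

/* List the variable names and labels actually stored in table1 */
proc contents data=table1;
run;
```

The PROC CONTENTS listing shows both the variable names and their labels, so it is easy to see which names to use in a VAR statement or a SELECT clause.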

You can use PROC SQL with a full join and the SUM function to combine the two tables.

Have1 dataset:

+---------------------+-----------+-------------+------------------+-------------------+
|       player1       | Frequence | Pourcentage | Frequencecumulee | Pourcentagecumule |
+---------------------+-----------+-------------+------------------+-------------------+
| Adrian Mannarino    |         3 |        1.18 |                3 |              1.18 |
| Agnieszka Radwanska |         2 |        0.79 |                5 |              1.97 |
| Ajla Tomljanovic    |         1 |        0.39 |                6 |              2.36 |
| Albert Ramos        |         1 |        0.39 |                7 |              2.76 |
+---------------------+-----------+-------------+------------------+-------------------+

Have2 dataset:

+------------------+-----------+-------------+------------------+-------------------+
|     player2      | Frequence | Pourcentage | Frequencecumulee | Pourcentagecumule |
+------------------+-----------+-------------+------------------+-------------------+
| Adrian Mannarino |         1 |        0.39 |                1 |              0.39 |
| Alex Bolt        |         1 |        0.39 |                2 |              0.79 |
| Alex De Minaur   |         1 |        0.39 |                3 |              1.18 |
| Alexander Zverev |         3 |        1.18 |                6 |              2.36 |
+------------------+-----------+-------------+------------------+-------------------+

Solution:

proc sql noprint;
   create table want1 as
   select 
            coalesce(player1,player2) as player,
            sum(t1.Frequence,t2.Frequence) as Frequence,
            sum(t1.Pourcentage,t2.Pourcentage) as Pourcentage,
            sum(t1.Frequencecumulee,t2.Frequencecumulee) as Frequencecumulee,
            sum(t1.Pourcentagecumule,t2.Pourcentagecumule) as Pourcentagecumule
   from
            have1 t1
   full join 
            have2 t2
   on
            strip(player1)=strip(player2);
quit;

Output:

+---------------------+-----------+-------------+------------------+-------------------+
|       player        | Frequence | Pourcentage | Frequencecumulee | Pourcentagecumule |
+---------------------+-----------+-------------+------------------+-------------------+
| Adrian Mannarino    |         4 |        1.57 |                4 |              1.57 |
| Agnieszka Radwanska |         2 |        0.79 |                5 |              1.97 |
| Ajla Tomljanovic    |         1 |        0.39 |                6 |              2.36 |
| Albert Ramos        |         1 |        0.39 |                7 |              2.76 |
| Alex Bolt           |         1 |        0.39 |                2 |              0.79 |
| Alex De Minaur      |         1 |        0.39 |                3 |              1.18 |
| Alexander Zverev    |         3 |        1.18 |                6 |              2.36 |
+---------------------+-----------+-------------+------------------+-------------------+

Alternatively, you can use a DATA step plus PROC SUMMARY:

data want2;
  set have2(rename=(player2=player)) have1(rename=(player1=player));
run;

proc summary data=want2 nway;
  var Frequence Pourcentage Frequencecumulee Pourcentagecumule;
  class player;
  output out=want2 (drop=_:) sum=;
run;

Output:

+---------------------+-----------+-------------+------------------+-------------------+
|       player        | Frequence | Pourcentage | Frequencecumulee | Pourcentagecumule |
+---------------------+-----------+-------------+------------------+-------------------+
| Adrian Mannarino    |         4 |        1.57 |                4 |              1.57 |
| Agnieszka Radwanska |         2 |        0.79 |                5 |              1.97 |
| Ajla Tomljanovic    |         1 |        0.39 |                6 |              2.36 |
| Albert Ramos        |         1 |        0.39 |                7 |              2.76 |
| Alex Bolt           |         1 |        0.39 |                2 |              0.79 |
| Alex De Minaur      |         1 |        0.39 |                3 |              1.18 |
| Alexander Zverev    |         3 |        1.18 |                6 |              2.36 |
+---------------------+-----------+-------------+------------------+-------------------+
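
Note that summing the Pourcentage columns of two tables gives values that add up to 200 rather than 100, and the summed cumulative columns are not cumulative over the combined table. If percentages are needed, one option is to recompute them from the summed counts. The following is only a sketch, assuming the want2 dataset produced above; the output name want_pct is made up for illustration:

```sas
proc sql;
    create table want_pct as
    select player,
           Frequence,
           /* sum(Frequence) with one argument is the grand total over all rows; */
           /* PROC SQL remerges it onto each row (a note appears in the log)     */
           100 * Frequence / sum(Frequence) as Pourcentage format=6.2
    from want2;
quit;
```

The recomputed Pourcentage values add up to 100 across all players, which is usually what a combined frequency table is expected to show.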

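Sure, use the ODS table output. This gives you a clean version. The table named temp is the output of PROC FREQ, which I then clean up into a displayable table named want. This is quite generic, so change the dataset name and variable names in the first step and everything else should work as is.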

*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
    table sex age;
run;

*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);

Variable_Value=strip(trim(vvaluex(variable)));

keep variable variable_value frequency percent cum:;
label variable='Variable' 
    variable_value='Variable Value';
run;

*Display;
proc print data=want(obs=20) label;
run;

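If I understand correctly: in a match, a player can be either player1 or player2, and you want statistics over all of their appearances regardless of which of the two they were? Why not transpose the data first, before computing the frequencies? PROC FREQ will then produce the appropriate statistics for you. If you add up the percentage variables, their total goes from 100% to 200%, and adding up cumulative statistics makes no sense.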
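
The restructuring suggested here can be sketched as follows. Instead of a literal PROC TRANSPOSE, this version stacks the two player columns into one long column, which has the same effect for counting. It assumes projet.matchs holds one row per match with character variables player1 and player2, as in the question; the names long, player and want_freq, and the length of 50, are made up for illustration:

```sas
/* One row per player appearance, whichever column the name came from */
data long / view=long;
    length player $50;   /* assumed wide enough for both columns, avoids truncation */
    set projet.matchs(keep=player1 rename=(player1=player))
        projet.matchs(keep=player2 rename=(player2=player));
run;

/* Counts, percentages and cumulative values over all appearances */
proc freq data=long;
    tables player / out=want_freq outcum;
run;
```

Because PROC FREQ sees every appearance at once, the percentages add up to 100 and the cumulative columns are consistent, so nothing has to be summed afterwards.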