Summary statistics for multiple factors in R?
I have game data for each game as follows:

Id  Pos  Team  Opp   Score
0   A    Duck  Frog   2
1   B    Duck  Frog  15
2   B    Duck  Frog  20
3   C    Duck  Frog   7
4   C    Duck  Frog   9.5
5   C    Duck  Frog  10
6   A    Frog  Duck   3
7   A    Frog  Duck   0.5
8   B    Frog  Duck  17
9   B    Frog  Duck  13
10  B    Frog  Duck  21
11  C    Frog  Duck   8.5
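As @Maurits Evers commented, the way you present your expected output doesn't make sense. It looks like you need a separate output holding the mean score for each team and position. Also, you only give us one score per row, which I assume is the Team's score, so we have no opponent score to average. I would use dplyr's summarise function.
Here is your data: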
game = data.frame(id = c(0:11),
                  Pos = c("A", "B", "B", "C", "C", "C", "A", "A", "B", "B", "B", "C"),
                  Team = c("Duck", "Duck", "Duck", "Duck", "Duck", "Duck", "Frog", "Frog", "Frog", "Frog", "Frog", "Frog"),
                  Opp = c("Frog", "Frog", "Frog", "Frog", "Frog", "Frog", "Duck", "Duck", "Duck", "Duck", "Duck", "Duck"),
                  Score = c(2, 15, 20, 7, 9.5, 10, 3, 0.5, 17, 13, 21, 8.5))
First, the mean by position:
library(dplyr)
Pos_av = game %>%                     # create a new data.frame "Pos_av" by piping (%>%) "game" through the functions below
  group_by(Pos) %>%                   # first group by the variable we want an average for
  summarise(Pos_Mean = mean(Score))   # then summarise: name the new variable (Pos_Mean) and give the summary function (here, the mean)
This gives one row per position with its mean score (A is about 1.83, B is 17.2, C is 8.75).
The same approach works for teams:
Team_av = game %>%
  group_by(Team) %>%
  summarise(Team_Mean = mean(Score))
To get the mean for each team and position, group by both variables:
Both_av = game %>%
  group_by(Team, Pos) %>%
  summarise(Mean = mean(Score))
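If dplyr is not available, the same Team-by-Pos means can be cross-checked with base R. This is only a minimal sketch: it rebuilds the example data so that it runs on its own, and the name both_av_base is made up for illustration, not taken from the answer above.

```r
# Rebuild the example data so this block is self-contained.
game <- data.frame(
  id    = 0:11,
  Pos   = c("A", "B", "B", "C", "C", "C", "A", "A", "B", "B", "B", "C"),
  Team  = rep(c("Duck", "Frog"), each = 6),
  Opp   = rep(c("Frog", "Duck"), each = 6),
  Score = c(2, 15, 20, 7, 9.5, 10, 3, 0.5, 17, 13, 21, 8.5)
)

# Mean Score for every Team x Pos combination, using base R only.
# "both_av_base" is a hypothetical name chosen for this sketch.
both_av_base <- aggregate(Score ~ Team + Pos, data = game, FUN = mean)

# Rename the summarised column to match the dplyr version.
names(both_av_base)[names(both_av_base) == "Score"] <- "Mean"
print(both_av_base)
```

The result should have one row per Team and Pos combination (six rows here), just like Both_av.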
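You can loop over the data frame and over every team/position combination, setting each cell to the mean for that position and that row's team or opponent: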
## The name of the variable holding the data.frame is "df"
## Expand the data frame to contain your desired variables
for (t in c("Team", "Opp")) {
  for (p in c("A", "B", "C")) {
    df[[paste0(t, "_", p, "_Avg")]] = NA
  }
}
## Loop through the data to compute the means
for (i in seq_len(nrow(df))) {
  for (t in c("Team", "Opp")) {
    for (p in c("A", "B", "C")) {
      ## For each row i, each side t (own team or opponent), and each position p,
      ## compute the mean score of that team at that position and store it:
      df[[paste0(t, "_", p, "_Avg")]][i] = mean(df$Score[df$Team == df[[t]][i] & df$Pos == p])
    }
  }
}
This results in the following data frame:
> df
Id Pos Team Opp Score Team_A_Avg Team_B_Avg Team_C_Avg Opp_A_Avg Opp_B_Avg Opp_C_Avg
1 0 A Duck Frog 2.0 2.00 17.5 8.833333 1.75 17.0 8.500000
2 1 B Duck Frog 15.0 2.00 17.5 8.833333 1.75 17.0 8.500000
3 2 B Duck Frog 20.0 2.00 17.5 8.833333 1.75 17.0 8.500000
4 3 C Duck Frog 7.0 2.00 17.5 8.833333 1.75 17.0 8.500000
5 4 C Duck Frog 9.5 2.00 17.5 8.833333 1.75 17.0 8.500000
6 5 C Duck Frog 10.0 2.00 17.5 8.833333 1.75 17.0 8.500000
7 6 A Frog Duck 3.0 1.75 17.0 8.500000 2.00 17.5 8.833333
8 7 A Frog Duck 0.5 1.75 17.0 8.500000 2.00 17.5 8.833333
9 8 B Frog Duck 17.0 1.75 17.0 8.500000 2.00 17.5 8.833333
10 9 B Frog Duck 13.0 1.75 17.0 8.500000 2.00 17.5 8.833333
11 10 B Frog Duck 21.0 1.75 17.0 8.500000 2.00 17.5 8.833333
12 11 C Frog Duck 8.5 1.75 17.0 8.500000 2.00 17.5 8.833333
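The same table can probably be built without the row-by-row loop. The sketch below is a swapped-in alternative, not the answerer's method: it computes a Team-by-Pos matrix of means once with tapply and then looks up each row's team and opponent in it. The variable names means, side and p are made up for illustration, and the data is rebuilt so the block runs on its own.

```r
# Rebuild the example data (column names as in the printed output above).
df <- data.frame(
  Id    = 0:11,
  Pos   = c("A", "B", "B", "C", "C", "C", "A", "A", "B", "B", "B", "C"),
  Team  = rep(c("Duck", "Frog"), each = 6),
  Opp   = rep(c("Frog", "Duck"), each = 6),
  Score = c(2, 15, 20, 7, 9.5, 10, 3, 0.5, 17, 13, 21, 8.5)
)

# Matrix of mean scores: one row per team, one column per position.
means <- tapply(df$Score, list(df$Team, df$Pos), mean)

# For each side (own team, opponent) and each position, look up the mean
# by team name; as.character() guards against factor columns.
for (side in c("Team", "Opp")) {
  for (p in colnames(means)) {
    df[[paste0(side, "_", p, "_Avg")]] <- unname(means[as.character(df[[side]]), p])
  }
}

print(df)
```

The means are computed once per team and position instead of once per row, which should matter mainly for larger data sets.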
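I don't understand your expected output; why is Team_A_Avg = 2 even for the rows with Pos = B and Pos = C?
@mbenoo Could you take a look at the suggested answers and, if one of them answers your question, accept and/or upvote it? Thanks.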