R: Calculate averages by team
I have football results data in the following format (thousands of observations):
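(plus other variables). value is the team, and pts is the final result (win/loss/draw) as a number. I'm trying to add a new variable holding the average of pts over the previous X matches of that row's team. How can I do this without some horrible loop?

Try the ave function from stats: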
Trt <- gl(n=2, k=3, length=2*3, labels =c("A", "B"))
Y <- 1:6
Data <- data.frame(Trt, Y)
Data
Trt Y
1 A 1
2 A 2
3 A 3
4 B 4
5 B 5
6 B 6
Data$TrtMean <- ave(Data$Y, Data$Trt, FUN = mean)  # mean of Y within each Trt group
Data
Trt Y TrtMean
1 A 1 2
2 A 2 2
3 A 3 2
4 B 4 5
5 B 5 5
6 B 6 5
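ave() applies FUN within each group and returns a vector aligned with the original rows, so it can also produce a rolling value if FUN returns one number per element rather than a single mean. A minimal base-R sketch, assuming the rows are already sorted by date within each team and that "the last X matches" means the X matches before the current one; the toy data and the helper trailing_mean() are hypothetical:

```r
# Hypothetical toy data: team (value) and points (pts), rows already in date order
d <- data.frame(value = c("A", "B", "A", "B", "A", "B", "A"),
                pts   = c(1, 0, 0.5, 1, 0, 0.5, 1))

# Hypothetical helper: mean of the X values before each position (current match
# excluded), computed from cumulative sums; NA until X earlier matches exist
trailing_mean <- function(x, X) {
  cs  <- cumsum(c(0, x))              # cs[k + 1] == sum(x[1:k])
  out <- rep(NA_real_, length(x))
  idx <- which(seq_along(x) > X)      # positions with at least X earlier matches
  out[idx] <- (cs[idx] - cs[idx - X]) / X
  out
}

# ave() splits pts by team, applies the helper, and puts results back in row order
d$prev2 <- ave(d$pts, d$value, FUN = function(x) trailing_mean(x, 2))
print(d)
```

To include the current match in the window instead, use idx <- which(seq_along(x) >= X) and (cs[idx + 1] - cs[idx + 1 - X]) / X.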
Take a look at rollmean from the zoo package, combined with ddply from the plyr package:
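library(zoo)
library(plyr)
set.seed(1)  # reproducible example
# 5 teams ("a" to "e"), 10 matches each; rows assumed to be in date order per team
dat <- data.frame(value = letters[1:5], pts = sample(c(0, 0.5, 1), 50, replace = TRUE))
# rolling mean over each team's last 5 matches; fill = NA keeps one value per row
ddply(dat, .(value), transform,
      pts_last5 = rollmean(pts, k = 5, fill = NA, align = "right"))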
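ddply returns the rows grouped by team, so the rolling column does not come back in the original row order. If that matters, one base-R alternative (a swapped-in technique, not part of the answer above) is ave() combined with stats::filter(), whose one-sided convolution (sides = 1) averages the current value and the k - 1 values before it, like rollmean(..., align = "right", fill = NA). A sketch assuming each team's rows are in date order; roll_right() and the toy data are hypothetical:

```r
# Hypothetical toy data: two teams, rows in date order within each team
dat <- data.frame(value = rep(c("a", "b"), 4),
                  pts   = c(1, 0, 0, 1, 0.5, 1, 1, 0))

# Hypothetical helper: right-aligned rolling mean over k values (current included),
# NA for the first k - 1; groups shorter than k are returned as all NA
# rather than being passed to stats::filter()
roll_right <- function(x, k) {
  if (length(x) < k) return(rep(NA_real_, length(x)))
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# The result stays aligned with the original rows of dat
dat$pts_roll3 <- ave(dat$pts, dat$value, FUN = function(x) roll_right(x, 3))
print(dat)
```

Calling stats::filter explicitly avoids picking up dplyr::filter if dplyr happens to be attached.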
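This can be done quite efficiently with tapply. I changed your data somewhat, replicating the teams' matches with random scores and dates. This takes the mean of the last 2 matches, as set in the tail call: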
# create some data
d <- structure(list(Div = structure(rep(1L, 33), .Label = " E0",
class = "factor"), date = structure(c(15013, 14990, 14996, 15001, 14995, 15006,
15020, 15032, 15023, 15022, 15015, 15016, 15034, 14994, 14986, 14998, 14982,
14979, 14980, 15016, 15031, 15013, 15031, 14999, 15025, 14978, 15007, 15026,
14992, 14997, 15023, 14986, 15028), class = "Date"),
value = structure(c(3L, 4L, 5L, 7L, 8L, 11L, 9L, 10L, 6L, 1L, 2L, 3L, 4L, 5L,
7L, 8L, 11L, 9L, 10L, 6L, 1L, 2L, 3L, 4L, 5L, 7L, 8L, 11L, 9L, 10L, 6L, 1L,
2L), .Label = c("Arsenal", "Aston Villa", "Blackburn", "Fulham", "Liverpool",
"Man City", "Newcastle", "QPR", "Stoke", "West Brom", "Wigan"),
class = "factor"), pts = c(0.5, 0.5, 0.5, 1, 1, 1, 1, 0, 1, 0.5, 0, 1, 1, 1, 1,
0.5, 0.5, 0, 0.5, 0.5, 0, 0, 0, 1, 0, 0, 0.5, 0, 1, 0, 0.5, 0.5, 0.5)),
.Names = c("Div", "date", "value", "pts"), row.names = c(NA, 33L),
class = "data.frame")
# sort rows by date
d2 <- d[order(d$date),]
# mean of all games
tapply(d2$pts, d2$value, mean)
# mean of last 2 games
tapply(d2$pts, d2$value, function(x) mean(tail(x, 2)))
# To tidy up the output, you could use simplify=FALSE and do.call(rbind, x):
# e.g., mean of last 2 games:
do.call(rbind, tapply(d2$pts, d2$value, function(x) mean(tail(x, 2)),
                      simplify = FALSE))
[,1]
Arsenal 0.25
Aston Villa 0.25
Blackburn 0.50
Fulham 1.00
Liverpool 0.25
Man City 0.75
Newcastle 1.00
QPR 0.50
Stoke 1.00
West Brom 0.00
Wigan 0.50
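The tapply call returns one value per team (its current form), whereas the question asks for a value on every row. A minimal sketch of a per-row counterpart of mean(tail(x, 2)), assuming the window includes the current match and simply shrinks for a team's first matches; it orders the rows by date first and then writes the result back in the original row order. last_x_mean() and the toy data are hypothetical:

```r
# Hypothetical toy data, deliberately not in date order
games <- data.frame(
  date  = as.Date(c("2011-03-05", "2011-01-10", "2011-02-01",
                    "2011-01-20", "2011-02-15")),
  value = c("Stoke", "Stoke", "Stoke", "Wigan", "Wigan"),
  pts   = c(1, 0, 0.5, 1, 0)
)

# Hypothetical helper: per-row version of mean(tail(x, X)). For match i it
# averages the last X matches up to and including i (fewer for early matches)
last_x_mean <- function(x, X) {
  vapply(seq_along(x), function(i) mean(tail(x[seq_len(i)], X)), numeric(1))
}

o   <- order(games$date)              # chronological order of the rows
res <- numeric(nrow(games))
res[o] <- ave(games$pts[o], games$value[o],
              FUN = function(x) last_x_mean(x, 2))
games$last2 <- res                    # written back in the original row order
print(games)
```

Each row re-scans its team's earlier matches, so the cost is quadratic per team; for a few thousand matches that should be acceptable, and a cumulative-sum formulation avoids it if it becomes slow.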
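Please provide some example code; it makes the answer clearer for the OP.
@PaulHiemstra: I added an example.
Is there a simple way to modify this code to meet the question's requirements?
In fact, aggregate does it in one step, e.g. aggregate(d2$pts, list(d2$value), function(x) mean(tail(x, 2)))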
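The formula interface of aggregate() returns the same per-team summary as a data frame with named columns. A small sketch with hypothetical toy data, assuming the rows are already in date order so that tail() picks each team's most recent matches:

```r
# Hypothetical toy data, rows assumed to be in date order
d2 <- data.frame(value = c("Arsenal", "Wigan", "Arsenal", "Wigan", "Arsenal"),
                 pts   = c(1, 0.5, 0, 1, 0.5))

# One row per team: mean pts over that team's last 2 matches
form <- aggregate(pts ~ value, data = d2, FUN = function(x) mean(tail(x, 2)))
print(form)
```

Like the tapply version, this gives one number per team, not the per-row value the question asks for.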
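With fill=NA, rollmean pads each team's first k - 1 matches with NA, so the ddply call returns one value per match:
ddply(dat, .(value), summarise, rollmean(pts, k=5, fill=NA, align='right'))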