R: Creating a new column in my data frame based on a function
I have a data frame of NFL teams and some data about them. For every game, I want to add each team's points per game going into that week. I can't simply summarize the data by team, because I need to keep the individual games presented the way they are now.
CurrYrfun <- function(Yr, Tm, Wk) {
  PPG <- Schedule_Results %>%
    filter(Year == Yr & Team == Tm & Week < Wk) %>%
    group_by(Team) %>%
    summarize(APG = mean(Pts))
  return(PPG[['APG']])
}
This function gives the correct result for a single record, but when I try to mutate a new column in the data frame, like this:
Schedule_Results <- Schedule_Results %>%
mutate(PPG = CurrYrfun(Year, Team, Week))
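I get an error saying that PPG has length 0. I tried to attach a picture of the data frame so you can see the data I'm working with.
Edited to include data and an example: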
Schedule_Results <- structure(list(Year = c(2019L, 2019L, 2019L, 2019L, 2019L, 2019L,
2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L,
2019L, 2019L, 2019L, 2019L, 2019L, 2019L), Week = c(17, 17, 17,
16, 16, 16, 15, 15, 15, 14, 14, 14, 13, 13, 13, 12, 12, 12, 11,
11, 11), Team = c("Washington Redskins", "Cincinnati Bengals",
"Jacksonville Jaguars", "Jacksonville Jaguars", "Washington Redskins",
"Cincinnati Bengals", "Cincinnati Bengals", "Washington Redskins",
"Jacksonville Jaguars", "Washington Redskins", "Cincinnati Bengals",
"Jacksonville Jaguars", "Jacksonville Jaguars", "Washington Redskins",
"Cincinnati Bengals", "Cincinnati Bengals", "Jacksonville Jaguars",
"Washington Redskins", "Washington Redskins", "Jacksonville Jaguars",
"Cincinnati Bengals"), Opp = c("Dallas Cowboys", "Cleveland Browns",
"Indianapolis Colts", "Atlanta Falcons", "New York Giants", "Miami Dolphins",
"New England Patriots", "Philadelphia Eagles", "Oakland Raiders",
"Green Bay Packers", "Cleveland Browns", "Los Angeles Chargers",
"Tampa Bay Buccaneers", "Carolina Panthers", "New York Jets",
"Pittsburgh Steelers", "Tennessee Titans", "Detroit Lions", "New York Jets",
"Indianapolis Colts", "Oakland Raiders"), Pts = c(16, 33, 38,
12, 35, 35, 13, 27, 20, 15, 19, 10, 11, 29, 22, 10, 20, 19, 17,
13, 10), Opp_Pts = c(47, 23, 20, 24, 41, 38, 34, 37, 16, 20,
27, 45, 28, 21, 6, 16, 42, 16, 34, 33, 17), Yds = c(271, 361,
353, 288, 361, 430, 315, 352, 262, 262, 451, 252, 242, 362, 277,
244, 369, 230, 225, 308, 246), Opp_Yds = c(517, 313, 275, 518,
552, 502, 291, 415, 364, 341, 333, 525, 315, 278, 271, 338, 471,
364, 400, 389, 386), TO = c(2, 1, 1, 1, 0, 1, 5, 1, 0, 1, 1,
0, 4, 0, 0, 2, 1, 2, 1, 1, 2), Opp_TO = c(1, 3, 2, 2, 0, 1, 0,
1, 0, 1, 2, 0, 1, 2, 0, 1, 2, 4, 2, 2, 2), Home = c("1", "1",
"1", "1", "0", "1", "0", "0", "0", "1", "1", "0", "0", "0", "1",
"0", "1", "1", "0", "1", "1"), Playoffs = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), win = c("0", "1",
"1", "0", "0", "0", "0", "0", "1", "0", "0", "0", "0", "1", "1",
"0", "0", "1", "0", "0", "0")), row.names = c(NA, -21L), class = "data.frame")
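My goal is to return the function's output for each row as a new column in the data frame.

I'm pretty sure this is what you want. I spot-checked the first few examples you gave and they look right.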
Schedule_Results %>%
group_by(Team, Year) %>%
arrange(Week) %>%
mutate(PPG = lag(cummean(Pts), 1))
# # A tibble: 21 x 14
# # Groups: Team, Year [3]
# Year Week Team Opp Pts Opp_Pts Yds Opp_Yds TO Opp_TO Home Playoffs win PPG
# <int> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr> <dbl>
# 1 2019 11 Washington Reds~ New York Jets 17 34 225 400 1 2 0 0 0 NA
# 2 2019 11 Jacksonville Ja~ Indianapolis Co~ 13 33 308 389 1 2 1 0 0 NA
# 3 2019 11 Cincinnati Beng~ Oakland Raiders 10 17 246 386 2 2 1 0 0 NA
# 4 2019 12 Cincinnati Beng~ Pittsburgh Stee~ 10 16 244 338 2 1 0 0 0 10
# 5 2019 12 Jacksonville Ja~ Tennessee Titans 20 42 369 471 1 2 1 0 0 13
# 6 2019 12 Washington Reds~ Detroit Lions 19 16 230 364 2 4 1 0 1 17
# 7 2019 13 Jacksonville Ja~ Tampa Bay Bucca~ 11 28 242 315 4 1 0 0 0 16.5
# 8 2019 13 Washington Reds~ Carolina Panthe~ 29 21 362 278 0 2 0 0 1 18
# 9 2019 13 Cincinnati Beng~ New York Jets 22 6 277 271 0 0 1 0 1 10
# 10 2019 14 Washington Reds~ Green Bay Packe~ 15 20 262 341 1 1 1 0 0 21.7
...
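To see why `lag(cummean(Pts), 1)` yields the average going into each week, here is a minimal sketch for a single team. The points are Washington's for weeks 11 to 14 from the posted data; the variable names `pts`, `running`, `ppg_before`, and `brute` are only illustrative.

```r
library(dplyr)

# Washington's points in weeks 11-14, already sorted by week.
pts <- c(17, 19, 29, 15)

# cummean() is the running mean INCLUDING the current game.
running <- cummean(pts)

# lag() shifts that down by one game, so each row only sees earlier games.
# The first game has no history, hence NA.
ppg_before <- lag(running, 1)

# Brute-force check: mean of all earlier games, computed one row at a time.
brute <- sapply(seq_along(pts), function(i) {
  if (i == 1) NA_real_ else mean(pts[seq_len(i - 1)])
})

stopifnot(isTRUE(all.equal(ppg_before, brute)))
print(ppg_before)
```

The values for weeks 12, 13, and 14 (17, 18, and about 21.7) match the Washington rows in the output above. In the full pipeline, `group_by(Team, Year)` applies the same calculation separately to each team and season.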
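Can't you use mutate instead of summarize?

If you share sample input and the desired output, we may be able to help you debug. Please share sample input with dput, e.g. dput(Schedule_Results[1:10, ]), or another suitable subset if the first 10 rows aren't a good choice. Working with pictures of data is very difficult...

You should learn how functions in dplyr work: try reading the whole ...

@onyanbu Yes, I think my problem is that when my function is used in the second piece of code, it doesn't take my column names as input the way it needs to. How do I solve this?

@GregorThomas I've updated my question to include those items. Sorry it wasn't clear at first; this is my first time posting data.

The data.table version would be setDT(Schedule_Results)[order(Week), PPG := shift(cummean(Pts)), .(Team, Year)]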
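On the question of passing column names into the function: inside `mutate()`, a call like `CurrYrfun(Year, Team, Week)` receives the whole columns at once rather than one row's values, so a condition such as `Week < Wk` compares each row with itself and matches nothing. The sketch below illustrates this on made-up data and shows one possible per-row workaround with base R's `mapply()`. The data frame `games` and the helper `prior_ppg` are hypothetical names for illustration, not the asker's code.

```r
library(dplyr)

# Made-up schedule (hypothetical data, not the asker's).
games <- data.frame(
  Year = c(2019L, 2019L, 2019L, 2019L),
  Week = c(3, 2, 1, 1),
  Team = c("A", "A", "A", "B"),
  Pts  = c(30, 20, 10, 7),
  stringsAsFactors = FALSE
)

# Hypothetical scalar helper: expects ONE year, ONE team and ONE week.
# Returns NA when the team has no earlier games that season.
prior_ppg <- function(yr, tm, wk) {
  prior <- games %>% filter(Year == yr, Team == tm, Week < wk)
  if (nrow(prior) == 0) NA_real_ else mean(prior$Pts)
}

# Passing a whole column compares every row with itself: Week < Week is
# FALSE everywhere, so no rows survive and any summary has length 0.
n_matched <- nrow(games %>% filter(Week < games$Week))

# mapply() calls the helper once per row with that row's values.
per_row <- games %>%
  mutate(PPG = mapply(prior_ppg, Year, Team, Week, USE.NAMES = FALSE))

print(n_matched)
print(per_row)
```

This per-row version re-filters the whole table for every row, so on a full schedule the grouped `lag(cummean(Pts), 1)` approach is likely the better choice. The sketch is mainly meant to show where the length-0 result comes from.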