R: Calculate a group mean, then compute a lag by group
I'm building my skills in R. If possible, I'd like to solve this problem using the dplyr package.
I have a dataset about fantasy football. Each record holds one player's statistics for one game (one week) of a season, including the player's fantasy football points for that week.
Here is a snippet of the data I'm working with:
Player Week year Fantasy.Points Avg.Fantasy.Ponts
1 Aaron Hernandez 1 2011 16.3 9.678571
2 Aaron Hernandez 2 2011 12.2 9.678571
3 Aaron Hernandez 5 2011 5.6 9.678571
4 Aaron Hernandez 6 2011 10.8 9.678571
5 Aaron Hernandez 8 2011 7.1 9.678571
6 Aaron Hernandez 9 2011 9.5 9.678571
7 Aaron Hernandez 10 2011 4.1 9.678571
8 Aaron Hernandez 11 2011 4.4 9.678571
9 Aaron Hernandez 12 2011 6.2 9.678571
10 Aaron Hernandez 13 2011 4.3 9.678571
11 Aaron Hernandez 14 2011 8.4 9.678571
12 Aaron Hernandez 15 2011 20.5 9.678571
13 Aaron Hernandez 16 2011 3.7 9.678571
14 Aaron Hernandez 17 2011 22.4 9.678571
15 Aaron Hernandez 1 2012 12.4 8.755556
16 Aaron Hernandez 6 2012 9.0 8.755556
17 Aaron Hernandez 7 2012 5.4 8.755556
18 Aaron Hernandez 12 2012 3.6 8.755556
19 Aaron Hernandez 13 2012 9.7 8.755556
20 Aaron Hernandez 14 2012 17.8 8.755556
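The field Avg.Fantasy.Points (spelled Avg.Fantasy.Ponts in the snippet's header) is the player's average points over the whole year of that record. For example, Aaron Hernandez averaged 9.678571 points in the 2011 season and 8.755556 points in the 2012 season.
I'm interested in calculating a column holding each player's average points from the previous year. In the example above, Aaron Hernandez's 2012 records should show a previous-year average of 9.678571.
I found a workaround that resembles a subquery in SQL. Assuming df_te is the data frame in the snippet above: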
df_te %>%
  left_join(
    df_te %>%
      mutate(next.year = year + 1) %>%  # add a column for the next year
      group_by(Player, year) %>%
      mutate(Previous.Avg.Fantasy.Points = first(Avg.Fantasy.Points)) %>%  # copy of 'Avg.Fantasy.Points' under the name wanted for the new column
      filter(row_number() == 1) %>%  # keep one row per player/year group to avoid duplication upon join
      ungroup() %>%  # drop grouping so select() does not add 'year' back in
      select(Player, next.year, Previous.Avg.Fantasy.Points),  # keep only the columns to join in
    by = c("Player" = "Player", "year" = "next.year")  # joining 'year' on the LHS with 'next.year' on the RHS gives the previous year's average
  )
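Since you are using the dplyr package, I'd like to introduce the lag function. It shifts values by a given number of rows; the default is 1. The last line, select(c(colnames(dt), "Pre.Avg.Fantasy.Ponts")), only adjusts the column order. dt2 is the final output.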
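library(dplyr)

dt2 <- dt %>%
  group_by(Player, year) %>%
  # One row per player and season; the result stays grouped by Player
  summarise(Avg.Fantasy.Ponts = first(Avg.Fantasy.Ponts)) %>%
  # Previous season's average within each player (NA for the first season)
  mutate(Pre.Avg.Fantasy.Ponts = lag(Avg.Fantasy.Ponts)) %>%
  select(-Avg.Fantasy.Ponts) %>%
  # Attach the lagged average back onto the weekly records
  right_join(dt, by = c("Player", "year")) %>%
  # Restore the original column order, with the new column last
  select(c(colnames(dt), "Pre.Avg.Fantasy.Ponts"))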
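To see what lag does in isolation, here is a minimal sketch on toy values. The season_avg data frame, its players "A" and "B", and the avg and prev_avg column names are made up for illustration and are not part of the original data.

```r
library(dplyr)

# lag() shifts a vector down by n positions (default n = 1) and pads with NA
shifted_one <- dplyr::lag(c(10, 20, 30))
shifted_two <- dplyr::lag(c(10, 20, 30), n = 2)

# Hypothetical per-season averages for two made-up players
season_avg <- data.frame(
  Player = c("A", "A", "A", "B", "B"),
  year   = c(2011, 2012, 2013, 2012, 2013),
  avg    = c(9.7, 8.8, 7.5, 4.0, 6.0)
)

# Within each player, sort by year so lag() picks up the preceding season
with_prev <- season_avg %>%
  group_by(Player) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(prev_avg = dplyr::lag(avg)) %>%
  ungroup()

print(with_prev)
```

Because the data is grouped by Player before lag is applied, each player's first season gets NA instead of borrowing the last value of the player listed before them.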
Data
dt <- read.table(text = " Player Week year Fantasy.Points Avg.Fantasy.Ponts
1 'Aaron Hernandez' 1 2011 16.3 9.678571
2 'Aaron Hernandez' 2 2011 12.2 9.678571
3 'Aaron Hernandez' 5 2011 5.6 9.678571
4 'Aaron Hernandez' 6 2011 10.8 9.678571
5 'Aaron Hernandez' 8 2011 7.1 9.678571
6 'Aaron Hernandez' 9 2011 9.5 9.678571
7 'Aaron Hernandez' 10 2011 4.1 9.678571
8 'Aaron Hernandez' 11 2011 4.4 9.678571
9 'Aaron Hernandez' 12 2011 6.2 9.678571
10 'Aaron Hernandez' 13 2011 4.3 9.678571
11 'Aaron Hernandez' 14 2011 8.4 9.678571
12 'Aaron Hernandez' 15 2011 20.5 9.678571
13 'Aaron Hernandez' 16 2011 3.7 9.678571
14 'Aaron Hernandez' 17 2011 22.4 9.678571
15 'Aaron Hernandez' 1 2012 12.4 8.755556
16 'Aaron Hernandez' 6 2012 9.0 8.755556
17 'Aaron Hernandez' 7 2012 5.4 8.755556
18 'Aaron Hernandez' 12 2012 3.6 8.755556
19 'Aaron Hernandez' 13 2012 9.7 8.755556
20 'Aaron Hernandez' 14 2012 17.8 8.755556",
header = TRUE, stringsAsFactors = FALSE)
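One caveat about the lag approach: lag takes the preceding row within each player, so if a player has no records for a season, the value comes from the last season on record rather than from the calendar year before. The following is a sketch of an alternative, not the method from the answer above, that looks up year - 1 explicitly with base R's match. The toy data frame and the Avg and Prev.Avg column names are hypothetical.

```r
library(dplyr)

# Hypothetical data: player "A" has no 2012 season, player "B" has 2012 and 2013
toy <- data.frame(
  Player = c("A", "A", "B", "B", "B"),
  year   = c(2011, 2013, 2012, 2012, 2013),
  Avg    = c(9.5, 7.0, 4.0, 4.0, 6.0)
)

toy_prev <- toy %>%
  group_by(Player) %>%
  # match() returns the position of the first row whose year equals year - 1,
  # or NA when the player has no record for that year
  mutate(Prev.Avg = Avg[match(year - 1, year)]) %>%
  ungroup()

print(toy_prev)
```

Here player "A" gets NA for 2013 because there is no 2012 season to look up, while player "B" gets the 2012 average on the 2013 row. This also avoids the summarise and join steps, at the cost of relying on Avg being constant within each player and year.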