How to use dplyr to compute the difference between each value and its column mean, for all columns


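I want to compute the difference between each value and its column mean, for every column in my dataset. My formula is:

residual = y - mean(y)

My original dataset looks like this: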

d_season_combined %>% head()

         Player Pos Age Team  G GS  MPG  FG FGA   FG%  3P 3PA   3P%  2P 2PA   2P%  eFG%  FT FTA   FT% ORB DRB TRB
1  Álex Abrines  SG  24  OKC 75  8 15.1 1.5 3.9 0.395 1.1 2.9 0.380 0.4 0.9 0.443 0.540 0.5 0.6 0.848 0.3 1.2 1.5
2    Quincy Acy  PF  27  BRK 70  8 19.4 1.9 5.2 0.356 1.5 4.2 0.349 0.4 1.0 0.384 0.496 0.7 0.9 0.817 0.6 3.1 3.7
3  Steven Adams   C  24  OKC 76 76 32.7 5.9 9.4 0.629 0.0 0.0 0.000 5.9 9.3 0.631 0.629 2.1 3.8 0.559 5.1 4.0 9.0
4   Bam Adebayo   C  20  MIA 69 19 19.8 2.5 4.9 0.512 0.0 0.1 0.000 2.5 4.8 0.523 0.512 1.9 2.6 0.721 1.7 3.8 5.5
5 Arron Afflalo  SG  32  ORL 53  3 12.9 1.2 3.1 0.401 0.5 1.3 0.386 0.7 1.7 0.413 0.485 0.4 0.5 0.846 0.1 1.2 1.2
6  Cole Aldrich   C  29  MIN 21  0  2.3 0.2 0.7 0.333 0.0 0.0    NA 0.2 0.7 0.333 0.333 0.1 0.3 0.333 0.1 0.6 0.7

I computed the mean of each column as follows:

d_league_average <- d_season_combined %>%
  select(-c(Player, Team, Pos, Season)) %>%
  mutate_if(is.character, as.numeric) %>%
  summarise_all(mean, na.rm = TRUE)

     Age        G       GS      MPG       FG      FGA       FG%        3P     3PA       3P%       2P      2PA
1 25.99439 48.79159 22.99065 19.27589 3.113551 6.873738 0.4432531 0.8437383 2.40972 0.3118051 2.269533 4.465794

d_season_combined %>%
  select(PTS) %>%
  lapply(function(i) i - d_league_average$PTS)

$PTS
   [1] -3.68504673 -2.48504673  5.51495327 -1.48504673 -4.98504673 -7.78504673 14.71495327 -0.18504673 -7.28504673
  [10] -3.68504673  0.91495327 -2.18504673 -0.48504673  0.91495327 -7.18504673 18.51495327  7.81495327 -2.48504673
But I don't know how to do this for all of the columns. I tried to solve it with summarise:

d_residuals <- d_season_combined %>%
  select(-c(Pos, Team)) %>%
  group_by(Season, Player) %>%
  summarise_all(function(i) d_league_average[i])

And with lapply:

d_residuals <- lapply(d_season_combined[column_names], function(i) y - d_league_average[i]) %>% bind_rows()

But neither approach worked for me. How can I do this for all columns? Thanks for your help.

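We don't need to compute d_league_average separately and then use it to compute d_residuals. We can subtract the corresponding column mean from each value directly, in a single step: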

library(dplyr)

d_season_combined %>%
  select(-c(Player, Team, Pos)) %>%
  mutate_if(is.character, as.numeric) %>%
  mutate_all(~. - mean(., na.rm = TRUE))
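
As a side note, mutate_if() and mutate_all() are superseded by across() as of dplyr 1.0.0. Below is a minimal sketch of the same idea written with across(), assuming dplyr >= 1.0.0 and assuming the numeric columns are already stored as numeric. The `toy` data frame is a hypothetical stand-in for d_season_combined, with a few values copied from the rows shown in the question. Unlike the code above, this version keeps the identifier columns (Player, Pos) rather than dropping them.

```r
library(dplyr)

# Hypothetical stand-in for d_season_combined (three rows, a few columns only)
toy <- data.frame(
  Player = c("Álex Abrines", "Quincy Acy", "Cole Aldrich"),
  Pos    = c("SG", "PF", "C"),
  Age    = c(24, 27, 29),
  MPG    = c(15.1, 19.4, 2.3),
  `3P%`  = c(0.380, 0.349, NA),
  check.names = FALSE  # keep "3P%" as a column name
)

# Subtract each numeric column's mean from its values;
# non-numeric columns (Player, Pos) pass through unchanged
residuals <- toy %>%
  mutate(across(where(is.numeric), ~ .x - mean(.x, na.rm = TRUE)))

print(residuals)
```

With na.rm = TRUE the mean ignores missing values, but a row whose original value is NA still has an NA residual.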


Thank you very much!

@yukikongju Glad I could help! If you found the answer useful, feel free to accept it by clicking the check mark next to the vote button on the left. :-) You can accept only one answer per post.