R: Mutate new columns that average columns together based on column name, while excluding certain columns from the calculation?

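Working in a data frame, I want to use mutate to create a new column that averages, for each row, all of the columns matched by name except one. I need to be able to exclude a different column each time I use mutate, and I want the calculation to skip NA values as well.

A simplified version of my DF: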

   Team stat1 stat2 stat3 stat4
1  ARI     3    NA     4     6
2  BAL    NA     2    NA     1
3  CAR     5     4     6     2

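NewCol1 is created by averaging the stat columns, excluding the stat1 column and any NA values. NewCol2 is the same, except that the average excludes the stat2 column: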

  Team stat1 stat2 stat3 stat4 NewCol1 NewCol2
1  ARI     3    NA     4     6     5.0    4.33
2  BAL    NA     2    NA     1     1.5    1.00
3  CAR     5     4     6     2     4.0    4.33

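If I want to create a new column like this for every stat, what is the most efficient way to do it? The DF has 10 stat columns, all sharing the same name followed by a number. I was thinking starts_with() could be useful here together with rowMeans(), but I am still struggling to implement it while excluding a specific column each time.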
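For anyone who wants to run the code in this thread, here is a minimal sketch of the example data. The name df1 and the integer column types are assumptions chosen to match what the answers use; the hand check only restates the expected ARI values from the question.

```r
# Example data from the question; the name df1 and the integer types are assumptions
df1 <- data.frame(
  Team  = c("ARI", "BAL", "CAR"),
  stat1 = c(3L, NA, 5L),
  stat2 = c(NA, 2L, 4L),
  stat3 = c(4L, NA, 6L),
  stat4 = c(6L, 1L, 2L)
)

# Hand check for the ARI row.
# NewCol1 skips stat1, and the NA in stat2 is dropped by na.rm = TRUE
ari_newcol1 <- mean(c(NA, 4, 6), na.rm = TRUE)
# NewCol2 skips stat2, so it averages stat1, stat3 and stat4
ari_newcol2 <- mean(c(3, 4, 6), na.rm = TRUE)
print(c(ari_newcol1, ari_newcol2))
```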

We can use rowMeans after selecting the relevant columns:

library(dplyr)
df1 %>%
      mutate(NewCol1 = rowMeans(select(., -Team, -stat1), na.rm = TRUE),
        NewCol2 = rowMeans(select(., -Team, -stat2), na.rm = TRUE))

Output:
#  Team stat1 stat2 stat3 stat4 NewCol1  NewCol2
#1  ARI     3    NA     4     6     5.0 4.333333
#2  BAL    NA     2    NA     1     1.5 1.000000
#3  CAR     5     4     6     2     4.0 4.333333
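This skips NA correctly because rowMeans with na.rm = TRUE divides each row's sum by the number of non-missing values in that row, not by the number of columns. A small sketch with a made-up matrix m (not part of the question's data) illustrates this, including the edge case of a row that has no values left at all:

```r
# Made-up matrix for illustration only
m <- rbind(c(NA, 4, 6),    # two values present
           c(2, NA, 1),    # two values present
           c(NA, NA, NA))  # nothing present

means <- rowMeans(m, na.rm = TRUE)
print(means)
# Rows 1 and 2 are averaged over their 2 available values;
# row 3 has nothing to average, so it comes back as NaN
```

If a row can end up with every remaining stat missing, it may be worth deciding up front how NaN should be handled downstream.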

Or another option with c_across:
df1 %>% 
   rowwise %>%
   mutate(NewCol1 = mean(c_across(c(where(is.numeric), -stat1)), na.rm = TRUE), 
   NewCol2 = mean(c_across(c(starts_with('stat'), -stat2)), na.rm = TRUE), 
   NewCol3 = mean(c_across(c(starts_with('stat'), -stat3)), na.rm = TRUE), 
   NewCol4 = mean(c_across(c(starts_with('stat'), -stat4)), na.rm = TRUE)) %>%
   ungroup

Output:
# A tibble: 3 x 9
#  Team  stat1 stat2 stat3 stat4 NewCol1 NewCol2 NewCol3 NewCol4
#  <chr> <int> <int> <int> <int>   <dbl>   <dbl>   <dbl>   <dbl>
#1 ARI       3    NA     4     6     5      4.33    4.5      3.5
#2 BAL      NA     2    NA     1     1.5    1       1.5      2  
#3 CAR       5     4     6     2     4      4.33    3.67     5  
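As a sanity check, the rowwise/c_across form and the rowMeans form should give the same numbers. The sketch below assumes dplyr 1.0.0 or later is installed and rebuilds the example data under the assumed name df1. Because rowwise works one row at a time, it tends to be slower than the vectorized rowMeans on large data, so this is mainly a readability choice.

```r
library(dplyr)

# Example data rebuilt here so the block runs on its own (df1 is an assumed name)
df1 <- data.frame(
  Team  = c("ARI", "BAL", "CAR"),
  stat1 = c(3L, NA, 5L),
  stat2 = c(NA, 2L, 4L),
  stat3 = c(4L, NA, 6L),
  stat4 = c(6L, 1L, 2L)
)

# Row-by-row mean of every stat column except stat1
by_row <- df1 %>%
  rowwise() %>%
  mutate(NewCol1 = mean(c_across(c(starts_with("stat"), -stat1)), na.rm = TRUE)) %>%
  ungroup()

# Vectorized equivalent on the same three columns
vectorized <- unname(rowMeans(df1[c("stat2", "stat3", "stat4")], na.rm = TRUE))

same <- isTRUE(all.equal(by_row$NewCol1, vectorized))
print(same)
```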

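Or do this entirely in the tidyverse, looping over the column names:
library(purrr)
library(stringr)
# Assumption: nm1 is never defined in the original; here it holds the stat column names
nm1 <- names(df1)[-1]
map_dfc(nm1,  ~
    df1 %>% 
       select(starts_with('stat'), -all_of(.x)) %>% 
       transmute(!! str_c('NewCol', readr::parse_number(.x)) := 
              rowMeans(., na.rm = TRUE))) %>% 
       bind_cols(df1, .)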
#  Team stat1 stat2 stat3 stat4 NewCol1  NewCol2  NewCol3 NewCol4
#1  ARI     3    NA     4     6     5.0 4.333333 4.500000     3.5
#2  BAL    NA     2    NA     1     1.5 1.000000 1.500000     2.0
#3  CAR     5     4     6     2     4.0 4.333333 3.666667     5.0


Or the same loop using rowwise with c_across:

map_dfc(nm1,  ~
     df1 %>% 
        select(starts_with('stat'), -all_of(.x)) %>% 
        rowwise %>%
        transmute(!! str_c('NewCol', readr::parse_number(.x)) := mean(c_across(everything()), na.rm = TRUE)) %>%
        ungroup) %>%
    bind_cols(df1, .)

Output:
#  Team stat1 stat2 stat3 stat4 NewCol1  NewCol2  NewCol3 NewCol4
#1  ARI     3    NA     4     6     5.0 4.333333 4.500000     3.5
#2  BAL    NA     2    NA     1     1.5 1.000000 1.500000     2.0
#3  CAR     5     4     6     2     4.0 4.333333 3.666667     5.0


Or using base R:

df1[paste0("NewCol", seq_along(nm1))] <- lapply(nm1,
            function(x) rowMeans(df1[setdiff(names(df1)[-1], x)],  na.rm = TRUE))

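In base R, you can find the columns whose names contain 'stat', drop them one at a time inside lapply, and take the row-wise mean of the rest (here the data frame is called df):

# positions of the stat columns
cols <- grep('stat', names(df))
new_cols <- paste0('remove_', names(df)[cols])
# for each stat column x, drop column 1 (Team) and column x, then average what is left
df[new_cols] <- lapply(cols, function(x) rowMeans(df[, -c(1, x)], na.rm = TRUE))
df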

#  Team stat1 stat2 stat3 stat4 remove_stat1 remove_stat2 remove_stat3 remove_stat4
#1  ARI     3    NA     4     6          5.0     4.333333     4.500000          3.5
#2  BAL    NA     2    NA     1          1.5     1.000000     1.500000          2.0
#3  CAR     5     4     6     2          4.0     4.333333     3.666667          5.0
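With many stat columns, the same leave-one-out means can also be computed arithmetically in one pass instead of calling rowMeans once per column: take each row's total and count of non-missing values, then subtract the excluded column's contribution. This is an alternative sketch, not the method used in the answers above; the data frame is rebuilt here under the name df so the block runs on its own.

```r
# Example data rebuilt for a self-contained run
df <- data.frame(
  Team  = c("ARI", "BAL", "CAR"),
  stat1 = c(3L, NA, 5L),
  stat2 = c(NA, 2L, 4L),
  stat3 = c(4L, NA, 6L),
  stat4 = c(6L, 1L, 2L)
)

stats   <- as.matrix(df[grep("^stat", names(df))])
present <- !is.na(stats)                 # TRUE where a value exists
total   <- rowSums(stats, na.rm = TRUE)  # per-row sum of available values
n       <- rowSums(present)              # per-row count of available values

# Remove each column's own contribution; an NA contributes 0 to both sum and count.
# total and n are recycled down the columns, so element i lines up with row i.
loo <- (total - ifelse(present, stats, 0)) / (n - present)
colnames(loo) <- paste0("remove_", colnames(stats))

result <- cbind(df, loo)
print(result)
```

As with rowMeans, a row whose remaining stats are all missing yields NaN, because the division becomes 0 / 0.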


This is extremely helpful; thank you so much. I got the other examples to work, but when I tried to automate this using your examples (the 3rd and 4th ones) I got an error saying 'x' must be numeric. Is the "x" in this code example supposed to be replaced with something?

@ChazC That must be an issue with the column types in your data.

Thanks for your answer. I got this error: Error in rowMeans(df[, -c(1, x)], na.rm = TRUE) : 'x' must be numeric. Any suggestions?

@ChazC Is the Team column the first column in your data frame? Do you have any other non-numeric columns besides Team?

Yes, I fixed it by changing the code to rowMeans(df[, -c(1:5, x)], na.rm = TRUE). Thanks again!
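The "'x' must be numeric" error discussed above comes from dropping columns by position: any extra non-numeric column that is not explicitly excluded ends up inside rowMeans. A sketch that selects the stat columns by name avoids hard-coding positions such as 1:5. The Conference column below is hypothetical and only stands in for the extra non-numeric columns mentioned in the comments.

```r
df <- data.frame(
  Team       = c("ARI", "BAL", "CAR"),
  Conference = c("NFC", "AFC", "NFC"),  # hypothetical extra non-numeric column
  stat1 = c(3L, NA, 5L),
  stat2 = c(NA, 2L, 4L),
  stat3 = c(4L, NA, 6L),
  stat4 = c(6L, 1L, 2L)
)

# Work with the stat column names rather than their positions
stat_cols <- grep("^stat", names(df), value = TRUE)
new_cols  <- paste0("remove_", stat_cols)

# For each stat column x, average only the other stat columns
df[new_cols] <- lapply(stat_cols, function(x) {
  rowMeans(df[setdiff(stat_cols, x)], na.rm = TRUE)
})
print(df)
```

Because only columns matching "^stat" are ever passed to rowMeans, the code keeps working if non-numeric columns are added or reordered.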