R: How to calculate the mean of selected columns by row (tags: r, dplyr, grouping, tidyr, string-matching)


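My tibble is included at the end.

I have a data frame with different types of columns (they are different replicates). The first four columns should stay as they are. Those that start with "canopy" (I phrase it that way because starts_with() may be useful) should be summarised into the mean of canopy, and likewise for "understory" (written as "under") and "tree diameter at breast height" (tdbh), all with na.rm = TRUE. How can I do this? How do I summarise columns like these?

Data:

The desired output starts like this: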
# A tibble: 2 x 7
  Site  Classification transect point canopy understory tdbh 
  <chr> <chr>             <dbl> <dbl>  <dbl> <chr>      <chr>
1 Bala  Primary forest        1     1    5.4 ...        ...  
2 Bala  Primary forest        1     2    3.8 ...        ...  


I hope this is possible using only base R and anything in the tidyverse (probably dplyr and/or tidyr).
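Edit: I know that mutate(canopy = mean(c(canopy1, canopy2, ...))) should work, but it has two problems. First, it adds a column instead of replacing the existing ones; that is a nuisance, but not a disaster. Second, I would have to list every column by hand, which is the mark of an inefficient answer.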
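Can't you just do this:

df %>%
  mutate(canopy = rowMeans(select(., starts_with("canopy")))) %>%
  select(-(5:24))
#> # A tibble: 6 x 5
#>   Site  Classification transect point canopy
#>   <chr> <chr>             <dbl> <dbl>  <dbl>
#> 1 Bala  Primary forest        1     1    5.4
#> 2 Bala  Primary forest        1     2    3.8
#> 3 Bala  Primary forest        1     3    3.6
#> 4 Bala  Primary forest        1     4    5.2
#> 5 Bala  Primary forest        1     5    3  
#> 6 Bala  Primary forest        2     1    4.2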
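
Since the original data is not reproduced here, the following is a minimal sketch of the same rowMeans() plus starts_with() idea on a small made-up tibble. The tibble `toy` and its values are hypothetical, and two details are assumptions beyond the answer above: `na.rm = TRUE` is passed because the question asks for it, and the replicate columns are dropped by name pattern rather than by position, so the code does not depend on there being exactly 24 columns.

```r
library(dplyr)

# Hypothetical toy data: two id columns plus three canopy replicates
toy <- tibble(
  Site    = c("A", "A", "B"),
  point   = c(1, 2, 1),
  canopy1 = c(4, 2, NA),
  canopy2 = c(6, 4, 3),
  canopy3 = c(NA, 6, 5)
)

result <- toy %>%
  # Row-wise mean over every column whose name starts with "canopy"
  mutate(canopy = rowMeans(select(., starts_with("canopy")), na.rm = TRUE)) %>%
  # Drop the numbered replicates by name pattern instead of by position
  select(-matches("^canopy[0-9]+$"))

print(result)
```

Here `.` inside mutate() is the data frame as it entered the pipe, so the new `canopy` column is not yet among the columns being averaged.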
Using sapply:

cbind(df[1:4], sapply(c("canopy", "under", "dbh"), function(x) 
  rowMeans(df[grep(x, names(df))], na.rm=TRUE)))
#   Site Classification transect point canopy under      dbh
# 1 Bala Primary forest        1     1    5.4 12.00 90.00000
# 2 Bala Primary forest        1     2    3.8  7.75 90.00000
# 3 Bala Primary forest        1     3    3.6  2.00 81.66667
# 4 Bala Primary forest        1     4    5.2  8.00 60.00000
# 5 Bala Primary forest        1     5    3.0  3.50 90.00000
# 6 Bala Primary forest        2     1    4.2 12.50 88.57143
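
To try the sapply approach without the original data, here is a self-contained sketch on a made-up data frame. The data frame `toy`, its column names, and its values are hypothetical; only the grep() plus rowMeans() pattern comes from the answer above.

```r
# Hypothetical toy data frame; column names and values are illustrative only
toy <- data.frame(
  Site    = c("A", "A", "B"),
  point   = c(1, 2, 1),
  canopy1 = c(4, 2, NA),
  canopy2 = c(6, 4, 3),
  under1  = c(10, NA, 2),
  under2  = c(14, 8, 4),
  tdbh1   = c(90, 80, NA),
  tdbh2   = c(NA, 70, 60)
)

prefixes <- c("canopy", "under", "dbh")

# One vector of row means per pattern; sapply() binds them into a matrix
# whose columns are named after the patterns
means <- sapply(prefixes, function(p) {
  rowMeans(toy[grep(p, names(toy))], na.rm = TRUE)
})

# Keep the id columns and attach the computed means
result <- cbind(toy[c("Site", "point")], means)
print(result)
```

Note that grep() matches the pattern anywhere in the name, which is why "dbh" picks up the tdbh columns. If other column names could contain the same substring, anchoring the pattern (for example "^canopy") is safer. A row whose matching columns are all NA yields NaN rather than NA when na.rm = TRUE.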

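A solution using the tidyverse package. We can create a vector with the target strings, then use map_dfc and mutate to calculate the means dynamically. After that, we can combine the calculated columns with the original data frame.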
library(tidyverse)

# Set the target column names
target <- c("canopy", "under", "dbh")

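# For each target, compute the row-wise mean of all columns whose name contains it;
# .keep = "none" returns only the newly created column
dat2 <- map_dfc(target, function(x){
  dat %>%
    mutate("{x}" := rowMeans(select(., contains(x)), na.rm = TRUE), .keep = "none")
})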

dat3 <- dat %>% 
  select(-contains(target)) %>%
  bind_cols(dat2)

print(dat3)
# # A tibble: 6 x 8
#   Site  Classification transect point Numtrees canopy under   dbh
#   <chr> <chr>             <dbl> <dbl>    <dbl>  <dbl> <dbl> <dbl>
# 1 Bala  Primary forest        1     1        4    5.4 12     90  
# 2 Bala  Primary forest        1     2        3    3.8  7.75  90  
# 3 Bala  Primary forest        1     3        6    3.6  2     81.7
# 4 Bala  Primary forest        1     4        1    5.2  8     60  
# 5 Bala  Primary forest        1     5        3    3    3.5   90  
# 6 Bala  Primary forest        2     1        7    4.2 12.5   88.6
