R: How can I mutate a column that averages columns chosen by name while excluding certain columns from the calculation?
Working in a data frame, I would like to use mutate to create a new column that averages, in each row, all of the columns matched by name except one. I need to be able to exclude a different column each time I use mutate, and I want the calculation to skip NA values as well. A simple version of my DF:
Team stat1 stat2 stat3 stat4
1 ARI 3 NA 4 6
2 BAL NA 2 NA 1
3 CAR 5 4 6 2
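The answers refer to this data as df1 (and as df in the second base R answer) but never show how it is built. A minimal sketch for reproducing the sample, assuming the stat columns are integers as the tibble printout in the answer suggests:

```r
# Sample data from the question; missing values are entered as NA.
# Assumption: the stat columns are integer, matching the <int> tibble header.
df1 <- data.frame(
  Team  = c("ARI", "BAL", "CAR"),
  stat1 = c(3L, NA, 5L),
  stat2 = c(NA, 2L, 4L),
  stat3 = c(4L, NA, 6L),
  stat4 = c(6L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Quick check of the first requested column: the mean of everything
# except Team and stat1, skipping NA values.
check_newcol1 <- rowMeans(df1[c("stat2", "stat3", "stat4")], na.rm = TRUE)
```

Here check_newcol1 is only an illustrative name; it should agree with the NewCol1 values the question asks for.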
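NewCol1 is created by averaging the stat columns, excluding the stat1 column and any NA values.
NewCol2 works the same way, except that the average excludes the stat2 column: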
Team stat1 stat2 stat3 stat4 NewCol1 NewCol2
1 ARI 3 NA 4 6 5.0 4.33
2 BAL NA 2 NA 1 1.5 1.00
3 CAR 5 4 6 2 4.0 4.33
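If I want to create a new column like this for every stat, what is the most efficient way to do it? The DF has 10 stat columns, all sharing the same name followed by a number. I was thinking that starts_with() might be useful together with rowMeans() here, but I am still struggling to implement it while excluding a specific column each time.

We can use rowMeans after selecting the relevant columns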
library(dplyr)
df1 %>%
    mutate(NewCol1 = rowMeans(select(., -Team, -stat1), na.rm = TRUE),
           NewCol2 = rowMeans(select(., -Team, -stat2), na.rm = TRUE))
-output
# Team stat1 stat2 stat3 stat4 NewCol1 NewCol2
#1 ARI 3 NA 4 6 5.0 4.333333
#2 BAL NA 2 NA 1 1.5 1.000000
#3 CAR 5 4 6 2 4.0 4.333333
Or another option with c_across
df1 %>%
    rowwise %>%
    mutate(NewCol1 = mean(c_across(c(where(is.numeric), -stat1)), na.rm = TRUE),
           NewCol2 = mean(c_across(c(starts_with('stat'), -stat2)), na.rm = TRUE),
           NewCol3 = mean(c_across(c(starts_with('stat'), -stat3)), na.rm = TRUE),
           NewCol4 = mean(c_across(c(starts_with('stat'), -stat4)), na.rm = TRUE)) %>%
    ungroup
-output
# A tibble: 3 x 9
# Team stat1 stat2 stat3 stat4 NewCol1 NewCol2 NewCol3 NewCol4
# <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
#1 ARI 3 NA 4 6 5 4.33 4.5 3.5
#2 BAL NA 2 NA 1 1.5 1 1.5 2
#3 CAR 5 4 6 2 4 4.33 3.67 5
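Or do this automatically for every stat column, entirely in tidyverse
library(purrr)
library(stringr)
# nm1 is not defined in the original post; it is assumed here to hold the stat column names
nm1 <- grep('^stat', names(df1), value = TRUE)
map_dfc(nm1, ~
    df1 %>%
        select(starts_with('stat'), -all_of(.x)) %>%
        transmute(!! str_c('NewCol', readr::parse_number(.x)) :=
                      rowMeans(., na.rm = TRUE))) %>%
    bind_cols(df1, .)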
# Team stat1 stat2 stat3 stat4 NewCol1 NewCol2 NewCol3 NewCol4
#1 ARI 3 NA 4 6 5.0 4.333333 4.500000 3.5
#2 BAL NA 2 NA 1 1.5 1.000000 1.500000 2.0
#3 CAR 5 4 6 2 4.0 4.333333 3.666667 5.0
Or with c_across
map_dfc(nm1, ~
    df1 %>%
        select(starts_with('stat'), -all_of(.x)) %>%
        rowwise %>%
        transmute(!! str_c('NewCol', readr::parse_number(.x)) :=
                      mean(c_across(everything()), na.rm = TRUE))) %>%
    ungroup %>%
    bind_cols(df1, .)
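Or with base R
df1[paste0("NewCol", seq_along(nm1))] <- lapply(nm1,
    function(x) rowMeans(df1[setdiff(names(df1)[-1], x)], na.rm = TRUE))
df1
#  Team stat1 stat2 stat3 stat4 NewCol1  NewCol2  NewCol3 NewCol4
#1  ARI     3    NA     4     6     5.0 4.333333 4.500000     3.5
#2  BAL    NA     2    NA     1     1.5 1.000000 1.500000     2.0
#3  CAR     5     4     6     2     4.0 4.333333 3.666667     5.0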

In base R, you can find the columns that contain 'stat', drop them one at a time inside lapply, and take the row means of the columns that remain
cols <- grep('stat', names(df))
new_cols <- paste0('remove_', names(df)[cols])
df[new_cols] <- lapply(cols, function(x) rowMeans(df[, -c(1, x)], na.rm = TRUE))
df
# Team stat1 stat2 stat3 stat4 remove_stat1 remove_stat2 remove_stat3 remove_stat4
#1 ARI 3 NA 4 6 5.0 4.333333 4.500000 3.5
#2 BAL NA 2 NA 1 1.5 1.000000 1.500000 2.0
#3 CAR 5 4 6 2 4.0 4.333333 3.666667 5.0
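Since the question asks for the most efficient way, one alternative to looping over the columns is to compute every leave-one-out mean at once from the row totals. This is a different technique from the answers above, sketched under the assumption that all stat columns are numeric; the names stat_cols, m, row_sum, row_n and loo are illustrative only:

```r
# Sample data from the question.
df <- data.frame(
  Team  = c("ARI", "BAL", "CAR"),
  stat1 = c(3L, NA, 5L), stat2 = c(NA, 2L, 4L),
  stat3 = c(4L, NA, 6L), stat4 = c(6L, 1L, 2L)
)

stat_cols <- grep("^stat", names(df), value = TRUE)
m <- as.matrix(df[stat_cols])          # matrix of the stat columns only

row_sum <- rowSums(m, na.rm = TRUE)    # per-row sum of the non-NA values
row_n   <- rowSums(!is.na(m))          # per-row count of the non-NA values

# For each cell, remove its own contribution from the row total
# (0 if the cell is NA) and from the row count, then divide.
# The length-nrow vectors recycle down the columns, so row i of the
# matrix is always paired with row_sum[i] and row_n[i].
loo <- (row_sum - replace(m, is.na(m), 0)) / (row_n - !is.na(m))
colnames(loo) <- paste0("remove_", stat_cols)

result <- cbind(df, loo)
```

A row whose remaining values are all NA yields NaN, which is also what rowMeans(..., na.rm = TRUE) returns in that case.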

This is very helpful; thank you so much. I got the other examples working, but when I tried your examples that automate this (the 3rd and 4th examples), I got an error saying that 'x' must be numeric. Is the "x" in this code sample supposed to be replaced with something?
@ChazC That must be an issue with the column types in your data.
Thank you for your answer. I get this error: Error in rowMeans(df[, -c(1, x)], na.rm = TRUE) : 'x' must be numeric. Any suggestions here?
@ChazC Is the Team column the first column in your data frame? Do you have any non-numeric columns other than Team?
Yes, I fixed it by changing the code to rowMeans(df[, -c(1:5, x)], na.rm = TRUE). Thanks again!
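
The 'x' must be numeric error discussed in the comments is raised when rowMeans receives a non-numeric column. Rather than hard-coding positions such as 1:5, a variant can pick the stat columns by name and keep only the numeric ones. A sketch, assuming a hypothetical extra text column Conf and stat columns whose names start with "stat":

```r
# Hypothetical data with an extra non-numeric column ahead of the stats.
df <- data.frame(
  Team  = c("ARI", "BAL", "CAR"),
  Conf  = c("NFC", "AFC", "NFC"),   # illustrative column, not in the question
  stat1 = c(3, NA, 5), stat2 = c(NA, 2, 4),
  stat3 = c(4, NA, 6), stat4 = c(6, 1, 2)
)

# Select the stat columns by name, then keep only those that are numeric.
stat_cols <- grep("^stat", names(df), value = TRUE)
stat_cols <- stat_cols[vapply(df[stat_cols], is.numeric, logical(1))]

# For each stat column, average the other stat columns row by row.
new_cols <- paste0("remove_", stat_cols)
df[new_cols] <- lapply(stat_cols, function(x) {
  rowMeans(df[setdiff(stat_cols, x)], na.rm = TRUE)
})
```

Because the selection is by name, the position of Team or of any other text column no longer matters.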