R: aggregating variables per case
Hello fellow overflowers, the goal is to handle certain steps of data manipulation on a large dataset. In the first step, specific variables that represent different instances of a particular piece of information should be aggregated for each case (row). There are always 5 variables to aggregate:
groups <- split(colnames(d), gsub("\\d", "", colnames(d)))
groups
$a
[1] "a1" "a2" "a3" "a4" "a5"
$b
[1] "b1" "b2" "b3" "b4" "b5"
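A list like `groups` can drive the aggregation directly in base R: one vectorised `rowMeans()` call per group, with no row-by-row loop. This is a minimal sketch, not the asker's or answerer's code; the data frame `d` below is a hypothetical stand-in built from the sample values in the answer, plus one ungrouped column `A`. Note that `split(colnames(d), gsub("\\d", "", colnames(d)))` would also turn `A`, `B` and `C` into one-column groups, so this sketch groups only the columns whose names end in digits.

```r
# Hypothetical stand-in for `d`: two five-column groups plus one ungrouped column A
d <- data.frame(
  a1 = c(3, 9, 5), a2 = c(4, 12, 3), a3 = c(7, 8, 11), a4 = c(9, 17, 10), a5 = c(6, 25, 32),
  b1 = c(21, 31, 19), b2 = c(13, 4, 13), b3 = c(7, 2, 2), b4 = c(8, 2, 2), b5 = c(4, 7, 5),
  A = c(1, 2, 1)
)

# Group only the numbered columns, so A is not treated as a one-column group
num_cols <- grep("\\d+$", colnames(d), value = TRUE)
groups <- split(num_cols, sub("\\d+$", "", num_cols))

# One vectorised rowMeans() call per group instead of a row-by-row loop
means <- as.data.frame(lapply(groups, function(cols) {
  rowMeans(d[, cols, drop = FALSE], na.rm = TRUE)
}))
names(means) <- paste0("mean_", names(means))

# Put the group means next to the columns that were not aggregated
result <- cbind(means, d[setdiff(colnames(d), num_cols)])
```

Because the work is one `rowMeans()` call per prefix, the same code should scale to a few hundred groups (about 2000 variables) without changes.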
Right now, the dataset looks like this:
a1 a2 a3 a4 a5 b1 b2 b3 b4 b5 ... xyz5 A B C
case1 3 4 7 9 6 21 13 4 1 7 8
case2 9 12 8 17 25 31 7 2 7 6
case3 5 3 11 10 32 19 13 5 1 6 8
...
It should look like this:
mean-a mean-b ...mean-xyz A B C
case1 5,8 17 6,4 1 7 8
case2 9,6 24,3 8,3 2 7 6
case3 7,25 21,3 7 1 6 8
...
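I'm not sure whether writing a function or using the `across()` function from the dplyr package is the right approach, since there are about 2000 variables to aggregate.
Any help would be greatly appreciated!
Many thanks in advance.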
You could also use the following solution:
library(dplyr)
library(stringr)
library(purrr)
# First we extract the unique prefixes of the column names (e.g. "a" from "a1")
prefixes <- unique(str_remove(names(df), "\\d+$"))
# [1] "a" "b"
# Then we iterate over each prefix and select only the columns named <prefix><digits>,
# so that e.g. "a" does not also match columns such as "ab1"
prefixes %>%
  map(~ df %>%
        select(matches(paste0("^", .x, "\\d+$"))) %>%
        rowwise() %>%
        mutate("mean_{.x}" := mean(c_across(matches(paste0("^", .x, "\\d+$"))), na.rm = TRUE))) %>%
  bind_cols() %>%
  relocate(starts_with("mean_"), .after = last_col())
# A tibble: 3 x 12
# Rowwise:
a1 a2 a3 a4 a5 b1 b2 b3 b4 b5 mean_a mean_b
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 3 4 7 9 6 21 13 7 8 4 5.8 10.6
2 9 12 8 17 25 31 4 2 2 7 14.2 9.2
3 5 3 11 10 32 19 13 2 2 5 12.2 8.2
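Since `rowwise()` evaluates `mean()` once per row and per group, it may become slow with around 2000 columns. A common alternative is to reshape to long format, aggregate with `group_by()`/`summarise()`, and reshape back. This is a sketch under assumptions: the `case` identifier column is hypothetical (the sample data has none), and other per-case columns such as `A`, `B`, `C` would have to be added to `group_by()` or joined back afterwards.

```r
library(dplyr)
library(tidyr)

# Sample data from the answer, plus a hypothetical `case` id column
df <- tibble(
  case = c("case1", "case2", "case3"),
  a1 = c(3, 9, 5), a2 = c(4, 12, 3), a3 = c(7, 8, 11), a4 = c(9, 17, 10), a5 = c(6, 25, 32),
  b1 = c(21, 31, 19), b2 = c(13, 4, 13), b3 = c(7, 2, 2), b4 = c(8, 2, 2), b5 = c(4, 7, 5)
)

result <- df %>%
  # Split each column name such as "a3" into a prefix ("a") and an index ("3")
  pivot_longer(
    cols = matches("\\d+$"),
    names_to = c("prefix", "index"),
    names_pattern = "^(\\D+)(\\d+)$"
  ) %>%
  # One mean per case and prefix
  group_by(case, prefix) %>%
  summarise(mean = mean(value, na.rm = TRUE), .groups = "drop") %>%
  # Back to one row per case and one mean_<prefix> column per group
  pivot_wider(names_from = prefix, values_from = mean, names_prefix = "mean_")
```

The long format needs no list of prefixes at all: any column named `<letters><digits>` is picked up by the regular expression.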
Data:
df <- tribble(
~a1, ~a2, ~a3, ~a4, ~a5, ~b1, ~b2, ~b3, ~b4, ~b5,
3, 4, 7, 9, 6, 21, 13, 7, 8, 4,
9, 12, 8, 17, 25, 31, 4, 2, 2, 7,
5, 3, 11, 10, 32, 19, 13, 2, 2, 5
)
It would be easier to help if you created a small reproducible example along with the expected output.