R: Aggregating variables for each case

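Hello fellow Stack Overflow users,

The goal is to handle some data manipulation steps on a large dataset. In the first step, specific variables that represent different instances of the same piece of information should be aggregated for each case. There are always 5 variables to aggregate per group.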

groups <- split(colnames(d), gsub("\\d", "", colnames(d)))
groups
$a
[1] "a1" "a2" "a3" "a4" "a5"

$b
[1] "b1" "b2" "b3" "b4" "b5"
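
To make the grouping step concrete, here is a minimal sketch on hypothetical column names (two numbered groups plus the plain columns A, B and C; the real data frame d is not shown in the question). It illustrates one detail that is easy to miss: columns without digits end up as groups of their own and probably need to be filtered out before aggregating.

```r
# Hypothetical column names: two numbered groups plus three plain columns
cols <- c(paste0("a", 1:5), paste0("xyz", 1:5), "A", "B", "C")

# Strip all digits to obtain the group prefix of every column
prefix <- gsub("\\d", "", cols)

# Split the column names by prefix; unnumbered columns become groups of one
groups <- split(cols, prefix)

# Keep only the groups that really contain several numbered variables
groups <- groups[lengths(groups) > 1]
names(groups)
# [1] "a"   "xyz"
```

Filtering on group size is only one possible choice; matching the numbered columns with a stricter pattern before splitting would work as well.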

Currently, the dataset looks like this:

      a1 a2 a3 a4 a5 b1 b2 b3 b4 b5 ... xyz5 A B C 
case1 3  4  7  9  6  21 13              4    1 7 8 
case2 9  12 8        17 25 31           7    2 7 6
case3 5  3  11 10    32 19 13           5    1 6 8
...

It should look like this:

      mean-a  mean-b ...mean-xyz A B C 
case1 5,8     17        6,4      1 7 8 
case2 9,6     24,3      8,3      2 7 6
case3 7,25    21,3      7        1 6 8
...

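I am not sure whether building a function or using the across function from the dplyr package is the right approach, since there are about 2000 variables to aggregate.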

Any help would be greatly appreciated.

Many thanks in advance.

Sample data:

You can also use the following solution:

library(dplyr)
library(stringr)
library(purrr)

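# First, extract the unique letter prefixes of the column names
# (named prefixes so that base R's built-in letters is not overwritten)
prefixes <- unique(str_remove(names(df), "\\d"))
prefixes
[1] "a" "b"

# Then iterate over the prefixes: select the columns that contain each one
# and compute their row-wise mean.
# Note: contains() ignores case and matches anywhere in a name, so this
# assumes that no prefix also occurs in an unrelated column name.
prefixes %>%
  map(~ df %>%
        select(contains(.x)) %>%
        rowwise() %>%
        mutate("mean_{.x}" := mean(c_across(contains(.x)), na.rm = TRUE))) %>%
  bind_cols() %>%
  relocate(contains("mean"), .after = last_col())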


# A tibble: 3 x 12
# Rowwise: 
     a1    a2    a3    a4    a5    b1    b2    b3    b4    b5 mean_a mean_b
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
1     3     4     7     9     6    21    13     7     8     4    5.8   10.6
2     9    12     8    17    25    31     4     2     2     7   14.2    9.2
3     5     3    11    10    32    19    13     2     2     5   12.2    8.2
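
Since the question mentions about 2000 variables, a vectorised variant may be worth considering. The following is only a sketch in base R, not the method from the answer above: it assumes a hypothetical data frame d whose aggregated columns are named with a lowercase prefix followed by digits, uses NA for the empty cells of the question's table, and carries over one extra column (A) unchanged.

```r
# Hypothetical sample data: two groups of five columns plus one column (A)
# that should be carried over unchanged; NA marks a missing measurement.
d <- data.frame(
  a1 = c(3, 9, 5), a2 = c(4, 12, 3), a3 = c(7, 8, 11),
  a4 = c(9, NA, 10), a5 = c(6, NA, NA),
  b1 = c(21, 17, 32), b2 = c(13, 25, 19), b3 = c(NA, 31, 13),
  b4 = rep(NA_real_, 3), b5 = rep(NA_real_, 3),
  A = c(1, 2, 1)
)

# Columns to aggregate: lowercase letters followed by digits (assumed naming)
agg_cols <- grep("^[a-z]+\\d+$", names(d), value = TRUE)

# Group the column names by prefix (the name with the trailing digits removed)
groups <- split(agg_cols, sub("\\d+$", "", agg_cols))

# One vectorised rowMeans() call per group, ignoring NA values
means <- as.data.frame(
  lapply(groups, function(cols) rowMeans(d[cols], na.rm = TRUE))
)
names(means) <- paste0("mean_", names(groups))

# Combine the means with the columns that were not aggregated
result <- cbind(means, d[setdiff(names(d), agg_cols)])
result
```

For the first case this gives a mean of 5.8 for the a group and 17 for the b group, in line with the expected output in the question. rowMeans() processes all rows of a group in a single call, which tends to scale better than a rowwise() pipeline when there are many groups.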

Data

df <- tribble(
  ~a1, ~a2, ~a3, ~a4, ~a5, ~b1, ~b2, ~b3, ~b4, ~b5,
 3, 4, 7, 9, 6, 21, 13, 7, 8, 4, 
 9, 12, 8, 17, 25, 31, 4, 2, 2, 7,
 5, 3, 11, 10, 32, 19, 13, 2, 2, 5
)

It would be easier to help if you created a small reproducible example along with the expected output.