
Aggregate variables per case in R


Hello fellow overflowers,

the goal is to work through several data-manipulation steps on a large dataset. As a first step, the variables that represent different instances of a specific piece of information should be aggregated for each case. There are always 5 variables to aggregate:

groups <- split(colnames(d), gsub("\\d", "", colnames(d)))
groups
$a
[1] "a1" "a2" "a3" "a4" "a5"

$b
[1] "b1" "b2" "b3" "b4" "b5"
The dataset currently looks like this:

      a1 a2 a3 a4 a5 b1 b2 b3 b4 b5 ... xyz5 A B C 
case1 3  4  7  9  6  21 13              4    1 7 8 
case2 9  12 8        17 25 31           7    2 7 6
case3 5  3  11 10    32 19 13           5    1 6 8
...

and it should end up looking like this:

      mean-a  mean-b ...mean-xyz A B C 
case1 5,8     17        6,4      1 7 8 
case2 9,6     24,3      8,3      2 7 6
case3 7,25    21,3      7        1 6 8
...
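
For illustration, a minimal base-R sketch of how the groups list from above can be turned into such per-case means; the small data frame d below is only a made-up stand-in for the real data:

# Made-up stand-in for the real data frame d, for illustration only
d <- data.frame(a1 = c(3, 9, 5), a2 = c(4, 12, 3), a3 = c(7, 8, 11),
                a4 = c(9, NA, 10), a5 = c(6, NA, NA),
                b1 = c(21, 17, 32), b2 = c(13, 25, 19), b3 = c(NA, 31, 13))

# Group the column names by their letter prefix, as above
groups <- split(colnames(d), gsub("\\d", "", colnames(d)))

# One mean per case and per group; missing values are ignored
means <- sapply(groups, function(cols) rowMeans(d[cols], na.rm = TRUE))
round(means, 2)
#         a     b
# [1,] 5.80 17.00
# [2,] 9.67 24.33
# [3,] 7.25 21.33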

I am not sure whether building a function or using the cross function from the dplyr package is the right approach here, since there are about 2000 variables that need to be aggregated.

Any help is greatly appreciated.

Many thanks in advance.

Example data:
You could also use the following solution:

library(dplyr)
library(stringr)
library(purrr)

# First we extract the unique letter values of column names
letters <- unique(str_remove(names(df), "\\d"))  # note: this masks the built-in `letters` constant
letters
# [1] "a" "b"

# Then we iterate over each unique value and select the columns whose names contain that letter

letters %>%
  map(~ df %>% 
        select(contains(.x)) %>% 
        rowwise() %>%
        mutate("mean_{.x}" := mean(c_across(contains(.x)), na.rm = TRUE))) %>%
  bind_cols() %>%
  relocate(contains("mean"), .after = last_col())


# A tibble: 3 x 12
# Rowwise: 
     a1    a2    a3    a4    a5    b1    b2    b3    b4    b5 mean_a mean_b
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
1     3     4     7     9     6    21    13     7     8     4    5.8   10.6
2     9    12     8    17    25    31     4     2     2     7   14.2    9.2
3     5     3    11    10    32    19    13     2     2     5   12.2    8.2
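
Two remarks on scaling this to the roughly 2000 variables mentioned in the question (these are additions, not part of the original answer): contains(.x) matches any column whose name merely contains that letter, so a stricter pattern such as matches("^a\\d+$") is safer once columns like mean_a exist or a letter occurs inside other names, and rowwise() with c_across() can get slow on very wide data. A sketch of a vectorized variant under the same assumptions (same df as above; prefixes is just a helper name used here):

library(dplyr)
library(stringr)
library(purrr)

# Unique letter prefixes of the measurement columns (same idea as `letters` above)
prefixes <- unique(str_remove(names(df), "\\d+$"))

# For each prefix, compute the row-wise mean over exactly the columns "<prefix><digits>"
# in a single vectorized rowMeans() call instead of iterating case by case
means <- prefixes %>%
  map(~ df %>%
        transmute("mean_{.x}" := rowMeans(across(matches(paste0("^", .x, "\\d+$"))),
                                          na.rm = TRUE))) %>%
  bind_cols()

bind_cols(df, means)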
Data

df <- tribble(
  ~a1, ~a2, ~a3, ~a4, ~a5, ~b1, ~b2, ~b3, ~b4, ~b5,
 3, 4, 7, 9, 6, 21, 13, 7, 8, 4, 
 9, 12, 8, 17, 25, 31, 4, 2, 2, 7,
 5, 3, 11, 10, 32, 19, 13, 2, 2, 5
)

It would be easier to help if you create a small reproducible example along with the expected output.
Understood.