Match a string pattern in column names by looping, and append the results as new columns to the dataframe

I have a data frame whose columns are named as follows:

abc_alpha = c(1,2,3,4)
abc_beta = c(5,6,7,8)
abc_char = c(9,10,11,12)
xyz_alpha = c(4,3,2,1)
xyz_beta = c(8,7,6,5)
xyz_char = c(12,11,10,9)

and my data frame (df):

df <- data.frame(abc_alpha, abc_beta, abc_char, xyz_alpha, xyz_beta, xyz_char)

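I would like to loop through the columns and match those whose names end in the same string (the part after the underscore), take the mean of the two matching columns, and append it to the end of the data frame as a new variable (the new variable's name being the matched string after the underscore). I want to use a loop rather than hard-coding the column names, because the actual dataset has too many columns.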

The expected output would be:

abc_alpha abc_beta abc_char xyz_alpha xyz_beta xyz_char alpha beta char
   1         5        9        4         8       12      2.5  6.5  10.5
   2         6        10       3         7       11      2.5  6.5  10.5
   3         7        11       2         6       10      2.5  6.5  10.5
   4         8        12       1         5       9       2.5  6.5  10.5

I have written the first part of the loop, but I can't work out how to finish it by appending the new columns to the dataframe:

for (i in 1:ncol(df)) {

  x <- (strsplit(names(df)[i], split = '_', fixed = T))[[1]][2]
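}

We can split the data by a grouping variable created by removing the prefix (everything up to and including the underscore) from the column names, and then get the rowMeans of each group: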

cbind(df, sapply(split.default(df, sub(".*_", "", names(df))), rowMeans))
#abc_alpha abc_beta abc_char xyz_alpha xyz_beta xyz_char alpha beta char
#1         1        5        9         4        8       12   2.5  6.5 10.5
#2         2        6       10         3        7       11   2.5  6.5 10.5
#3         3        7       11         2        6       10   2.5  6.5 10.5
#4         4        8       12         1        5        9   2.5  6.5 10.5
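
Since the question asks for a loop specifically, here is a minimal sketch of how the loop could be finished. It is not part of the answer above; it reuses the same sub() pattern to derive the suffixes, and the names suffixes, result, and s are chosen here purely for illustration. It assumes every column name contains an underscore and that no suffix collides with an existing column name.

```r
# Rebuild the example data from the question
abc_alpha <- c(1, 2, 3, 4)
abc_beta  <- c(5, 6, 7, 8)
abc_char  <- c(9, 10, 11, 12)
xyz_alpha <- c(4, 3, 2, 1)
xyz_beta  <- c(8, 7, 6, 5)
xyz_char  <- c(12, 11, 10, 9)
df <- data.frame(abc_alpha, abc_beta, abc_char, xyz_alpha, xyz_beta, xyz_char)

# Suffix of each original column: everything after the last underscore
suffixes <- sub(".*_", "", names(df))

# Work on a copy so the loop always reads from the original columns
result <- df

for (s in unique(suffixes)) {
  # Row-wise mean of all original columns that share this suffix
  result[[s]] <- rowMeans(df[suffixes == s])
}

result
```

Looping over the unique suffixes rather than over every column means each new column is computed once, and it works unchanged when more than two columns share a suffix.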

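Or, using tidyverse, we gather the columns into 'long' format, then separate the 'key' column into two columns by splitting at the delimiter, summarise to get the mean after grouping by the row name and 'key2', spread the result to 'wide' format, and bind it with the original dataset using 'bind_cols':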

library(tidyverse)
df %>% 
  rownames_to_column('rn') %>% # create a rowname column
  gather(key, val, -rn) %>% # convert to long format
  separate(key, into = c('key1', 'key2')) %>% # split column into two
  group_by(rn, key2) %>% # grouping with columns
  summarise(val = mean(val)) %>% # get the mean 
  spread(key2, val) %>% # convert to wide format
  ungroup %>% # remove the groups
  select(-rn) %>% # select only columns of interest
  bind_cols(df, .) # bind with the original dataset
# abc_alpha abc_beta abc_char xyz_alpha xyz_beta xyz_char alpha beta char
#1         1        5        9         4        8       12   2.5  6.5 10.5
#2         2        6       10         3        7       11   2.5  6.5 10.5
#3         3        7       11         2        6       10   2.5  6.5 10.5
#4         4        8       12         1        5        9   2.5  6.5 10.5
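
gather() and spread() are superseded in tidyr 1.0.0 and later by pivot_longer() and pivot_wider(). The following is a sketch of the same pipeline with the newer verbs, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 are installed. It also swaps the character row names for an integer row id (row_number()), an adjustment made here so that rows keep their original order in data frames with ten or more rows, where a character id would sort "10" before "2". The names prefix, suffix, means, and result are illustrative.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
abc_alpha <- c(1, 2, 3, 4)
abc_beta  <- c(5, 6, 7, 8)
abc_char  <- c(9, 10, 11, 12)
xyz_alpha <- c(4, 3, 2, 1)
xyz_beta  <- c(8, 7, 6, 5)
xyz_char  <- c(12, 11, 10, 9)
df <- data.frame(abc_alpha, abc_beta, abc_char, xyz_alpha, xyz_beta, xyz_char)

means <- df %>%
  mutate(rn = row_number()) %>%                      # integer row id
  pivot_longer(-rn,
               names_to = c("prefix", "suffix"),     # split each name at "_"
               names_sep = "_") %>%
  group_by(rn, suffix) %>%                           # one group per row and suffix
  summarise(value = mean(value), .groups = "drop") %>%
  pivot_wider(names_from = suffix, values_from = value) %>%
  select(-rn)                                        # drop the helper id

# Append the per-suffix means to the original columns
result <- bind_cols(df, means)
result
```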

Data

df <- data.frame(abc_alpha, abc_beta, abc_char, xyz_alpha, xyz_beta, xyz_char)