
R: Summarising data.table columns by name


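I have a data table in which many variables are split into a positive and a negative component. I would like to combine these columns so that each variable shows its signed value. (Such variables always have positive or negative in their names, and other variables do not. However, the positive and negative substrings may appear anywhere in the variable name, i.e. only grepl("(positive)|(negative)", names(dt)) identifies them correctly.)

For example,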

library(data.table)

set.seed(1)

(DT <- data.table(x = 1:5, 
                  a_positive = sample(1:5), 
                  a_negative = sample(1:5), 
                  b_positive = sample(1:5), 
                  b_negative = sample(1:5), 
                  c_normal = sample(1:5)))

   x a_positive a_negative b_positive b_negative c_normal
1: 1          2          5          2          3        5
2: 2          5          4          1          5        1
3: 3          4          2          3          4        2
4: 4          3          3          4          1        4
5: 5          1          1          5          2        3
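
As a small check of the identification rule described above, the sketch below applies the same grepl pattern to the column names of the example. The names col_names, is_component, component_cols and other_cols are introduced here for illustration only.

```r
# Column names of the example table, typed in by hand.
col_names <- c("x", "a_positive", "a_negative",
               "b_positive", "b_negative", "c_normal")

# TRUE where a name contains "positive" or "negative" anywhere in it.
is_component <- grepl("(positive)|(negative)", col_names)

component_cols <- col_names[is_component]   # columns that need combining
other_cols     <- col_names[!is_component]  # columns to keep unchanged
```

Because the pattern is not anchored, it matches the marker at the start, middle or end of a name.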
My approach relies on a for loop and dplyr:

library(dplyr)
library(lazyeval)
library(magrittr) 

unite_positive_negative <- function(dt){
  signed_names <- 
    names(dt)[
      duplicated(gsub("(positive)|(negative)", "", names(dt))) | 
        duplicated(gsub("(positive)|(negative)", "", names(dt)), fromLast = TRUE)]

  unsigned_names <- 
    gsub("_*((positive)|(negative))_*", "", signed_names)

  the_names <- 
    data.table(signed_names = signed_names, 
               unsigned_names = unsigned_names) 

  for (unsigned_name in unsigned_names){
    poz <- the_names[unsigned_names == unsigned_name & grepl("positive", signed_names, fixed = TRUE)][["signed_names"]]
    neg <- the_names[unsigned_names == unsigned_name & grepl("negative", signed_names, fixed = TRUE)][["signed_names"]]

    dt %<>%
      mutate_(.dots = setNames(list(interp(~p - n, p = as.name(poz), n = as.name(neg))), unsigned_name)) 
  }

  # Unimportant
  unselect_ <- function(.data, .dots){
    all_names <- names(.data)
    keeps <- names(.data)[!names(.data) %in% .dots]
    dplyr::select_(.data, .dots = keeps)
  }

  dt %>%
    unselect_(.dots = signed_names)
}
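We can try melt/dcast. With melt, we specify id.var as the 'x' and 'c_normal' columns (if there are many 'normal' columns, we can also select them with grep) to reshape the dataset from 'wide' to 'long' format. Using tstrsplit, we split the 'variable' column into two. Grouped by 'x', 'c_normal' and 'var1' (from the split), we subset the 'negative' and 'positive' elements of 'value', multiply them by -1 and 1 respectively, and add them. Then dcast converts the result from 'long' back to 'wide' format.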

library(data.table)
dcast(melt(DT, id.var = c("x", "c_normal"))[, 
       c("var1", "var2") := tstrsplit(variable, "_")
        ][, -1*value[var2=="negative"] + value[var2=="positive"] ,
        by = .(x, c_normal, var1)],
              x + c_normal~var1, value.var="V1")
#   x c_normal  a  b
#1: 1        5 -3 -1
#2: 2        1  1 -4
#3: 3        2  2 -1
#4: 4        4  0  3
#5: 5        3  0  3
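
The answer mentions that grep can pick the id columns when there are many of them. A minimal sketch of that variant follows, assuming the component columns are still named stem_positive / stem_negative so that tstrsplit on "_" works. The example values are entered by hand (copied from the printed table) so the result does not depend on the random number generator, and the names id_cols, long, signed_long and wide are illustrative only.

```r
library(data.table)

DT <- data.table(x = 1:5,
                 a_positive = c(2L, 5L, 4L, 3L, 1L),
                 a_negative = c(5L, 4L, 2L, 3L, 1L),
                 b_positive = c(2L, 1L, 3L, 4L, 5L),
                 b_negative = c(3L, 5L, 4L, 1L, 2L),
                 c_normal   = c(5L, 1L, 2L, 4L, 3L))

# id columns: every column whose name mentions neither marker
id_cols <- grep("positive|negative", names(DT), value = TRUE, invert = TRUE)

# wide -> long, then split "a_positive" into var1 = "a", var2 = "positive"
long <- melt(DT, id.vars = id_cols, variable.factor = FALSE)
long[, c("var1", "var2") := tstrsplit(variable, "_", fixed = TRUE)]

# signed value per row and stem: positive part minus negative part
signed_long <- long[, .(value = sum(ifelse(var2 == "positive", value, -value))),
                    by = c(id_cols, "var1")]

# long -> wide, building the formula from the id columns
cast_formula <- as.formula(paste(paste(id_cols, collapse = " + "), "~ var1"))
wide <- dcast(signed_long, cast_formula, value.var = "value")
```

Building the formula from id_cols avoids typing every id column by hand, but it does not remove the assumption about how the component columns are named.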

Another option, without melt/dcast, is to subset the 'positive' and 'negative' columns of the dataset (assuming they are in the same order), multiply them by 1 and -1 respectively, add them (+), and assign the output to the subset of the dataset that has no 'positive'/'negative' columns.

DT1 <- DT[, c("x", grep("normal", names(DT), value=TRUE)), with = FALSE]
DT2 <- DT[, grep("positive", names(DT)), with = FALSE] +
          -1 * DT[, grep("negative", names(DT)), with = FALSE]
DT1[, c("a", "b") := DT2]
DT1
#    x c_normal  a  b
# 1: 1        5 -3 -1
# 2: 2        1  1 -4
# 3: 3        2  2 -1
# 4: 4        4  0  3
# 5: 5        3  0  3

Sorry for not being clear, but a variable is an "unsigned" column if and only if it contains the substring positive or negative. So not all 'normal' columns contain normal, and positive or negative may appear anywhere in the column name.

@Hugh My solutions were based on the example you provided, and they all work for that example.
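
The second option relies on the positive and negative columns appearing in matching order, and both options rely on the stem_sign naming of the example. Since the markers may sit anywhere in a column name, one possible sketch pairs the columns by name instead. combine_signed is a hypothetical helper written for this illustration, not part of data.table; it assumes each positive column has a partner whose name differs only by "negative" replacing "positive", and it derives the new name by dropping the marker and adjacent underscores, as the regex in the question does.

```r
library(data.table)

# Example with markers in different positions and partners out of order.
DT <- data.table(x = 1:5,
                 a_positive = c(2L, 5L, 4L, 3L, 1L),
                 a_negative = c(5L, 4L, 2L, 3L, 1L),
                 negative_b = c(3L, 5L, 4L, 1L, 2L),
                 positive_b = c(2L, 1L, 3L, 4L, 5L),
                 c_normal   = c(5L, 1L, 2L, 4L, 3L))

combine_signed <- function(dt) {
  out <- copy(dt)  # work on a copy so the caller's table is not modified
  pos_cols <- grep("positive", names(out), value = TRUE, fixed = TRUE)
  for (pos in pos_cols) {
    # partner column: same name with the marker swapped
    neg <- sub("positive", "negative", pos, fixed = TRUE)
    if (!(neg %in% names(out))) next  # no partner found, leave column as is
    # name of the combined column: marker and adjacent underscores removed
    target <- gsub("_*positive_*", "", pos)
    set(out, j = target, value = out[[pos]] - out[[neg]])
    used <- c(pos, neg)
    out[, (used) := NULL]  # drop the two component columns
  }
  out[]
}

res <- combine_signed(DT)
```

The sketch does not guard against a derived name colliding with an existing column or ending up empty; real column names may need an extra check.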