R: binary matrix from a list of strings

I want to create a binary matrix from a list of strings.

dt = data.table(id = c('id1','id2','id3','id4','id5','id6'), sample = c("MER-1,MER-3,MER-4","MER-5","MER-2","MER-2,MER-3,MER-4,MER-5","MER_3","MER-5" ))

dt
    id                  sample
1: id1       MER-1,MER-3,MER-4
2: id2                   MER-5
3: id3                   MER-2
4: id4 MER-2,MER-3,MER-4,MER-5
5: id5                   MER_3
6: id6                   MER-5

The result should be:

m_count = matrix(c(1,0,1,1,0, 0,0,0,0,1, 0,1,0,0,0, 0,1,1,1,1, 0,0,1,0,0, 0,0,0,0,1), nrow = 6, ncol = 5, byrow = TRUE,
                 dimnames = list(paste0('id', 1:6), paste0('MER-', 1:5)))

m_count

    MER-1 MER-2 MER-3 MER-4 MER-5
id1     1     0     1     1     0
id2     0     0     0     0     1
id3     0     1     0     0     0
id4     0     1     1     1     1
id5     0     0     1     0     0
id6     0     0     0     0     1
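
I could loop over each element of the list and fill the matrix, but given the size of the table that would be very slow. Is there a faster or more elegant way? Perhaps with dplyr/tidyverse?
Thanks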

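Using dt from the Note at the end, which fixes the typo in the question, use separate_rows() to expand the data to one row per value and then use table() to compute the counts: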

library(data.table)
library(dplyr)
library(tidyr)

dt %>%
  separate_rows(sample, sep = ",") %>%
  table
giving:

     sample
id    MER-1 MER-2 MER-3 MER-4 MER-5
  id1     1     0     1     1     0
  id2     0     0     0     0     1
  id3     0     1     0     0     0
  id4     0     1     1     1     1
  id5     0     0     1     0     0
  id6     0     0     0     0     1
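
The table() call above returns a contingency table of counts rather than a plain matrix. If a strictly 0/1 matrix is needed, one possible sketch is to mark each id/value pair once and reshape with pivot_wider(). This assumes tidyr 1.1.0 or later (for names_sort), uses the typo-corrected data, and the names df, present, wide and m are illustrative only:

```r
library(dplyr)
library(tidyr)

# Same data as the question, with "MER_3" assumed to be a typo for "MER-3".
df <- tibble(
  id = c("id1", "id2", "id3", "id4", "id5", "id6"),
  sample = c("MER-1,MER-3,MER-4", "MER-5", "MER-2",
             "MER-2,MER-3,MER-4,MER-5", "MER-3", "MER-5")
)

wide <- df %>%
  separate_rows(sample, sep = ",") %>%   # one row per id/value pair
  distinct() %>%                         # a value repeated within an id counts once
  mutate(present = 1L) %>%
  pivot_wider(names_from = sample, values_from = present,
              values_fill = 0L, names_sort = TRUE)

# Drop the id column and keep the ids as row names of an integer matrix.
m <- as.matrix(wide[, -1])
rownames(m) <- wide$id
m
```

Because of distinct(), the entries stay at 0 or 1 even if a value is listed twice for the same id, which table() alone would count as 2.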
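
Note

library(data.table)

dt <- data.table(id = c('id1','id2','id3','id4','id5','id6'), sample = c("MER-1,MER-3,MER-4","MER-5","MER-2","MER-2,MER-3,MER-4,MER-5","MER-3","MER-5"))

You can also use the splitstackshape library: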

library(splitstackshape)

table(cSplit(dt, "sample", sep = ",", direction = "long"))

     sample
id    MER-1 MER-2 MER-3 MER-4 MER-5
  id1     1     0     1     1     0
  id2     0     0     0     0     1
  id3     0     1     0     0     0
  id4     0     1     1     1     1
  id5     0     0     1     0     0
  id6     0     0     0     0     1
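Or use cSplit_e, which was created specifically for this scenario (suggested by @A5C1D2H2I1M1N2O1R2T1):

cSplit_e(dt, "sample", sep = ",", type = "character", fill = 0, drop = TRUE)
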
You can use strsplit():

table(dt[,unlist(strsplit(sample,",")),by=id])
     V1
id    MER-1 MER-2 MER-3 MER-4 MER-5 MER_3
  id1     1     0     1     1     0     0
  id2     0     0     0     0     1     0
  id3     0     1     0     0     0     0
  id4     0     1     1     1     1     0
  id5     0     0     0     0     0     1
  id6     0     0     0     0     1     0
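
Note that the output above keeps MER_3 as its own column because the typo in the question's data is left as-is. The same split-then-tabulate idea can also be sketched in base R only; the version below assumes MER_3 was meant to be MER-3 (that normalisation step is my assumption, not part of the answer) and caps counts at 1 so the result is strictly binary. The names ids, samples, parts, long and m are illustrative:

```r
ids <- c("id1", "id2", "id3", "id4", "id5", "id6")
samples <- c("MER-1,MER-3,MER-4", "MER-5", "MER-2",
             "MER-2,MER-3,MER-4,MER-5", "MER_3", "MER-5")

# Assumption: "MER_3" is a typo for "MER-3", so underscores become hyphens.
samples <- gsub("_", "-", samples, fixed = TRUE)

# Split each string on commas: one character vector per id.
parts <- strsplit(samples, ",", fixed = TRUE)

# Long format: repeat each id once per value it carries.
long <- data.frame(id = rep(ids, lengths(parts)),
                   sample = unlist(parts))

# Cross-tabulate, then convert to a plain 0/1 integer matrix.
counts <- table(long$id, long$sample)
m <- matrix(as.integer(counts > 0), nrow = nrow(counts),
            dimnames = list(rownames(counts), colnames(counts)))
m
```

This avoids an explicit loop over ids: strsplit() and table() are both vectorised, so the only per-row work happens inside those calls.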

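You should use cSplit_e for this.
@A5C1D2H2I1M1N2O1R2T1 That looks like a nice option, but I couldn't get it to work with this data: cSplit_e(dt, "sample", sep = ",", fill = 0) throws an error.
Add type = "character" to the call.
@A5C1D2H2I1M1N2O1R2T1 Thanks, that's a really neat function :)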
cSplit_e(dt, "sample", sep = ",", type = "character", fill = 0, drop = TRUE)