Convert a comma-separated column into a list in R

I'm fairly new to R and not quite sure how to do this. Suppose I have a TSV (tab-separated file) and read it into a table like this:

> table <- read.delim(file='test.tsv',sep='\t',header=TRUE,stringsAsFactors=FALSE)

    id              features
1. 131  FeatureA,FeatureB,FeatureC,
2. 132  FeatureA,FeatureD,FeatureE,FeatureF
3. 135  FeatureD,FeatureE,FeatureC
4. 139  FeatureF,FeatureB
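
For anyone who wants to follow along without the original file, here is a minimal sketch that rebuilds the same data from inline text. The name tsv_text and the data frame name df are assumptions made for illustration; the thread itself reads from test.tsv into a variable called table.

```r
# Hypothetical stand-in for test.tsv: the same four rows, supplied as inline text.
# "\t" inside an R string literal is a real tab character.
tsv_text <- "id\tfeatures
131\tFeatureA,FeatureB,FeatureC,
132\tFeatureA,FeatureD,FeatureE,FeatureF
135\tFeatureD,FeatureE,FeatureC
139\tFeatureF,FeatureB"

# read.delim() accepts a text= argument in place of file=.
df <- read.delim(text = tsv_text, sep = "\t", header = TRUE,
                 stringsAsFactors = FALSE)

# Inspect the structure: id is numeric, features is a character column.
str(df)
```

Using a name such as df rather than table also avoids masking the base R function table(), although either name works.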

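My "splitstackshape" package was written to handle exactly this kind of task. You can browse the concat.split family of functions. Here are a few examples.

As a list: the list-output function currently sorts its result, so it is better to use strsplit until I add an option that leaves the output unsorted.

As a data frame, first in the default wide form and then in "long" form: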
x2 <- concat.split.multiple(mydf, split.cols="features", seps=",")
x2
#     id features_1 features_2 features_3 features_4
# 1. 131   FeatureA   FeatureB   FeatureC       <NA>
# 2. 132   FeatureA   FeatureD   FeatureE   FeatureF
# 3. 135   FeatureD   FeatureE   FeatureC       <NA>
# 4. 139   FeatureF   FeatureB       <NA>       <NA>

x3 <- concat.split.multiple(mydf, split.cols="features", seps=",", direction="long")
x3
#     id time features
# 1  131    1 FeatureA
# 2  132    1 FeatureA
# 3  135    1 FeatureD
# 4  139    1 FeatureF
# 5  131    2 FeatureB
# 6  132    2 FeatureD
# 7  135    2 FeatureE
# 8  139    2 FeatureB
# 9  131    3 FeatureC
# 10 132    3 FeatureE
# 11 135    3 FeatureC
# 12 139    3     <NA>
# 13 131    4     <NA>
# 14 132    4 FeatureF
# 15 135    4     <NA>
# 16 139    4     <NA>
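
If installing a package is not an option, a similar long layout can be sketched in base R. This is an alternative technique, not what splitstackshape does internally, and it differs from the output above: rows are grouped by id and no NA padding rows are produced. The names df, parts, and long are assumptions made for illustration.

```r
# Same data as in the question, built inline so the sketch is self-contained.
df <- data.frame(
  id = c(131, 132, 135, 139),
  features = c("FeatureA,FeatureB,FeatureC,",
               "FeatureA,FeatureD,FeatureE,FeatureF",
               "FeatureD,FeatureE,FeatureC",
               "FeatureF,FeatureB"),
  stringsAsFactors = FALSE
)

# Split each string on commas; a trailing comma does not yield an empty element.
parts <- strsplit(df$features, ",", fixed = TRUE)

# Repeat each id once per feature and flatten the list into one column.
long <- data.frame(
  id = rep(df$id, lengths(parts)),
  features = unlist(parts),
  stringsAsFactors = FALSE
)
long
```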

You can use strsplit:
table$list.features <- strsplit(table$features, ",")

You may also want to create indicator variables for these features:
table[unique(unlist(table$list.features))] <- 0
for (i in seq_len(nrow(table))) table[i, table$list.features[[i]]] <- 1
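
Once the list column exists, a single feature can be pulled out by row and position, and the indicator variables can also be built without an explicit loop. The following is a minimal sketch under stated assumptions: the names df, second, all_features, and indicators are made up for illustration, and the indicators are kept in a separate matrix rather than added to the data frame.

```r
# Same data as in the question, built inline so the sketch is self-contained.
df <- data.frame(
  id = c(131, 132, 135, 139),
  features = c("FeatureA,FeatureB,FeatureC,",
               "FeatureA,FeatureD,FeatureE,FeatureF",
               "FeatureD,FeatureE,FeatureC",
               "FeatureF,FeatureB"),
  stringsAsFactors = FALSE
)

# One character vector per row, stored as a list column.
df$list.features <- strsplit(df$features, ",", fixed = TRUE)

# Second feature of the second row: [[2]] picks the row, [2] the position.
second <- df$list.features[[2]][2]

# Indicator matrix: one row per id, one column per distinct feature.
all_features <- unique(unlist(df$list.features))
indicators <- t(vapply(df$list.features,
                       function(f) as.integer(all_features %in% f),
                       integer(length(all_features))))
colnames(indicators) <- all_features
rownames(indicators) <- df$id
indicators
```

Keeping the indicators in their own matrix leaves the original columns untouched; cbind(df["id"], indicators) would attach them to the ids if a single data frame is preferred.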

Please share your desired output. Creating a list for the features column should be as simple as strsplit(table$features, ","), but I'm not sure whether that is the type of output you want.

Ideally I would like the second column to be indexable. For example, if I wanted to access the second feature of the second row, I could use [2,2[2]], which would return "FeatureD".

I think strsplit was already covered by my comment and my existing answer. Your approach to building the indicator table may not be the most efficient. The "splitstackshape" package covers this as well:

concat.split.expanded(mydf, "features", type="character", fill=0)

:-)

Thanks for your answers. Both of the answers here helped me learn how to accomplish this task.