R: splitting a data frame efficiently
I have this data frame:
set.seed(1)
n=20
df <- data.frame(s1 = paste(sample(0:3, n, replace = TRUE),sample(0:3, n, replace = TRUE),sep="/"),
s2 = paste(sample(0:3, n, replace = TRUE),sample(0:3, n, replace = TRUE),sep="/"),
s3 = paste(sample(0:3, n, replace = TRUE),sample(0:3, n, replace = TRUE),sep="/"),
stringsAsFactors = FALSE)
But I would like to know whether there is a more efficient way to do this.
Here is a slightly odd approach:
library(data.table)
fwrite(df, sep = "/", quote = FALSE,
col.names = FALSE, file = "df.txt")
NN <- 2L*ncol(df)
DT1 <- fread("df.txt", sep = "/", select = seq(from = 1L, to = NN, by = 2L))
DT2 <- fread("df.txt", sep = "/", select = seq(from = 2L, to = NN, by = 2L))
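Suggestion: use stri_split_fixed. Some benchmarks are shown below.
(The code assumes that you read the data in as a matrix, convert it to a character vector, split on "/", and then use matrix(prevOutput, nrow=origNrow, ncol=2*origNcol).)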
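The steps in the parenthetical above could be sketched roughly as follows. This is only an illustration on a small made-up data frame: the object names `m`, `parts`, `first` and `second` are hypothetical, and instead of one wide matrix it reshapes each half separately so that the original row and column layout is preserved.

```r
library(stringi)

# Small made-up input in the same "a/b" format as the question (assumption)
df <- data.frame(s1 = c("0/1", "2/3", "1/1"),
                 s2 = c("3/0", "2/2", "0/3"),
                 stringsAsFactors = FALSE)

m <- as.matrix(df)  # character matrix with the original dimensions

# Flatten column by column and split on "/"; simplify = TRUE returns a
# character matrix with one row per original cell and one column per part
parts <- stri_split_fixed(as.vector(m), "/", simplify = TRUE)

# Reshape each half back to the original dimensions; matrix() fills
# column-wise, which matches the order produced by as.vector(m)
first  <- matrix(as.integer(parts[, 1]), nrow = nrow(m),
                 dimnames = list(NULL, colnames(m)))
second <- matrix(as.integer(parts[, 2]), nrow = nrow(m),
                 dimnames = list(NULL, colnames(m)))
```

Here `first` holds the value before each "/" and `second` the value after it, which plays the same role as the two tables read back with fread in the other approach.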
Are you by any chance working with genotype data?
You got me. However, it is not in the traditional VCF format. It only has CHROM.POS and GT fields. Any suggestions?
options(stringsAsFactors = FALSE)
library(rbenchmark)
library(stringi)
library(tidyr)
set.seed(1)
# Simulate nrows * ncols strings of the form "a/b"
ncols <- 1
nrows <- 10*1000
strdat <- paste(sample(0:3, nrows*ncols, replace = TRUE),
                sample(0:3, nrows*ncols, replace = TRUE), sep = "/")
# Compare base strsplit, stringi and tidyr on the same input
benchmark(strsplitMtd = lapply(strdat, function(x) strsplit(x, "/")[[1]]),
          striMtd = stri_list2matrix(stri_split_fixed(strdat, "/"), byrow = TRUE),
          tidyrMtd = separate(data.frame(S = strdat), S, c("S1", "S2"), "/"))