R: arranging chromosome numbers in a data frame
I have a file containing sample chromosomes and their frequencies:
a
sample Chr_No frequency
sample-1 chr1: 0
sample-1 chr2: 0
sample-1 chr3: 0
sample-1 chr4: 1
sample-1 chr5: 0
sample-1 chr6: 0
sample-1 chr7: 0
sample-1 chr8: 0
sample-1 chr9: 1
sample-1 chr10 0
sample-1 chr11 0
......
I want to convert it to a data frame, so in R I used:
b <- dcast(a, sample ~ Chr_No, value.var = "frequency", fill = 0)
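The rest of the question text is missing here, but the answers suggest the columns did not come out in chromosome order. A minimal sketch of the likely cause, assuming the chromosome labels are plain character strings: text sorting compares character by character, so "chr10" lands before "chr2". The vector chr below is made up for illustration.

```r
# Illustrative labels only; not taken from the asker's file
chr <- c("chr1", "chr2", "chr3", "chr10", "chr11")

# Character sort: "chr10" and "chr11" are placed before "chr2"
sort(chr)
```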
First remove the colon from the names, then use mixedsort to arrange the names as chr1, chr2, and so on:
library(gtools)
names(b) <- sub(":", "", names(b))
cbind(b[1], b[-1][mixedsort(names(b[-1]))])
# sample chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11
#1 sample-1 0 0 0 1 0 0 0 0 1 0 0
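If installing gtools is not an option, roughly the same reordering can be sketched in base R by sorting on the numeric part of each column name. This is an illustration rather than part of the original answer: the small table b below is a made-up stand-in, and the approach assumes every chromosome column is named "chr" followed by digits (labels such as chrX or chrY would need separate handling).

```r
# Hypothetical wide table whose chromosome columns are in text-sorted order
b <- data.frame(sample = "sample-1", chr1 = 0, chr10 = 0, chr11 = 0,
                chr2 = 0, chr3 = 0, chr4 = 1)

chr_cols <- names(b)[-1]                           # every column except 'sample'
chr_num  <- as.integer(sub("^chr", "", chr_cols))  # numeric part of each name

# Keep 'sample' first, then the chromosome columns in numeric order
b_sorted <- b[c("sample", chr_cols[order(chr_num)])]
names(b_sorted)
```

Unlike mixedsort, this only looks at the digits after "chr", so it is less general but has no package dependency.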
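Another option, which fixes the order before dcast, is to change the column to a factor, with the levels specified after removing the : at the end of the strings in 'Chr_No':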
library(data.table)
setDT(a)[, Chr_No := factor(sub(':$', '', Chr_No), levels = paste0("chr", 1:11))]
Then do the dcast:
dcast( a, sample ~ Chr_No, value.var = "frequency", fill = 0 )
# sample chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11
#1: sample-1 0 0 0 1 0 0 0 0 1 0 0
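The levels above are hard-coded as chr1 to chr11. When the set of chromosomes is not known in advance, the levels could instead be derived from the data, as in the sketch below. It is an assumption-laden illustration: the data frame is a shortened stand-in for a, all labels are assumed to be "chr" plus digits, and base R's xtabs is used in place of dcast only to show the resulting column order without extra packages.

```r
# Shortened stand-in for 'a' (same column names as in the Data section)
a <- data.frame(
  sample    = "sample-1",
  Chr_No    = c("chr1:", "chr2:", "chr9:", "chr10", "chr11"),
  frequency = c(0L, 0L, 1L, 0L, 0L)
)

chr <- sub(":$", "", a$Chr_No)            # drop the trailing colon
num <- as.integer(sub("^chr", "", chr))   # numeric part, e.g. 10 for "chr10"
lev <- unique(chr[order(num)])            # levels in numeric order
a$Chr_No <- factor(chr, levels = lev)

# Base R cross-tabulation; its columns follow the factor levels
wide <- xtabs(frequency ~ sample + Chr_No, data = a)
colnames(wide)
```

Once Chr_No is a factor with these levels, the dcast call from the answer should order its columns the same way.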
Comments
It doesn't work in my case.
Yes, I realized you need more than this. I added an answer; see if it works for your case.
@RochiSaurabh Updated the answer. It assumes that sample is in the first column.

Data
a <- structure(list(sample = c("sample-1", "sample-1", "sample-1",
"sample-1", "sample-1", "sample-1", "sample-1", "sample-1", "sample-1",
"sample-1", "sample-1"), Chr_No = c("chr1:", "chr2:", "chr3:",
"chr4:", "chr5:", "chr6:", "chr7:", "chr8:", "chr9:", "chr10",
"chr11"), frequency = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L,
0L)), class = "data.frame", row.names = c(NA, -11L))