Reshaping repeated measures in R from next-generation sequencing data


I have a dataset in R:

ddat <- data.frame(gene=rep(1:4,1), ID.pat=rep(c("0", "1"), each=10), allele.freq =runif(20,min=0,max=1), SNV=round(runif(20,min=0,max=4)))

ddat

     gene ID.pat allele.freq SNV
 1     1      0  0.96841970   1
 2     2      0  0.77859462   2
 3     3      0  0.38308071   0
 4     4      0  0.03842660   4
 5     1      0  0.11313244   1
 6     2      0  0.25727911   0
 7     3      0  0.73430856   1
 8     4      0  0.93272543   0
 9     1      0  0.48698303   3
 10    2      0  0.76762848   1
 11    3      1  0.86238286   1
 12    4      1  0.87513463   2
 13    1      1  0.78232771   2
 14    2      1  0.24493196   1
 15    3      1  0.41582649   0
 16    4      1  0.49521680   4
 17    1      1  0.17983000   2
 18    2      1  0.06170987   0
 19    3      1  0.23552103   1
 20    4      1  0.26549472   0

How can I modify the code to produce the desired output?

Load reshape2:

library(reshape2)
First, modify your SNV variable by adding a prefix (in your output data frame it is labeled with the prefix "SNP_", so I will use that).
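The code for this step does not appear above. A minimal sketch, assuming the prefix is added with base R's paste0 (the small ddat here is a hypothetical stand-in for the real data):

```r
# Hypothetical stand-in for ddat; only the SNV column matters for this step.
ddat <- data.frame(gene = 1:4,
                   ID.pat = c("0", "0", "1", "1"),
                   allele.freq = c(0.97, 0.78, 0.38, 0.04),
                   SNV = c(1, 2, 0, 4))

# Assumption: prefix each SNV value so the wide columns are named SNP_0 ... SNP_4.
ddat$SNV <- paste0("SNP_", ddat$SNV)
ddat$SNV
```

After this step SNV is a character column, so dcast uses its values directly as column names.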

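Use dcast from reshape2 to cast the data frame into wide format, aggregating with the custom fun.aggregate function defined further down: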

dcast(ddat,ID.pat+gene~SNV,fun.aggregate,value.var="allele.freq")
Your output will then look like this (the exact values depend on the random draw from runif):

 ID.pat gene SNP_0 SNP_1       SNP_2 SNP_3 SNP_4
1      0    1  <NA> 0.387       0.125 0.825  <NA>
2      0    2  <NA>  <NA> 0.296,0.775  <NA> 0.971
3      0    3  <NA> 0.172        <NA>  <NA> 0.873
4      0    4  0.87 0.337        <NA>  <NA>  <NA>
5      1    1  <NA>  0.49        <NA> 0.455  <NA>
6      1    2 0.169  <NA>       0.402  <NA>  <NA>
7      1    3  <NA>  <NA> 0.754,0.168 0.509  <NA>
8      1    4  <NA>  <NA>        0.86 0.737 0.625
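The complete code, with fun.aggregate defined before the dcast call:

# Return NA for an empty cell; otherwise join the rounded values with commas
fun.aggregate <- function(x) {
  if (length(x) == 0) NA_character_ else paste(round(x, 3), collapse = ",")
}
dcast(ddat, ID.pat + gene ~ SNV, fun.aggregate, value.var = "allele.freq")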
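To see why some cells hold two comma-separated values while empty cells show <NA>, the helper can be called directly. This is a small sketch on hypothetical inputs, not values taken from ddat:

```r
# Same helper as in the answer: NA for an empty group, otherwise the
# rounded values joined into one comma-separated string.
fun.aggregate <- function(x) {
  if (length(x) == 0) NA_character_ else paste(round(x, 3), collapse = ",")
}

empty  <- fun.aggregate(numeric(0))          # no rows for this ID.pat/gene/SNV cell
single <- fun.aggregate(0.38712)             # exactly one row
double <- fun.aggregate(c(0.2961, 0.7749))   # two rows share the same cell

print(list(empty = empty, single = single, double = double))
```

Because the function always returns a single string, every cell of the wide table is a character value, so the frequencies would need to be split and converted back to numeric before any further calculation.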