R: deconvolving complex intervals into location & position information


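This question is an extension of an earlier question.

Now fileA has an additional column that has to be taken into account when extracting position information from intervals. For example, in the sample below, positions 123-78000 at location X are labelled romeo, whereas the same positions 123-78000 at location Y are labelled mario: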

location  start     end      value    label
X         123       78000    0        romeo    #value 0 at positions X(123 to 77999 included).
X         78000     78004    56       romeo    
X         78004     78005    12       romeo    #value 12 at position X(78004).
X         78006     78008    21       juliet   
X         78008     78056    8        juliet  
Y         123       78000    1        mario    #value 1 at positions Y(123 to 77999 included).
Y         78000     78004    24       mario    
Y         78004     78005    4        mario    #value 4 at position Y(78004).
Y         78006     78008    12       luigi   
Y         78008     78056    14       luigi
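The comments in the table suggest that every interval is half-open: start is included and end is excluded. A minimal sketch of that convention follows; the helper name expand_interval is hypothetical and not part of the original question.

```r
# Hypothetical helper: expand one half-open interval [start, end)
# into the individual positions it covers.
expand_interval <- function(start, end) {
  # seq_len() returns an empty vector for empty intervals,
  # avoiding the seq(a, a - 1) pitfall of counting backwards
  start + seq_len(max(end - start, 0)) - 1
}

expand_interval(78000, 78004)  # 78000 78001 78002 78003
expand_interval(78004, 78005)  # 78004
```

Under this reading, the row "X 78004 78005 12" covers only position 78004, which matches the comment next to it.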

fileB, on the other hand, defines the intervals I am actually interested in:

location  start     end      label
X         77998     78005    romeo
X         78007     78012    juliet
Y         77998     78005    mario
Y         78007     78012    luigi

The labels in fileA were originally taken from fileB, so it is safe to assume that overlapping intervals always carry the same label.

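I am trying to extract the information for every individual position in fileA that corresponds to the intervals in fileB; for lack of a better word, I call this process deconvolution. This time I want to do it while taking the location into account: extracting positions without their location is dangerous, because the same position number can occur at several locations. The output fileC should look like this: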

location  position  value   label
X         77998     0       romeo
X         77999     0       romeo
X         78000     56      romeo
X         78001     56      romeo
X         78002     56      romeo
X         78003     56      romeo
X         78004     12      romeo   
X         78007     21      juliet
X         78008     8       juliet
X         78009     8       juliet
X         78010     8       juliet
X         78011     8       juliet
Y         77998     1       mario
Y         77999     1       mario
Y         78000     24      mario
Y         78001     24      mario
Y         78002     24      mario
Y         78003     24      mario
Y         78004     4       mario   
Y         78007     12      luigi
Y         78008     14      luigi
Y         78009     14      luigi
Y         78010     14      luigi
Y         78011     14      luigi

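I thought I could work this out myself from the solution to the earlier question, but I got stuck, particularly at this step, where I don't know how to bring the location into the position sequence: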

# create sequence of positions
s <- unlist(apply(B, MARGIN=1, FUN=function(x) seq(x[2], as.numeric(x[3])-1)))
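One possible way to keep the location attached to each position is to expand row indices rather than bare positions. The sketch below assumes fileB has the four columns shown above and that end is always greater than start; the names n, idx and positions are illustrative only.

```r
# fileB inlined so the sketch is self-contained
B <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
location start end label
X 77998 78005 romeo
X 78007 78012 juliet
Y 77998 78005 mario
Y 78007 78012 luigi
")

# Number of positions covered by each half-open interval [start, end)
n <- B$end - B$start

# Repeat each row index once per covered position
idx <- rep(seq_len(nrow(B)), times = n)

# sequence(n) counts 1..n[i] within each interval, so adding it to the
# repeated start values yields the positions, with location and label alongside
positions <- data.frame(
  location = B$location[idx],
  position = B$start[idx] + sequence(n) - 1L,
  label    = B$label[idx],
  stringsAsFactors = FALSE
)
head(positions, 3)
```

Because every position row carries its own location and label, a later join against fileA can match on those columns instead of on the position number alone.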

This seems to produce your example output:
# It is essential that there be NO FACTORS
A <- read.table("fileA.txt", header=TRUE, stringsAsFactors=FALSE)
B <- read.table("fileB.txt", header=TRUE, stringsAsFactors=FALSE)

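# build template with one row per position in the appropriate ranges;
# data.frame() (rather than cbind()) keeps position numeric, so the
# comparison against start/end below is numeric and not string-based
template <- do.call(rbind, lapply(seq_len(nrow(B)),
                    function(i) data.frame(location=B$location[i],
                                           position=seq(B$start[i], B$end[i]-1),
                                           label=B$label[i],
                                           stringsAsFactors=FALSE)
))
# add position column to A, return as C
# (inner join: unmatched rows would only contribute NA positions)
C <- merge(A, template, by=c("location","label"))

# keep only rows whose position falls inside the half-open interval [start, end)
is.between <- function(x, low, hi) x>=low & x<=hi
C <- C[is.between(C$position, C$start, C$end-1),]
C <- C[,c("location","position","value","label")]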
C
#    location position value  label
# 1         X    78007    21 juliet
# 7         X    78008     8 juliet
# 8         X    78009     8 juliet
# 9         X    78010     8 juliet
# 10        X    78011     8 juliet
# 11        X    77998     0  romeo
# 12        X    77999     0  romeo
# 20        X    78000    56  romeo
# 21        X    78001    56  romeo
# 22        X    78002    56  romeo
# 23        X    78003    56  romeo
# 31        X    78004    12  romeo
# ...
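Note that merge() sorts its result on the key columns, so the rows come back grouped by location and label (juliet before romeo) rather than in the position order of fileC. If that order matters, something along these lines should restore it. The toy data frame stands in for the first rows of C shown above, and as.numeric() is only a precaution in case position is stored as text.

```r
# Toy stand-in for a few rows of the merged result
C <- data.frame(
  location = c("X", "X", "X"),
  position = c("78007", "78008", "77998"),
  value    = c(21, 8, 0),
  label    = c("juliet", "juliet", "romeo"),
  stringsAsFactors = FALSE
)

# Make sure positions sort numerically, not alphabetically
C$position <- as.numeric(C$position)

# Order by location, then position, and renumber the rows
C <- C[order(C$location, C$position), ]
rownames(C) <- NULL
C
```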
