R GenomicRanges: adding up coverages

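I am working with RNA-seq data and trying to plot average coverage curves by genotype, similar to what is done here:

RNA-seq coverage per genotype (source: Pickrell et al., Nature, 2010)

To draw this plot, I have bigwig files from 100 individuals containing coverage information from the RNA-seq data (in a specific region), which I read into R as GenomicRanges objects.

This gives me GRanges objects like the ones built in the following toy example: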

library(GenomicRanges)

gr1 <- GRanges(seqnames = "1", ranges = IRanges(start = c(1, 5, 10, 15, 30, 55), end = c(4, 9, 14, 29, 39, 60)))
gr1$cov <- c(3, 1, 8, 6, 2, 10)

gr2 <- GRanges(seqnames = "1", ranges = IRanges(start = c(3, 20, 24), end = c(7, 23, 26)))
gr2$cov <- c(3, 5, 3)

starts <- unique(sort(c(start(gr1), start(gr2))))

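> gr1

GRanges object with 6 ranges and 1 metadata column:
seqnames    ranges strand |       cov
   <Rle> <IRanges>  <Rle> | <numeric>
       1  [ 1,  4]      * |         3
       1  [ 5,  9]      * |         1
       1  [10, 14]      * |         8
       1  [15, 29]      * |         6
       1  [30, 39]      * |         2
       1  [55, 60]      * |        10
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths

> gr2

GRanges object with 3 ranges and 1 metadata column:
seqnames    ranges strand |       cov
   <Rle> <IRanges>  <Rle> | <numeric>
       1  [ 3,  7]      * |         3
       1  [20, 23]      * |         5
       1  [24, 26]      * |         3
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths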

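The problem is that I have one of these for each individual (gr1 and gr2 are two different individuals), and I want to combine them into a single GenomicRanges object that gives the total coverage of individuals 1 and 2 at every position. It would look like this: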

> gr3

GRanges object with 11 ranges and 1 metadata column:
seqnames    ranges strand |       cov
   <Rle> <IRanges>  <Rle> | <numeric>
       1  [ 1,  2]      * |         3
       1  [ 3,  4]      * |         6 (=3+3)
       1  [ 5,  7]      * |         4 (=1+3)
       1  [ 8,  9]      * |         1
       1  [10, 14]      * |         8
       1  [15, 19]      * |         6
       1  [20, 23]      * |        11 (=6+5)
       1  [24, 26]      * |         9 (=6+3)
       1  [27, 29]      * |         6
       1  [30, 39]      * |         2
       1  [55, 60]      * |        10

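Does anyone know a simple way to do this? Or am I doomed?

Thanks for your answers.

P.S.: My data are not stranded, but if your solution also handles stranded data, even better.

P.P.S.: Ideally, I would also like to be able to multiply the coverages, or apply any function of two arguments x and y, rather than simply adding them.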

It has been almost a year, but here is my answer for future reference.

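Whenever I can't find a function that does this kind of task directly, I simply expand the GRanges objects to single-bp resolution. This lets me perform any required operation on the metadata columns, treating them as plain data.frame columns, because the IRanges now match between the two GRanges objects.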
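To illustrate the idea independently of GenomicRanges, here is a minimal base-R sketch using the toy values from the question. The helper `expand_to_bp` is hypothetical (it is not part of any package), and uncovered positions are assumed to have coverage 0.

```r
# Hypothetical helper (not from any package): expand interval/value
# triples into one value per base over positions 1..len (0 where uncovered).
expand_to_bp <- function(start, end, value, len) {
  out <- numeric(len)
  for (i in seq_along(start)) {
    idx <- start[i]:end[i]
    out[idx] <- out[idx] + value[i]
  }
  out
}

len <- 60
x <- expand_to_bp(c(1, 5, 10, 15, 30, 55), c(4, 9, 14, 29, 39, 60),
                  c(3, 1, 8, 6, 2, 10), len)
y <- expand_to_bp(c(3, 20, 24), c(7, 23, 26), c(3, 5, 3), len)

# Both vectors now share the same coordinates, so any vectorised
# two-argument function can be applied; `+` reproduces the question.
combined <- x + y

# Compress runs of equal values back into variable-width intervals.
runs <- rle(combined)
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
result <- data.frame(start = starts, end = ends, cov = runs$values)

# Drop the uncovered stretch (positions 40-54 in this toy example).
result <- result[result$cov > 0, ]
```

Note that the run-length step merges adjacent intervals that end up with the same value, which is usually what you want for a coverage track.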

For the specific case in this question, the following works:

library(GenomicRanges)

### Sort seqlevels
# (not necessary here, but in real world examples,
# with multiple sequences, you will want to do this)
gr1 <- sort(GenomeInfoDb::sortSeqlevels(gr1))
gr2 <- sort(GenomeInfoDb::sortSeqlevels(gr2))

### Add seqlengths
# (this corresponds to the actual sequence lengths;
# here we use the highest position between the two objects: 60)
# Set it on both objects so their coverage vectors have the same length
seqlengths(gr1) <- 60
seqlengths(gr2) <- 60

### Make 1-bp tiles covering the genome
# (using either one of gr1 and gr2 as a reference)
bins <- GenomicRanges::tileGenome(GenomeInfoDb::seqlengths(gr1),
                                  tilewidth=1,
                                  cut.last.tile.in.chrom=TRUE)

### Get coverage signal as Rle object
gr1_cov <- coverage(gr1, weight="cov")
gr2_cov <- coverage(gr2, weight="cov")

### Get average coverage in each bin
# (since the bins are 1-bp wide, this just keeps the original coverage value)
gr1_bins <- GenomicRanges::binnedAverage(bins, gr1_cov, "binned_cov")
gr2_bins <- GenomicRanges::binnedAverage(bins, gr2_cov, "binned_cov")

### Make final object:
# We can now sum the values in the metadata columns
# Addressing the PPS, you could do any other operation or apply a function
gr3 <- gr1_bins
gr3$binned_cov <- gr1_bins$binned_cov + gr2_bins$binned_cov
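Regarding the P.P.S., the same 1-bp bins can be fed to any vectorised two-argument function. The sketch below rebuilds the toy data so it runs on its own; `per_base` and `f` are hypothetical names introduced for illustration, and multiplication is just one possible choice of function.

```r
library(GenomicRanges)

# Toy data from the question, with seqlengths set on both objects.
gr1 <- GRanges("1", IRanges(c(1, 5, 10, 15, 30, 55), c(4, 9, 14, 29, 39, 60)),
               cov = c(3, 1, 8, 6, 2, 10), seqlengths = c("1" = 60L))
gr2 <- GRanges("1", IRanges(c(3, 20, 24), c(7, 23, 26)),
               cov = c(3, 5, 3), seqlengths = c("1" = 60L))

# 1-bp tiles over the whole sequence.
bins <- tileGenome(seqlengths(gr1), tilewidth = 1,
                   cut.last.tile.in.chrom = TRUE)

# Hypothetical helper: per-base values of the "cov" column,
# with 0 at positions the object does not cover.
per_base <- function(gr) {
  binnedAverage(bins, coverage(gr, weight = "cov"), "v")$v
}

# Hypothetical two-argument function; swap in whatever f(x, y) you need.
f <- function(x, y) x * y

out <- bins
out$f_cov <- f(per_base(gr1), per_base(gr2))
```

Keep in mind that uncovered positions enter `f` as 0, so a product is 0 wherever either individual has no coverage. If that is not the behaviour you want, handle those positions explicitly inside `f`.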

To compress it and get the exact gr3 from the question, we can do the following:

### Compress back to variable-width IRanges (by cov)
gr3_Rle <- coverage(gr3, weight='binned_cov')
gr3 <- as(gr3_Rle, "GRanges")

### Drop 0-score rows
gr3 <- gr3[gr3$score > 0]

### Rename metadata column
names(mcols(gr3)) <- 'cov'

> gr3

GRanges object with 11 ranges and 1 metadata column:
       seqnames    ranges strand |       cov
          <Rle> <IRanges>  <Rle> | <numeric>
   [1]        1  [ 1,  2]      * |         3
   [2]        1  [ 3,  4]      * |         6
   [3]        1  [ 5,  7]      * |         4
   [4]        1  [ 8,  9]      * |         1
   [5]        1  [10, 14]      * |         8
   [6]        1  [15, 19]      * |         6
   [7]        1  [20, 23]      * |        11
   [8]        1  [24, 26]      * |         9
   [9]        1  [27, 29]      * |         6
  [10]        1  [30, 39]      * |         2
  [11]        1  [55, 60]      * |        10
  -------
  seqinfo: 1 sequence from an unspecified genome
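Since the question mentions 100 individuals, it may be convenient to skip the binning step and add the coverage Rle objects directly. The following is only a sketch under the assumption that every object carries the same seqlevels and seqlengths; the names `gr_list`, `cov_list` and `gr_total` are made up for this example.

```r
library(GenomicRanges)

# Toy data from the question; in practice this list would hold one
# GRanges per individual, all with identical seqlengths.
gr_list <- list(
  ind1 = GRanges("1",
                 IRanges(c(1, 5, 10, 15, 30, 55), c(4, 9, 14, 29, 39, 60)),
                 cov = c(3, 1, 8, 6, 2, 10), seqlengths = c("1" = 60L)),
  ind2 = GRanges("1", IRanges(c(3, 20, 24), c(7, 23, 26)),
                 cov = c(3, 5, 3), seqlengths = c("1" = 60L))
)

# One weighted coverage RleList per individual.
cov_list <- lapply(gr_list, coverage, weight = "cov")

# Position-wise sum across all individuals.
total <- Reduce(`+`, cov_list)

# Back to variable-width ranges; the values land in a "score" column.
gr_total <- as(total, "GRanges")
gr_total <- gr_total[gr_total$score > 0]
names(mcols(gr_total)) <- "cov"
```

Because `Reduce` folds the list pairwise, the same pattern extends to any number of individuals without changing the code.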

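For reference, this is gr3 at 1-bp resolution, before it is compressed back to variable-width ranges:

> gr3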

GRanges object with 60 ranges and 1 metadata column:
     seqnames    ranges strand | binned_cov
        <Rle> <IRanges>  <Rle> |  <numeric>
 [1]        1    [1, 1]      * |          3
 [2]        1    [2, 2]      * |          3
 [3]        1    [3, 3]      * |          6
 [4]        1    [4, 4]      * |          6
 [5]        1    [5, 5]      * |          4
 ...      ...       ...    ... .        ...
[56]        1  [56, 56]      * |         10
[57]        1  [57, 57]      * |         10
[58]        1  [58, 58]      * |         10
[59]        1  [59, 59]      * |         10
[60]        1  [60, 60]      * |         10
-------
seqinfo: 1 sequence from an unspecified genome
