R: Unique coordinates using GenomicRanges

How can I find the unique (non-overlapping) genomic coordinates by comparing two datasets with GenomicRanges?

Dataset 1 =

chr start         end       CNA
1   170900001   171500001   loss
1   11840001    19420001    loss
1   60300001    62700001    gain
1   25520001    25820001    gain
Dataset 2 =

chr  start       end        CNA
1   170940001   171500001   gain
1   60300001    62700001    gain
1   25520001    25840001    gain
1   119860001   123040001   loss
1   171500001   171580001   gain
1   79240001    84420001    gain
Expected output

chr  start       end        CNA
1   170940001   171500001   gain
1   119860001   123040001   loss
1   171500001   171580001   gain
1   79240001    84420001    gain
Try this:

library(GenomicRanges)

#data
x1 <- read.table(text="chr start         end       CNA
1   170900001   171500001   loss
1   11840001    19420001    loss
1   60300001    62700001    gain
1   25520001    25820001    gain",header=TRUE)

x2 <- read.table(text="chr  start       end        CNA
1   170940001   171500001   gain
1   60300001    62700001    gain
1   25520001    25840001    gain
1   119860001   123040001   loss
1   171500001   171580001   gain
1   79240001    84420001    gain",header=TRUE)

g1 <-  GRanges(seqnames=paste0("chr",x1$chr),
               IRanges(start=x1$start,
                       end=x1$end)
               )

g2 <-  GRanges(seqnames=paste0("chr",x2$chr),
               IRanges(start=x2$start,
                       end=x2$end)
               )

#result
setdiff(g1,g2)

#output
# GRanges object with 2 ranges and 0 metadata columns:
#       seqnames                 ranges strand
#          <Rle>              <IRanges>  <Rle>
#   [1]     chr1 [ 11840001,  19420001]      *
#   [2]     chr1 [170900001, 170940000]      *
#   -------
#   seqinfo: 1 sequence from an unspecified genome; no seqlengths
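
Note that setdiff() is directional and works on the merged ranges, so the call above only reports the parts of dataset 1 that dataset 2 does not cover. A minimal sketch of both directions, assuming the coordinates from the question (only_g1 and only_g2 are names made up for this illustration):

```r
library(GenomicRanges)

# coordinates of dataset 1 and dataset 2 from the question, CNA ignored
g1 <- GRanges("chr1",
              IRanges(start = c(170900001, 11840001, 60300001, 25520001),
                      end   = c(171500001, 19420001, 62700001, 25820001)))
g2 <- GRanges("chr1",
              IRanges(start = c(170940001, 60300001, 25520001,
                                119860001, 171500001, 79240001),
                      end   = c(171500001, 62700001, 25840001,
                                123040001, 171580001, 84420001)))

only_g1 <- setdiff(g1, g2)  # stretches covered by dataset 1 but not dataset 2
only_g2 <- setdiff(g2, g1)  # stretches covered by dataset 2 but not dataset 1

# both directions together
c(only_g1, only_g2)
```

Partially overlapping ranges are trimmed rather than dropped, which is why the result contains coordinates that appear in neither input table.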

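@zx8754 Finding the unique overlaps does work in the previous case. But what if I also want to take the CNA into account when looking for unique coordinates? I have updated the example.
Yes. I want to compare both the coordinates (start and end) and the CNA to find the unique regions. In this example, the CNA of the first overlapping segment does not match.
@zx I got the output using only g1_loss and g1_gain. It worked.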
#result
# attach the CNA calls as a metadata column so the ranges can be split by type
mcols(g1)$CNA <- as.character(x1$CNA)
mcols(g2)$CNA <- as.character(x2$CNA)

# setdiff() drops metadata, so each result is re-labelled;
# rep(..., length()) also copes with an empty result
g1_loss <- setdiff(g1[mcols(g1)$CNA=="loss"],
                   g2[mcols(g2)$CNA=="loss"])
mcols(g1_loss)$CNA <- rep("loss", length(g1_loss))

g2_loss <- setdiff(g2[mcols(g2)$CNA=="loss"],
                   g1[mcols(g1)$CNA=="loss"])
mcols(g2_loss)$CNA <- rep("loss", length(g2_loss))

g1_gain <- setdiff(g1[mcols(g1)$CNA=="gain"],
                   g2[mcols(g2)$CNA=="gain"])
mcols(g1_gain)$CNA <- rep("gain", length(g1_gain))

g2_gain <- setdiff(g2[mcols(g2)$CNA=="gain"],
                   g1[mcols(g1)$CNA=="gain"])
mcols(g2_gain)$CNA <- rep("gain", length(g2_gain))

#merge
c(g1_gain,g2_gain,g1_loss,g2_loss)

#output (dataset 2 as given in the question, i.e. first row labelled gain)
# GRanges object with 6 ranges and 1 metadata column:
#     seqnames                 ranges strand |         CNA
#        <Rle>              <IRanges>  <Rle> | <character>
# [1]     chr1 [ 25820002,  25840001]      * |        gain
# [2]     chr1 [ 79240001,  84420001]      * |        gain
# [3]     chr1 [170940001, 171580001]      * |        gain
# [4]     chr1 [ 11840001,  19420001]      * |        loss
# [5]     chr1 [170900001, 171500001]      * |        loss
# [6]     chr1 [119860001, 123040001]      * |        loss
# -------
#   seqinfo: 1 sequence from an unspecified genome; no seqlengths
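
The expected output in the question lists whole rows of dataset 2 rather than trimmed ranges. If that is the goal, one possible reading is "rows of dataset 2 that do not overlap any row of dataset 1 with the same CNA". A dependency-free sketch in base R under that assumption (has_match, d1, d2 and unique_d2 are hypothetical names, not part of GenomicRanges):

```r
# dataset 1 and dataset 2 as given in the question
d1 <- data.frame(
  chr   = 1,
  start = c(170900001, 11840001, 60300001, 25520001),
  end   = c(171500001, 19420001, 62700001, 25820001),
  CNA   = c("loss", "loss", "gain", "gain"),
  stringsAsFactors = FALSE
)
d2 <- data.frame(
  chr   = 1,
  start = c(170940001, 60300001, 25520001, 119860001, 171500001, 79240001),
  end   = c(171500001, 62700001, 25840001, 123040001, 171580001, 84420001),
  CNA   = c("gain", "gain", "gain", "loss", "gain", "gain"),
  stringsAsFactors = FALSE
)

# hypothetical helper: TRUE when row i of `query` overlaps at least one row
# of `subject` on the same chromosome with the same CNA call
has_match <- function(i, query, subject) {
  any(subject$chr   == query$chr[i] &
      subject$CNA   == query$CNA[i] &
      subject$start <= query$end[i] &
      subject$end   >= query$start[i])
}

# keep the rows of dataset 2 without a same-CNA overlap in dataset 1
keep <- !vapply(seq_len(nrow(d2)), has_match, logical(1),
                query = d2, subject = d1)
unique_d2 <- d2[keep, ]
unique_d2
```

On the example data this keeps the four rows listed under the expected output. The pairwise comparison is fine for small tables; for genome-scale data an overlap index such as the one GenomicRanges provides would likely be the better choice.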