R 使用基因组范围的唯一坐标
如何通过使用基因组范围比较两个数据集来找到唯一的(非重叠的基因组坐标) 数据集1=R 使用基因组范围的唯一坐标,r,bioconductor,R,Bioconductor,如何通过使用基因组范围比较两个数据集来找到唯一的(非重叠的基因组坐标) 数据集1= chr start end CNA 1 170900001 171500001 loss 1 11840001 19420001 loss 1 60300001 62700001 gain 1 25520001 25820001 gain 数据集2= chr start end CNA 1 1
chr start end CNA
1 170900001 171500001 loss
1 11840001 19420001 loss
1 60300001 62700001 gain
1 25520001 25820001 gain
数据集2=
chr start end CNA
1 170940001 171500001 gain
1 60300001 62700001 gain
1 25520001 25840001 gain
1 119860001 123040001 loss
1 171500001 171580001 gain
1 79240001 84420001 gain
预期产量
chr start end CNA
1 170940001 171500001 gain
1 119860001 123040001 loss
1 171500001 171580001 gain
1 79240001 84420001 gain
试试这个:
require("GenomicRanges" )
#data
x1 <- read.table(text="chr start end CNA
1 170900001 171500001 loss
1 11840001 19420001 loss
1 60300001 62700001 gain
1 25520001 25820001 gain",header=TRUE)
x2 <- read.table(text="chr start end CNA
1 170940001 171500001 loss
1 60300001 62700001 gain
1 25520001 25840001 gain
1 119860001 123040001 loss
1 171500001 171580001 gain
1 79240001 84420001 gain",header=TRUE)
g1 <- GRanges(seqnames=paste0("chr",x1$chr),
IRanges(start=x1$start,
end=x1$end)
)
g2 <- GRanges(seqnames=paste0("chr",x2$chr),
IRanges(start=x2$start,
end=x2$end)
)
#result
setdiff(g1,g2)
#output
# GRanges object with 2 ranges and 0 metadata columns:
# seqnames ranges strand
# <Rle> <IRanges> <Rle>
# [1] chr1 [ 11840001, 19420001] *
# [2] chr1 [170900001, 170940000] *
# -------
# seqinfo: 1 sequence from an unspecified genome; no seqlengths
@zx8754..在前一种情况下,查找唯一重叠确实有效。但如果我还想考虑CNAS来寻找坐标的唯一性,那该怎么办呢?我已经更新了示例Yes。我想比较坐标(起点和终点)和CNA,找出唯一的区域。在本例中,第一个重叠段的CNA不匹配。@zx我只使用g1_损耗和g1_增益来获得输出。它起作用了,让我们来吧。
#result
g1_loss <- setdiff(g1[g1@elementMetadata$CNA=="loss"],
g2[g2@elementMetadata$CNA=="loss"])
g1_loss@elementMetadata$CNA <- "loss"
g2_loss <- setdiff(g2[g2@elementMetadata$CNA=="loss"],
g1[g1@elementMetadata$CNA=="loss"])
g2_loss@elementMetadata$CNA <- "loss"
g1_gain <- setdiff(g1[g1@elementMetadata$CNA=="gain"],
g2[g2@elementMetadata$CNA=="gain"])
g1_gain@elementMetadata$CNA <- "gain"
g2_gain <- setdiff(g2[g2@elementMetadata$CNA=="gain"],
g1[g1@elementMetadata$CNA=="gain"])
g2_gain@elementMetadata$CNA <- "gain"
#merge
c(g1_gain,g2_gain,g1_loss,g1_gain)
#output
# GRanges object with 5 ranges and 1 metadata column:
# seqnames ranges strand | CNA
# <Rle> <IRanges> <Rle> | <character>
# [1] chr1 [ 25820002, 25840001] * | gain
# [2] chr1 [ 79240001, 84420001] * | gain
# [3] chr1 [170940001, 171580001] * | gain
# [4] chr1 [ 11840001, 19420001] * | loss
# [5] chr1 [170900001, 171500001] * | loss
# -------
# seqinfo: 1 sequence from an unspecified genome; no seqlengths