How to retrieve UCSC RefSeq genes in R Bioconductor
I am analyzing some ChIP-seq data, and I was able to use the genome browser to retrieve the sequence elements associated with each ChIP chromosome region. After parsing the sequences and searching for a specific motif, I get the following output:
> head(chr.reg)
     [,1]
[1,] "chr1:181030981-181032670"
[2,] "chr3:55709147-55709901"
[3,] "chr3:119813410-119814934"
[4,] "chr4:185201060-185205420"
[5,] "chr4:39610956-39611545"
[6,] "chr6:126253238-126253636"
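Each of these chromosome regions contains the transcription factor motif I am interested in.
My question is:
Is there a way to retrieve the RefSeq gene names associated with these regions? I looked through the Bioconductor packages but could not find one that does this, or maybe I just overlooked it! Does anyone know of a specific package that could help me with this?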
Thanks in advance :)

I believe the answer lies in the ChIPpeakAnno package.
Here is some example code:
require(ChIPpeakAnno)
# One peak given as chromosome, start, end
peak <- RangedData(space = "chr4", IRanges(39610956, 39611545))
# TSS annotation for human (GRCh37) shipped with ChIPpeakAnno
data(TSS.human.GRCh37)
ap <- annotatePeakInBatch(peak, AnnotationData = TSS.human.GRCh37, PeakLocForDistance = "end")
> ap
RangedData with 1 row and 9 value columns across 1 space
                     space               ranges |        peak      strand
                  <factor>            <IRanges> | <character> <character>
1 ENSG00000163683        4 [39610956, 39611545] |           1           -
                          feature start_position end_position insideFeature
                      <character>      <numeric>    <numeric>   <character>
1 ENSG00000163683 ENSG00000163683       39552535     39640513        inside
                  distancetoFeature shortestDistance fromOverlappingOrNearest
                          <numeric>        <numeric>              <character>
1 ENSG00000163683             28968            28968             NearestStart
require(org.Hs.eg.db)
# Map the Ensembl gene ID of the annotated feature to a gene symbol
gene.anno <- select(org.Hs.eg.db, keys = ap$feature, keytype = "ENSEMBL",
                    columns = c("ENSEMBL", "SYMBOL"))
> gene.anno
          ENSEMBL ENTREZID SYMBOL
1 ENSG00000163683   201895 SMIM14
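
The example above annotates a single hard-coded peak, whereas the question starts from a matrix of "chr:start-end" strings. A minimal base-R sketch of how those strings might be split into chromosome, start and end columns is shown below. The helper name `parse_regions` is hypothetical (it is not part of ChIPpeakAnno or any other package), and the three regions are copied from the output in the question.

```r
# Region strings in the "chr:start-end" form shown in the question
chr.reg <- matrix(c("chr1:181030981-181032670",
                    "chr3:55709147-55709901",
                    "chr4:39610956-39611545"), ncol = 1)

# Hypothetical helper (not from any package): split each string into
# chromosome, start and end with a single regular expression
parse_regions <- function(x) {
  x <- as.character(x)
  pattern <- "^(chr[^:]+):([0-9]+)-([0-9]+)$"
  ok <- grepl(pattern, x)
  if (!all(ok)) {
    stop("unparseable region: ", paste(x[!ok], collapse = ", "))
  }
  data.frame(
    chrom = sub(pattern, "\\1", x),
    start = as.integer(sub(pattern, "\\2", x)),
    end   = as.integer(sub(pattern, "\\3", x)),
    stringsAsFactors = FALSE
  )
}

regions <- parse_regions(chr.reg)
print(regions)
```

The resulting `chrom`, `start` and `end` columns could then presumably be passed in place of the single hard-coded peak, so that all regions are annotated in one call. Depending on the installed Bioconductor release, a `GRanges` object may be needed instead of `RangedData`, so it is worth checking the ChIPpeakAnno documentation for the version in use.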