Getting gene IDs for a list of gene names in R



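I have a huge list of gene names, and I want to map the corresponding gene ID to each name. I tried the R package org.Hs.eg.db, but it returns more IDs than there are names, which makes it hard to match the results back to the names, especially when the list is long.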

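Example input file (7 gene names):

RPS6KB2
PSME4
PDE4DIP
APMAP
TNRC18
PPP1R26
NAA20

The ideal output is 7 IDs, one per name, but the current output contains 8 IDs.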

Any suggestions on how to solve this? Or is there another simple tool that does the job (mapping gene names to gene IDs)?

This is the code I'm using:

library("org.Hs.eg.db")  # load the annotation package

input <- read.csv("myfile.csv", header = TRUE, sep = ",")  # read the input file
GeneCol <- as.character(input$Gene.name)  # the column that holds the gene names

# Look up the Entrez IDs for each alias. unlist() flattens the result,
# so a name that maps to several IDs contributes several entries.
output <- unlist(mget(x = GeneCol, envir = org.Hs.egALIAS2EG, ifnotfound = NA))

write.csv(output, file = "GeneIDs.csv")  # write the list of IDs to a CSV file
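To see why this kind of lookup can return more IDs than names, here is a minimal base-R sketch (not from the thread). The list `hits` is a hypothetical stand-in for what mget() returns: one element per input name, where one made-up name maps to two IDs and another is not found. All names and IDs in it are invented for illustration.

```r
# Hypothetical stand-in for the list returned by mget(): one element per
# input name; "GENE_B" maps to two IDs, "GENE_D" was not found (NA).
hits <- list(
  GENE_A = "1001",
  GENE_B = c("2001", "2002"),
  GENE_C = "3001",
  GENE_D = NA
)

# unlist() flattens every element, so 4 names turn into 5 values;
# the two IDs of GENE_B get the names "GENE_B1" and "GENE_B2"
flat <- unlist(hits)
print(length(flat))  # 5
print(names(flat))

# lengths() counts the IDs per name; anything above 1 is ambiguous
n_ids <- lengths(hits)
print(names(n_ids)[n_ids > 1])  # "GENE_B"

# One possible policy (an assumption, not the only valid one): keep only
# the first ID for each name, giving exactly one value per input name
first_id <- vapply(hits, function(ids) as.character(ids[1]), character(1))
print(first_id)
```

Checking `lengths()` before flattening shows which names are ambiguous, so you can choose a policy per gene instead of discovering the extra rows after the fact.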

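Ideal output (7 IDs):

6199
23198
9659
57136
84629
9858
51126

Current output (8 IDs):

6199
23198
9659
57136
27320   (undesired extra ID)
84629
9858
51126

Comment: What is your current code? Where are these IDs supposed to come from? Do you have some kind of lookup table?
Reply from the asker: I added the code to the question.

Answer:
Use mapIds() on the org.Hs.eg.db package. The reason you see 8 IDs is that the mapping between gene symbols and IDs is not 1:1, so you need to decide on a strategy for handling names that map to more than one ID. Also, questions about Bioconductor packages are best asked on the Bioconductor support site.

Here is a complete example (note that it does not need your file "myfile.csv" to run, so it is easy to reproduce):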

library(org.Hs.eg.db)
symbol <- c(
    "RPS6KB2", "PSME4", "PDE4DIP", "APMAP", "TNRC18",
    "PPP1R26", "NAA20"
)
mapIds(org.Hs.eg.db, symbol, "ENTREZID", "SYMBOL")

> mapIds(org.Hs.eg.db, symbol, "ENTREZID", "SYMBOL")
'select()' returned 1:1 mapping between keys and columns
RPS6KB2   PSME4 PDE4DIP   APMAP  TNRC18 PPP1R26   NAA20 
 "6199" "23198"  "9659" "57136" "84629"  "9858" "51126"
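The strategy for multiple mappings that the answer mentions can be chosen through the `multiVals` argument of `mapIds()`. The sketch below assumes org.Hs.eg.db is installed and uses `keytype = "ALIAS"` to mirror the `org.Hs.egALIAS2EG` lookup in the question (the answer's example used `"SYMBOL"`). Which IDs come back depends on the installed annotation version, so no output is shown here.

```r
library(org.Hs.eg.db)

# The seven names from the example input
genes <- c("RPS6KB2", "PSME4", "PDE4DIP", "APMAP", "TNRC18",
           "PPP1R26", "NAA20")

# Strategy 1: keep the first ID for each name (one value per name)
first_ids <- mapIds(org.Hs.eg.db, keys = genes, column = "ENTREZID",
                    keytype = "ALIAS", multiVals = "first")

# Strategy 2: keep every ID as a list so ambiguous names can be inspected
all_ids <- mapIds(org.Hs.eg.db, keys = genes, column = "ENTREZID",
                  keytype = "ALIAS", multiVals = "list")
ambiguous <- all_ids[lengths(all_ids) > 1]
print(ambiguous)

# Strategy 3: return NA for any name that maps to more than one ID
strict_ids <- mapIds(org.Hs.eg.db, keys = genes, column = "ENTREZID",
                     keytype = "ALIAS", multiVals = "asNA")

# One row per input name, so the result lines up with the input file
result <- data.frame(Gene.name = genes, ENTREZID = unname(first_ids))
write.csv(result, file = "GeneIDs.csv", row.names = FALSE)
```

`"first"` always yields exactly one entry per input name, which keeps the output aligned with the input; `"asNA"` is stricter and makes ambiguous names explicit instead of silently picking one ID.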