Parsing XML from NCBI Entrez in R

I want to extract some information from the features section of an NCBI entry. I am using this code to download the data:

fetch2 <- entrez_fetch(db = "nucleotide", id = 1028916732, 
                       rettype = "gbc", retmode="xml", parsed = TRUE)
However, I don't understand how to work with the following elements:

<INSDQualifier_name>
<INSDQualifier_value>

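I have looked at some tutorials for PubMed, which work well, but the output there has a different structure.
Ultimately, I want to write a loop that extracts data for a list of IDs. Because not all entries have the same structure, I would like to use qualifier names such as host or organism to retrieve that information.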

Because the XML is fairly flat, consider the XML package's convenience handler, xmlToDataFrame:

library(XML)

fetch2 <- ...
doc <- xmlParse(fetch2)
df <- xmlToDataFrame(doc, nodes=getNodeSet(doc, "//INSDQualifier"))

df
#    INSDQualifier_name                               INSDQualifier_value
# 1            organism                          Alanphillipsia aloeigena
# 2            mol_type                                       genomic DNA
# 3              strain                                         CPC 21286
# 4    isolation_source                                            leaves
# 5                host                                 Aloe melanacantha
# 6  culture_collection                                        CBS:136408
# 7  culture_collection                                         CPC:21286
# 8       type_material culture from holotype of Alanphillipsia aloeigena
# 9             db_xref                                     taxon:1414674
# 10            country  South Africa: Namakwaland, Koegap Nature Reserve
# 11       collected_by                                    M.J. Wingfield
# 12               note   ex-holotype culture of Alanphillipsia aloeigena
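
If only a few named qualifiers are needed, an XPath predicate on INSDQualifier_name can pick out the matching INSDQualifier_value directly. The following is a minimal sketch, not part of the original answer: get_qualifier is a hypothetical helper, and the inline XML is a trimmed-down stand-in for a real record so that the example runs without a network call.

```r
library(XML)

# Trimmed-down stand-in for a real record (assumption: real records nest
# INSDQualifier elements the same way, just with more surrounding structure).
xml_text <- paste0(
  "<INSDSet><INSDSeq><INSDFeature><INSDFeature_quals>",
  "<INSDQualifier>",
  "<INSDQualifier_name>organism</INSDQualifier_name>",
  "<INSDQualifier_value>Alanphillipsia aloeigena</INSDQualifier_value>",
  "</INSDQualifier>",
  "<INSDQualifier>",
  "<INSDQualifier_name>host</INSDQualifier_name>",
  "<INSDQualifier_value>Aloe melanacantha</INSDQualifier_value>",
  "</INSDQualifier>",
  "</INSDFeature_quals></INSDFeature></INSDSeq></INSDSet>"
)

doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: values of every qualifier whose name equals `name`,
# or NA when the record has no such qualifier.
get_qualifier <- function(doc, name) {
  path <- sprintf(
    "//INSDQualifier[INSDQualifier_name='%s']/INSDQualifier_value", name
  )
  vals <- xpathSApply(doc, path, xmlValue)
  if (length(vals) == 0) NA_character_ else unlist(vals)
}

get_qualifier(doc, "host")    # present in this sample
get_qualifier(doc, "strain")  # absent in this sample, so NA
```

Returning NA for absent qualifiers keeps the helper usable on entries that do not all share the same structure.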

To get one row per record with a column for each qualifier, along the lines of:

Organism | culture_collection | host
Alanphillipsia aloeigena | CBS:136408 | Aloe melanacantha

transpose the data frame and promote the first row to column names:

final_df <- data.frame(t(df), stringsAsFactors = FALSE)

colnames(final_df) <- as.character(final_df[1,])
final_df <- final_df[-1,]
rownames(final_df) <- NULL

final_df 
#                   organism    mol_type    strain isolation_source              host culture_collection culture_collection                                     type_material
# 1 Alanphillipsia aloeigena genomic DNA CPC 21286           leaves Aloe melanacantha         CBS:136408          CPC:21286 culture from holotype of Alanphillipsia aloeigena
#          db_xref                                          country   collected_by                                            note
#  1 taxon:1414674 South Africa: Namakwaland, Koegap Nature Reserve M.J. Wingfield ex-holotype culture of Alanphillipsia aloeigena
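
Note that the transposed result has two columns named culture_collection, and that records lacking a qualifier would produce data frames with different columns. For the loop over a list of IDs described in the question, one possible approach is to ask for a fixed set of qualifier names per record and bind the rows. This is a sketch under assumptions: extract_qualifiers and qual are hypothetical helpers, the second record is made-up placeholder data, and the inline XML stands in for documents that would in practice come from entrez_fetch(..., parsed = TRUE) as in the question.

```r
library(XML)

# Hypothetical helper: one-row data frame with a column per wanted qualifier.
# Missing qualifiers become NA; repeated ones are collapsed with "; ".
extract_qualifiers <- function(doc, wanted) {
  vals <- vapply(wanted, function(nm) {
    path <- sprintf(
      "//INSDQualifier[INSDQualifier_name='%s']/INSDQualifier_value", nm
    )
    hit <- xpathSApply(doc, path, xmlValue)
    if (length(hit) == 0) NA_character_ else paste(unlist(hit), collapse = "; ")
  }, character(1))
  as.data.frame(as.list(vals), stringsAsFactors = FALSE)
}

# Hypothetical helper that builds one INSDQualifier element for the demo.
qual <- function(name, value) {
  paste0("<INSDQualifier><INSDQualifier_name>", name,
         "</INSDQualifier_name><INSDQualifier_value>", value,
         "</INSDQualifier_value></INSDQualifier>")
}

# Record 1 reuses values from the output above; record 2 is a made-up
# placeholder that lacks host and culture_collection.
rec1 <- paste0("<INSDSet>",
               qual("organism", "Alanphillipsia aloeigena"),
               qual("host", "Aloe melanacantha"),
               qual("culture_collection", "CBS:136408"),
               qual("culture_collection", "CPC:21286"),
               "</INSDSet>")
rec2 <- paste0("<INSDSet>", qual("organism", "Placeholder organism"), "</INSDSet>")

docs <- lapply(list(rec1, rec2), xmlParse, asText = TRUE)

wanted <- c("organism", "culture_collection", "host")
result <- do.call(rbind, lapply(docs, extract_qualifiers, wanted = wanted))
result
```

Because every record yields the same columns, rbind works even when an entry is missing some qualifiers.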