Parsing XML from NCBI Entrez in R
I want to extract some information from the features section of an NCBI entry, and I am using this code to download the data:
fetch2 <- entrez_fetch(db = "nucleotide", id = 1028916732,
rettype = "gbc", retmode="xml", parsed = TRUE)
However, I don't understand how to use the following tags:
<INSDQualifier_name>
<INSDQualifier_value>
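I have seen some tutorials for PubMed that work well, but the output there has a different structure.
Finally, I would like to write a loop that extracts data from a list of IDs. Because not all entries have the same structure, I would like to use tags such as host or organism to retrieve that information.

Since the XML is fairly flat, consider the XML package's convenience handler, xmlToDataFrame: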
library(XML)
fetch2 <- ...
doc <- xmlParse(fetch2)
df <- xmlToDataFrame(doc, nodes=getNodeSet(doc, "//INSDQualifier"))
df
# INSDQualifier_name INSDQualifier_value
# 1 organism Alanphillipsia aloeigena
# 2 mol_type genomic DNA
# 3 strain CPC 21286
# 4 isolation_source leaves
# 5 host Aloe melanacantha
# 6 culture_collection CBS:136408
# 7 culture_collection CPC:21286
# 8 type_material culture from holotype of Alanphillipsia aloeigena
# 9 db_xref taxon:1414674
# 10 country South Africa: Namakwaland, Koegap Nature Reserve
# 11 collected_by M.J. Wingfield
# 12 note ex-holotype culture of Alanphillipsia aloeigena
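The layout asked for in the question is wide, with one column per qualifier:
Organism | culture_collection | host
Alanphillipsia aloeigena | CBS:136408 | Aloe melanacantha
To get there, transpose the data frame so that each qualifier name becomes a column: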
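# Transpose so the qualifier names run across the top (gives a 2-row frame)
final_df <- data.frame(t(df), stringsAsFactors = FALSE)
# Promote the first row (the qualifier names) to column names, then drop it
colnames(final_df) <- as.character(final_df[1,])
final_df <- final_df[-1,]
rownames(final_df) <- NULL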
final_df
# organism mol_type strain isolation_source host culture_collection culture_collection type_material
# 1 Alanphillipsia aloeigena genomic DNA CPC 21286 leaves Aloe melanacantha CBS:136408 CPC:21286 culture from holotype of Alanphillipsia aloeigena
# db_xref country collected_by note
# 1 taxon:1414674 South Africa: Namakwaland, Koegap Nature Reserve M.J. Wingfield ex-holotype culture of Alanphillipsia aloeigena
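
The question also asks for a loop over a list of IDs that picks out qualifiers by name. One possible sketch is below. It assumes the qualifiers can be selected with an XPath predicate on INSDQualifier_name. The helper get_qualifiers and the inline sample_xml are hypothetical: they are a hand-written stand-in for a real record so the example runs offline, not actual output from NCBI. Repeated qualifiers are collapsed with "; " and missing ones become NA, so records with different structures still give rows with the same columns.

```r
library(XML)

# Hypothetical helper (not part of XML or rentrez): look up qualifiers by name.
# Repeated qualifiers are pasted together; absent ones are returned as NA.
get_qualifiers <- function(doc, wanted = c("organism", "culture_collection", "host")) {
  vals <- lapply(wanted, function(nm) {
    xpath <- sprintf("//INSDQualifier[INSDQualifier_name='%s']/INSDQualifier_value", nm)
    v <- xpathSApply(doc, xpath, xmlValue)
    if (length(v) == 0) NA_character_ else paste(v, collapse = "; ")
  })
  names(vals) <- wanted
  as.data.frame(vals, stringsAsFactors = FALSE)
}

# Hand-written stand-in for a fetched record (assumption: simplified structure).
sample_xml <- '<INSDSet><INSDSeq><INSDSeq_feature-table><INSDFeature><INSDFeature_quals>
<INSDQualifier><INSDQualifier_name>organism</INSDQualifier_name>
  <INSDQualifier_value>Alanphillipsia aloeigena</INSDQualifier_value></INSDQualifier>
<INSDQualifier><INSDQualifier_name>host</INSDQualifier_name>
  <INSDQualifier_value>Aloe melanacantha</INSDQualifier_value></INSDQualifier>
<INSDQualifier><INSDQualifier_name>culture_collection</INSDQualifier_name>
  <INSDQualifier_value>CBS:136408</INSDQualifier_value></INSDQualifier>
<INSDQualifier><INSDQualifier_name>culture_collection</INSDQualifier_name>
  <INSDQualifier_value>CPC:21286</INSDQualifier_value></INSDQualifier>
</INSDFeature_quals></INSDFeature></INSDSeq_feature-table></INSDSeq></INSDSet>'

doc <- xmlParse(sample_xml, asText = TRUE)
res <- get_qualifiers(doc)
res

# With network access, the same helper could be applied per ID, for example:
# library(rentrez)
# ids <- c(1028916732)  # extend with your own IDs
# all_rows <- do.call(rbind, lapply(ids, function(id) {
#   rec <- entrez_fetch(db = "nucleotide", id = id,
#                       rettype = "gbc", retmode = "xml", parsed = TRUE)
#   cbind(id = id, get_qualifiers(rec))
# }))
```

Note that //INSDQualifier matches qualifiers from every feature in a record, so a record with several features may need a narrower XPath.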