R: How to parse a subpath of an XML file using the xml2 package
Tags: r, xml, tidyverse
I have an XML file like the one loaded below. With this code, however, I cannot get the list of nodes under the subcellularLocation XPath:
library(xml2)
library(magrittr)  # provides the %>% pipe used below
xmlfile <- "https://www.uniprot.org/uniprot/P09429.xml"
doc <- xmlfile %>%
  xml2::read_xml()
xml_name(doc)
xml_children(doc)
x <- xml_find_all(doc, "//subcellularLocation")
xml_path(x)
# character(0)
If you don't mind, you can use the rvest package:
library(rvest)
a <- read_html(xmlfile) %>%
  html_nodes("subcellularlocation")
a %>% html_children() %>% html_text()
[1] "Nucleus" "Chromosome"
[3] "Cytoplasm" "Secreted"
[5] "Cell membrane" "Peripheral membrane protein"
[7] "Extracellular side" "Endosome"
[9] "Endoplasmic reticulum-Golgi intermediate compartment"
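Comments:
Thank you very much. How do I get the contents of the child nodes (i.e. the locations) under the subcellularlocation node? (For example Nucleus, Chromosome, Cytoplasm, etc.)
@scamander What is your desired output? Can you update your question with the expected result? A small example should make clear what you expect.
@Scamander, does this solution still not answer your question?
Not quite there (for example, Cell membrane etc. should be separated).
See my answer (using only xml2).
If an answer solved your problem, please click the "accept" mark so that others know the question has been resolved.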
# Namespaces ---------------------------------------------------------------
# If the document uses namespaces, you'll need to use xml_ns to form
# a unique mapping between full namespace url and a short prefix
x <- read_xml('
<root xmlns:f = "http://foo.com" xmlns:g = "http://bar.com">
<f:doc><g:baz /></f:doc>
<f:doc><g:baz /></f:doc>
</root>
')
xml_find_all(x, ".//f:doc")
xml_find_all(x, ".//f:doc", xml_ns(x))
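The example above uses explicit prefixes (f and g). When a document instead declares a default namespace (an xmlns attribute with no prefix), xml2 assigns it a generated prefix such as d1, and unprefixed XPath names no longer match. A minimal sketch of that behaviour, assuming a made-up inline document: the http://example.org/ns URL and the element contents are placeholders, not the UniProt file.

```r
library(xml2)

# Hypothetical stand-in document with a default (unprefixed) namespace.
demo <- read_xml('
<root xmlns="http://example.org/ns">
  <entry>
    <subcellularLocation><location>Nucleus</location></subcellularLocation>
  </entry>
</root>
')

# xml_ns() lists the prefix-to-URL mapping; the default namespace is named "d1".
ns <- xml_ns(demo)
print(ns)

# An unprefixed name only matches elements in no namespace, so nothing is found.
plain <- xml_find_all(demo, "//subcellularLocation")

# Using the d1 prefix makes the same query match.
prefixed <- xml_find_all(demo, "//d1:subcellularLocation", ns)

length(plain)
length(prefixed)
```

This is likely why the unprefixed query in the question returns character(0): the element exists, but it lives in the document's default namespace.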
xml_ns(doc)
## d1  <-> http://uniprot.org/uniprot
## xsi <-> http://www.w3.org/2001/XMLSchema-instance
xml_find_all(doc, "//d1:subcellularLocation") %>%
  xml_children() %>%
  xml_text()
## [1] "Nucleus"
## [2] "Chromosome"
## [3] "Cytoplasm"
## [4] "Secreted"
## [5] "Cell membrane"
## [6] "Peripheral membrane protein"
## [7] "Extracellular side"
## [8] "Endosome"
## [9] "Endoplasmic reticulum-Golgi intermediate compartment"
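Two possible variations, sketched against a small made-up document rather than the UniProt file (the namespace URL and the values are placeholders). The first keeps the child texts grouped by their parent subcellularLocation node, which is one way to keep an entry such as a membrane location and its topology apart from the other entries. The second uses xml_ns_strip() to drop the default namespace so that unprefixed XPath works.

```r
library(xml2)

# Hypothetical stand-in document; not the real UniProt entry.
demo <- read_xml('
<root xmlns="http://example.org/ns">
  <subcellularLocation>
    <location>Nucleus</location>
  </subcellularLocation>
  <subcellularLocation>
    <location>Cell membrane</location>
    <topology>Peripheral membrane protein</topology>
  </subcellularLocation>
</root>
')

nodes <- xml_find_all(demo, "//d1:subcellularLocation")

# One character vector per subcellularLocation node, instead of one flat vector.
grouped <- lapply(nodes, function(node) xml_text(xml_children(node)))
print(grouped)

# Alternative: strip the default namespace, then query without a prefix.
# Note that xml_ns_strip() modifies the document in place.
xml_ns_strip(demo)
stripped <- xml_find_all(demo, "//subcellularLocation")
length(stripped)
```

Grouping is worth considering when the relationship between a location and its topology matters; the flat xml_text() call above is simpler when only the set of labels is needed.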