XPath in R: selecting values
I have an XML file that looks like this:
<?xml version="1.0"?>
<!DOCTYPE pathway SYSTEM "http://www.kegg.jp/kegg/xml/KGML_v0.7.1_.dtd">
<!-- Creation date: Sep 1, 2014 12:00:13 +0900 (GMT+09:00) -->
<pathway name="path:hsa04010" org="hsa" number="04010"
title="MAPK signaling pathway"
image="http://www.kegg.jp/kegg/pathway/hsa/hsa04010.png"
link="http://www.kegg.jp/kegg-bin/show_pathway?hsa04010">
<entry id="1" name="cpd:C00338" type="compound"
link="http://www.kegg.jp/dbget-bin/www_bget?C00338">
<graphics name="C00338" fgcolor="#000000" bgcolor="#FFFFFF"
type="circle" x="138" y="743" width="8" height="8"/>
</entry>
<entry id="2" name="hsa:5923 hsa:5924" type="gene"
link="http://www.kegg.jp/dbget-bin/www_bget?hsa:5923+hsa:5924">
<graphics name="RASGRF1, CDC25, CDC25L, GNRP, GRF1, GRF55, H-GRF55, PP13187, ras-GRF1..." fgcolor="#000000" bgcolor="#BFFFBF"
type="rectangle" x="392" y="236" width="46" height="17"/>
<relation entry1="47" entry2="40" type="PPrel">
<subtype name="activation" value="-->"/>
</relation>
<relation entry1="46" entry2="40" type="PPrel">
<subtype name="activation" value="-->"/>
</relation>
<relation entry1="45" entry2="40" type="PPrel">
<subtype name="activation" value="-->"/>
</relation>
... which is correct, but I don't know how to get the two separate values (in the second case, both of them) and store them somewhere. I tried
getNodeSet(root, '/pathway/entry[@type="gene"]/@id')
... but that only gives me an error:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘saveXML’ for signature ‘"XMLAttributeValue"’
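For reference: the error message indicates that R fails when it tries to run saveXML on the XMLAttributeValue objects returned for an @id step (presumably when the result is printed). A common workaround with the XML package is to select the entry element nodes instead and read each attribute from them with xmlGetAttr, which returns plain character vectors. A minimal sketch, assuming the XML package; the KGML string is a trimmed, hypothetical stand-in for the real file:

```r
library(XML)

# Hypothetical trimmed KGML snippet standing in for hsa04010.xml
kgml <- '<pathway name="path:hsa04010">
  <entry id="1" name="cpd:C00338" type="compound"/>
  <entry id="2" name="hsa:5923 hsa:5924" type="gene">
    <graphics name="RASGRF1, CDC25" type="rectangle"/>
  </entry>
</pathway>'

doc <- xmlParse(kgml, asText = TRUE)

# Select the <entry> elements (not their attribute nodes) and pull out one
# attribute per element with xmlGetAttr; xpathSApply simplifies to a vector
gene_ids   <- xpathSApply(doc, '/pathway/entry[@type="gene"]', xmlGetAttr, "id")
gene_names <- xpathSApply(doc, '/pathway/entry[@type="gene"]', xmlGetAttr, "name")

print(gene_ids)
print(gene_names)
```

Both calls use the same XPath, so the two vectors line up element by element and can be stored together, e.g. in data.frame(id = gene_ids, name = gene_names).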
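Even if it worked, I would only get the id attribute and not name, which is what I want. But given that I can't seem to get even a single attribute value, ...
You could try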
lapply(data['/pathway/entry[@type="gene"]/@id | /pathway/entry[@type="gene"]/*//@name'], as, "character")
# [[1]]
# [1] "2"
#
# [[2]]
# [1] "RASGRF1, CDC25, CDC25L, GNRP, GRF1, GRF55, H-GRF55, PP13187, ras-GRF1..."
#
# [[3]]
# [1] "activation"
#
# [[4]]
# [1] "activation"
#
# [[5]]
# [1] "activation"
and
xpathApply(data, '/pathway/entry[@type="gene"]//relation', xmlAttrs)
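The xmlAttrs approach returns one named character vector per relation element. To store those values, one option is to stack them into a data frame with base R. A minimal sketch, assuming the XML package and a trimmed, hypothetical snippet in which every relation carries the same attributes:

```r
library(XML)

# Hypothetical trimmed snippet with two relations under a gene entry
kgml <- '<pathway>
  <entry id="2" name="hsa:5923 hsa:5924" type="gene">
    <relation entry1="47" entry2="40" type="PPrel"/>
    <relation entry1="46" entry2="40" type="PPrel"/>
  </entry>
</pathway>'
doc <- xmlParse(kgml, asText = TRUE)

# One named character vector (entry1, entry2, type) per <relation>
rel_list <- xpathApply(doc, '/pathway/entry[@type="gene"]//relation', xmlAttrs)

# rbind stacks the vectors by position; this is only safe because every
# relation here has the same attributes in the same order
relations <- as.data.frame(do.call(rbind, rel_list), stringsAsFactors = FALSE)
print(relations)
```

If the attribute sets can differ between nodes, positional rbind will misalign columns, so the names should be matched explicitly instead.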
EDIT: data is
library(XML)
data <- xmlParse('<?xml version="1.0"?>
<!DOCTYPE pathway SYSTEM "http://www.kegg.jp/kegg/xml/KGML_v0.7.1_.dtd">
<!-- Creation date: Sep 1, 2014 12:00:13 +0900 (GMT+09:00) -->
<pathway name="path:hsa04010" org="hsa" number="04010"
title="MAPK signaling pathway"
image="http://www.kegg.jp/kegg/pathway/hsa/hsa04010.png"
link="http://www.kegg.jp/kegg-bin/show_pathway?hsa04010">
<entry id="1" name="cpd:C00338" type="compound"
link="http://www.kegg.jp/dbget-bin/www_bget?C00338">
<graphics name="C00338" fgcolor="#000000" bgcolor="#FFFFFF"
type="circle" x="138" y="743" width="8" height="8"/>
</entry>
<entry id="2" name="hsa:5923 hsa:5924" type="gene"
link="http://www.kegg.jp/dbget-bin/www_bget?hsa:5923+hsa:5924">
<graphics name="RASGRF1, CDC25, CDC25L, GNRP, GRF1, GRF55, H-GRF55, PP13187, ras-GRF1..." fgcolor="#000000" bgcolor="#BFFFBF"
type="rectangle" x="392" y="236" width="46" height="17"/>
<relation entry1="47" entry2="40" type="PPrel">
<subtype name="activation" value="-->"/>
</relation>
<relation entry1="46" entry2="40" type="PPrel">
<subtype name="activation" value="-->"/>
</relation>
<relation entry1="45" entry2="40" type="PPrel">
<subtype name="activation" value="-->"/>
</relation>
</entry>
</pathway>', asText = TRUE)
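In the lapply output above, items 3 to 5 are the name attributes of the nested subtype elements, because entry/*//@name matches name on any descendant of the gene entry, not only on graphics. If only the graphics label is wanted, a tighter path avoids this. A minimal sketch, assuming the XML package and a trimmed, hypothetical copy of the gene entry:

```r
library(XML)

# Hypothetical trimmed copy of the gene entry, including one nested relation
kgml <- '<pathway>
  <entry id="2" name="hsa:5923 hsa:5924" type="gene">
    <graphics name="RASGRF1, CDC25" type="rectangle"/>
    <relation entry1="47" entry2="40" type="PPrel">
      <subtype name="activation" value="-->"/>
    </relation>
  </entry>
</pathway>'
doc <- xmlParse(kgml, asText = TRUE)

# Every element below the gene entry that has a @name: <graphics> and <subtype>
all_names <- xpathSApply(doc, '/pathway/entry[@type="gene"]//*[@name]',
                         xmlGetAttr, "name")

# Only the direct <graphics> child, i.e. the gene symbol label
label <- xpathSApply(doc, '/pathway/entry[@type="gene"]/graphics',
                     xmlGetAttr, "name")

print(all_names)
print(label)
```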
The KEGGgraph package has a KGML parser that may help. See the vignette for details.
library(KEGGgraph)
url <- "http://rest.kegg.jp/get/hsa04010/kgml"
x <- parseKGML(url)
I don't seem to get the same result. > lapply(data['/pathway/entry[@type="gene"]/@id | /pathway/entry[@type="gene"]/*/@name'], as, …[truncated] $ character(0)
is what I get in the first case, and list() for xpathApply(root, '/pathway/entry[@type="gene"]//relations', xmlAttrs)
... am I doing something wrong? I just copy-pasted it from your post.
Hmm, but I wrote xpathApply(data, '/pathway/entry[@type="gene"]//relation', xmlAttrs)
instead of xpathApply(root, '/pathway/entry[@type="gene"]//relation', xmlAttrs)
:-) Anyway, I've added the data variable I used.
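Ah, OK, it has to do with how I read the file! I have data = xmlTreeParse("~/Downloads/hsa04010.xml"). How should I change that to work with your code? I assume I should use xmlParse with asText = TRUE instead of xmlTreeParse, but giving it my file path just gives me "doesn't seem to be XML".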
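xmlParse is like xmlTreeParse(..., useInternalNodes = TRUE). I added the missing closing tags to your example so that the XML could be read. Doesn't data = xmlParse('~/Downloads/hsa04010.xml', useInternalNodes = TRUE) work?
This is what I have now: data = xmlParse('~/Downloads/hsa04010.xml', useInternalNodes = TRUE); xpathApply(data, '/path/entry[@type="gene"]//relation', xmlAttrs), from which I get a result of sorts, list(), and I don't know why =/
Thanks a lot, your second approach with the XPath query worked! The KEGGgraph package looks nice, but I'd rather learn a general XML solution. xmlAttrsToDataFrame works great!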
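On reading the file: asText = TRUE tells xmlParse that its first argument is the XML text itself, so passing a file path together with asText = TRUE makes it try to parse the path string as XML, which fits the "doesn't seem to be XML" message. For a file, the path can be passed directly. A minimal sketch, assuming the XML package; the temporary file is a hypothetical stand-in for ~/Downloads/hsa04010.xml:

```r
library(XML)

# Hypothetical stand-in for "~/Downloads/hsa04010.xml"
path <- tempfile(fileext = ".xml")
writeLines('<pathway>
  <entry id="2" name="hsa:5923 hsa:5924" type="gene"/>
</pathway>', path)

doc  <- xmlParse(path)                               # a file path: no asText
same <- xmlTreeParse(path, useInternalNodes = TRUE)  # equivalent internal-node document

# XPath helpers such as xpathSApply work on either internal document
ids <- xpathSApply(doc, '//entry[@type="gene"]', xmlGetAttr, "id")
print(class(doc))
print(ids)
```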
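library(XML)
doc <- xmlParse(url)
# xmlAttrsToDataFrame is internal to XML (hence :::); one row per node, one column per attribute
genes <- XML:::xmlAttrsToDataFrame(doc["//entry[@type='gene']"])
relations <- XML:::xmlAttrsToDataFrame(doc["//relation"])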
relations
entry1 entry2 type
1 47 40 PPrel
2 46 40 PPrel
3 45 40 PPrel
4 44 39 PPrel
5 43 38 PPrel
...
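Because xmlAttrsToDataFrame is reached with :::, it is not part of the XML package's exported interface and could change without notice. A similar table can be built from exported functions only. The sketch below uses a hypothetical helper, attrs_to_df, that takes the union of attribute names and fills absent ones with NA; the note attribute in the snippet is made up purely to show the NA filling and is not part of KGML:

```r
library(XML)

# Hypothetical snippet; 'note' is an invented attribute to show NA filling
kgml <- '<pathway>
  <relation entry1="47" entry2="40" type="PPrel"/>
  <relation entry1="46" entry2="40" type="PPrel" note="extra"/>
</pathway>'
doc <- xmlParse(kgml, asText = TRUE)

# Hypothetical helper: one row per node, one column per attribute name seen
attrs_to_df <- function(nodes) {
  rows <- lapply(nodes, xmlAttrs)                 # named character vectors
  cols <- unique(unlist(lapply(rows, names)))     # union of attribute names
  # Index each vector by name; a missing attribute yields NA
  out <- lapply(cols, function(col)
    vapply(rows, function(r) unname(r[col]), character(1)))
  names(out) <- cols
  as.data.frame(out, stringsAsFactors = FALSE)
}

rel <- attrs_to_df(getNodeSet(doc, "//relation"))
print(rel)
```

Matching by name rather than by position keeps columns aligned even when nodes carry different attribute sets.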