R: Parsing XML in R
I have an XML file like the following:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<t:Forecast xmlns:t="http://example.com">
<Sender Abbreviation="abc" Name="xyz"/>
<Recipient Abbreviation="efg" Name="cba"/>
<createdUTC>2017-11-24T10:41:11Z</createdUTC>
<MessageID>bcjs</MessageID>
<SystemState>test</SystemState>
<ForecastData>
<DataHeader GroupKey="rkolo"/>
<Timeseries ID="abc123">
<TimeInt ISTUTC="2017-11-24T10:45:00Z" Out="858"/>
<TimeInt ISTUTC="2017-11-24T11:45:00Z" Out="868"/>
</Timeseries>
<Timeseries ID="xyz">
<TimeInt ISTUTC="2017-11-24T10:45:00Z" Out="870"/>
<TimeInt ISTUTC="2017-11-24T11:45:00Z" Out="890"/>
</Timeseries>
</ForecastData>
</t:Forecast>
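I would like to turn each Timeseries into its own data frame. For example, the data frame for the Timeseries with ID "xyz" should look like this: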
TimeInt out
2017-11-24T10:45:00Z 870
2017-11-24T11:45:00Z 890
So far, I have done the following:
require(XML)
temp = xmlParse("datafile.xml")
data = xmlToList(temp)
However, the resulting data consists of many nested lists. How can I get data frames instead?
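Edit 1: changed the Out values.
Consider the unexported xmlAttrsToDataFrame method, reached with the triple-colon operator (XML:::xmlAttrsToDataFrame), but loop over the node index of each Timeseries, and name each element of the result with the corresponding Timeseries ID.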
Thanks again.
Out would not pair up like that as a group indicator.
library(XML)
txt='<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<t:Forecast xmlns:t="http://example.com">
<Sender Abbreviation="abc" Name="xyz"/>
<Recipient Abbreviation="efg" Name="cba"/>
<createdUTC>2017-11-24T10:41:11Z</createdUTC>
<MessageID>bcjs</MessageID>
<SystemState>test</SystemState>
<ForecastData>
<DataHeader GroupKey="rkolo"/>
<Timeseries ID="abc123">
<TimeInt ISTUTC="2017-11-24T10:45:00Z" Out="858"/>
<TimeInt ISTUTC="2017-11-24T11:45:00Z" Out="858"/>
</Timeseries>
<Timeseries ID="xyz">
<TimeInt ISTUTC="2017-11-24T10:45:00Z" Out="870"/>
<TimeInt ISTUTC="2017-11-24T11:45:00Z" Out="870"/>
</Timeseries>
</ForecastData>
</t:Forecast>'
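doc <- xmlParse(txt)
# IDs of all Timeseries nodes, in document order
ids <- xpathSApply(doc, "//Timeseries", xmlGetAttr, "ID")
# One data frame per Timeseries, built from the attributes of its TimeInt children
dfList <- lapply(seq_along(ids), function(i)
  XML:::xmlAttrsToDataFrame(getNodeSet(doc, path = paste0("//Timeseries[", i, "]/TimeInt")))
)
# Name each data frame after its Timeseries ID
dfList <- setNames(dfList, ids)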
dfList
dfList$abc123
# ISTUTC Out
# 1 2017-11-24T10:45:00Z 858
# 2 2017-11-24T11:45:00Z 858
dfList$xyz
# ISTUTC Out
# 3 2017-11-24T10:45:00Z 870
# 4 2017-11-24T11:45:00Z 870
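Since XML:::xmlAttrsToDataFrame is not exported, it may change or disappear between package versions. The same result can probably be had with exported functions only. Below is a minimal sketch of that idea, not a definitive implementation. The helper name ts_to_df and the trimmed-down input are assumptions made for illustration. The input keeps only the Timeseries part of the file and uses the Out values from the question.

```r
library(XML)

# Hypothetical trimmed-down input with the same Timeseries/TimeInt layout
txt <- '<Forecast>
  <ForecastData>
    <Timeseries ID="abc123">
      <TimeInt ISTUTC="2017-11-24T10:45:00Z" Out="858"/>
      <TimeInt ISTUTC="2017-11-24T11:45:00Z" Out="868"/>
    </Timeseries>
    <Timeseries ID="xyz">
      <TimeInt ISTUTC="2017-11-24T10:45:00Z" Out="870"/>
      <TimeInt ISTUTC="2017-11-24T11:45:00Z" Out="890"/>
    </Timeseries>
  </ForecastData>
</Forecast>'
doc <- xmlParse(txt, asText = TRUE)

# All Timeseries nodes, in document order
series <- getNodeSet(doc, "//Timeseries")

# Hypothetical helper: build a data frame from the TimeInt children of one node.
# The XPath is relative ("./TimeInt"), so only this node's children are matched.
ts_to_df <- function(node) {
  data.frame(
    ISTUTC = xpathSApply(node, "./TimeInt", xmlGetAttr, "ISTUTC"),
    Out = as.numeric(xpathSApply(node, "./TimeInt", xmlGetAttr, "Out")),
    stringsAsFactors = FALSE
  )
}

# One data frame per Timeseries, named after its ID attribute
dfList <- lapply(series, ts_to_df)
names(dfList) <- sapply(series, xmlGetAttr, "ID")

dfList$xyz
```

Unlike the version above, this converts Out to numeric and leaves ISTUTC as a character column. Whether that conversion is wanted depends on what the data frames are used for.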