Parsing XML in R: "differing number of rows" error
I suspect this has been asked before, but after searching I couldn't find anything. I'm new to parsing XML documents. I'm trying to parse an XML page that looks like this:
library(XML)
schedule = xmlParse("MYXML.XML")
# here's what schedule looks like
<all-games>
  <game-schedule>
    <team name="Knicks"/>
    <outcome winner="OtherTeam"/>
  </game-schedule>
  <game-schedule>
    <team name="Lakers"/>
    <outcome winner="HomeTeam"/>
  </game-schedule>
  <game-schedule>
    <team name="Celtics"/>
  </game-schedule>
</all-games>
# here's my code to parse the XML
# this fails: the first query returns 3 team names, the second only 2 winners
my_df = data.frame(
  team = sapply(schedule["//game-schedule/team/@name"], as, "character"),
  winner = sapply(schedule["//game-schedule/outcome/@winner"], as, "character")
)
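The mismatch can be reproduced without a file. This is a minimal sketch that assumes the question's XML, with its tags closed, is inlined as a string. The names `txt`, `teams`, `winners` and `res` are illustrative. It uses `xpathSApply()` with `xmlGetAttr()` in place of `as(..., "character")`. Both approaches should select the same attribute values:

```r
library(XML)

# Illustrative inline copy of the question's XML (tags closed so it parses)
txt <- '<all-games>
  <game-schedule><team name="Knicks"/><outcome winner="OtherTeam"/></game-schedule>
  <game-schedule><team name="Lakers"/><outcome winner="HomeTeam"/></game-schedule>
  <game-schedule><team name="Celtics"/></game-schedule>
</all-games>'
schedule <- xmlParse(txt, asText = TRUE)

# Each query collects matches across the whole document, so the link
# between an attribute and its <game-schedule> is lost
teams   <- xpathSApply(schedule, "//game-schedule/team", xmlGetAttr, "name")
winners <- xpathSApply(schedule, "//game-schedule/outcome", xmlGetAttr, "winner")
length(teams)    # 3
length(winners)  # 2

# data.frame() cannot recycle 2 values over 3 rows, so it signals an error
res <- tryCatch(data.frame(team = teams, winner = winners),
                error = function(e) conditionMessage(e))
res
```

Because each vector is flattened before `data.frame()` receives it, there is no way to tell afterwards which game lacks an outcome. The lookup has to happen per game.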
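I'd like to parse this into a data frame in which missing children are simply filled in as NA. That is, I'm trying to get the following data frame: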
my_df
team winner
1 Knicks OtherTeam
2 Lakers HomeTeam
3 Celtics NA
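The NA reflects that, in the XML document, the game has not been played yet.

You need a wrapper around xpathSApply that returns NA when a tag is missing, something like xpath2 below. Then get the game-schedule nodes and apply xpath2 to each one, using ".//" to find the tag anywhere below the current node.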
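The leading dot in `.//` matters. When a node, not the document, is the context, an XPath that starts with `//` still searches from the document root. In that case the Celtics game would wrongly pick up the other games' outcomes. This is a minimal sketch that assumes the question's XML, tags closed, inlined as a string. The names `txt`, `doc` and `celtics` are illustrative:

```r
library(XML)

# Illustrative inline copy of the question's XML (tags closed so it parses)
txt <- '<all-games>
  <game-schedule><team name="Knicks"/><outcome winner="OtherTeam"/></game-schedule>
  <game-schedule><team name="Lakers"/><outcome winner="HomeTeam"/></game-schedule>
  <game-schedule><team name="Celtics"/></game-schedule>
</all-games>'
doc <- xmlParse(txt, asText = TRUE)

# The third <game-schedule> node (Celtics, no <outcome> child)
celtics <- getNodeSet(doc, "//game-schedule")[[3]]

# "//outcome" is absolute: it searches the whole document
length(getNodeSet(celtics, "//outcome"))   # 2
# ".//outcome" is relative: only descendants of the Celtics node
length(getNodeSet(celtics, ".//outcome"))  # 0
```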
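library(XML)

# Like xpathSApply, but returns NA when the XPath matches nothing;
# multiple matches are collapsed into one comma-separated string
xpath2 <- function(x, ...){
  y <- xpathSApply(x, ...)
  if (length(y) == 0) NA else paste(y, collapse = ", ")
}
# one node per game, including games that have no <outcome>
nd <- getNodeSet(schedule, "//game-schedule")
data.frame(
  team = sapply(nd, xpath2, ".//team", xmlGetAttr, "name"),
  winner = sapply(nd, xpath2, ".//outcome", xmlGetAttr, "winner")
)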
team winner
1 Knicks OtherTeam
2 Lakers HomeTeam
3 Celtics <NA>
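Here is a self-contained variant of the same per-node idea for comparison. It is a sketch that assumes the question's XML is inlined as a string, and `attr_or_na` is a hypothetical helper name. Unlike `xpath2`, it keeps only the first match. It also uses `vapply()` with `character(1)`, so each lookup is checked to return exactly one string:

```r
library(XML)

# Illustrative inline copy of the question's XML (tags closed so it parses)
txt <- '<all-games>
  <game-schedule><team name="Knicks"/><outcome winner="OtherTeam"/></game-schedule>
  <game-schedule><team name="Lakers"/><outcome winner="HomeTeam"/></game-schedule>
  <game-schedule><team name="Celtics"/></game-schedule>
</all-games>'
doc <- xmlParse(txt, asText = TRUE)

# Hypothetical helper: attribute `attr` of the first node matching `path`
# below `node`, or NA_character_ if there is no match or no such attribute
attr_or_na <- function(node, path, attr) {
  hits <- getNodeSet(node, path)
  if (length(hits) == 0) NA_character_
  else xmlGetAttr(hits[[1]], attr, default = NA_character_)
}

nd <- getNodeSet(doc, "//game-schedule")
my_df <- data.frame(
  team   = vapply(nd, attr_or_na, character(1), ".//team", "name"),
  winner = vapply(nd, attr_or_na, character(1), ".//outcome", "winner"),
  stringsAsFactors = FALSE
)
my_df
```

Returning `NA_character_` instead of a logical `NA` keeps the column type as character even if every game lacks an outcome. Keeping only the first match is a design choice. `xpath2` instead joins multiple matches with ", ".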