Loading XML "rows" into an R data frame


I have some data that looks like this:

<people>
  <person first="Mary" last="Jane" sex="F" />
  <person first="Susan" last="Smith" sex="F" height="168" />
  <person last="Black" first="Joseph" sex="M" />
  <person first="Jessica" last="Jones" sex="F" />
</people>
I would like to load it into a data frame like this:

    first  last sex height
1    Mary  Jane   F     NA
2   Susan Smith   F    168
3  Joseph Black   M     NA
4 Jessica Jones   F     NA
This is as far as I have got:

library(XML)
xpeople <- xmlRoot(xmlParse(xml))
lst <- xmlApply(xpeople, xmlAttrs)
names(lst) <- 1:length(lst)
plyr::rbind.fill(lapply(xmlToList(txt), function(x) as.data.frame(t(x), stringsAsFactors = FALSE)))
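This might be slightly easier, though it is not an XML solution. You can then use as.numeric() to convert the height column to numeric; if the column is a factor, convert it with as.character() first so that the labels, not the integer codes, are converted.

There is also a function that reads the attributes directly into a data.frame: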
XML:::xmlAttrsToDataFrame(xml["//person"])
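lst <- xmlApply(xpeople, function(node) {
  attrs <- xmlAttrs(node)
  # Add the optional attribute as NA when a <person> lacks it
  if (!("height" %in% names(attrs))) {
    attrs[["height"]] <- NA
  }
  # Attribute order differs between nodes, so reorder before binding
  attrs[c("first", "last", "sex", "height")]
})
# One row per person; as.data.frame(lst) would give one column per person
df <- as.data.frame(do.call(rbind, unname(lst)), stringsAsFactors = FALSE)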
txt <- '<people>
          <person first="Mary" last="Jane" sex="F" />
          <person first="Susan" last="Smith" sex="F" height="168" />
          <person last="Black" first="Joseph" sex="M" />
          <person first="Jessica" last="Jones" sex="F" />
        </people>'
library(XML)         # for xmlTreeParse
library(data.table)  # for rbindlist(...)
xml <- xmlTreeParse(txt, asText=TRUE, useInternalNodes = TRUE)
rbindlist(lapply(xml["//person"],function(x)as.list(xmlAttrs(x))),fill=TRUE)
#      first  last sex height
# 1:    Mary  Jane   F     NA
# 2:   Susan Smith   F    168
# 3:  Joseph Black   M     NA
# 4: Jessica Jones   F     NA
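
All of the approaches above return the attribute values as strings, so height still has to be converted to a number. The following is a minimal base-R sketch of the same "fill missing attributes with NA" idea followed by that conversion. To keep it self-contained, the list `attrs` is a hand-written stand-in for what `lapply(xml["//person"], xmlAttrs)` would return, and the names `attrs`, `cols`, `rows` and `people` are illustrative only.

```r
# Stand-in for lapply(xml["//person"], xmlAttrs): one named character
# vector per <person>, with attributes in document order (assumed input).
attrs <- list(
  c(first = "Mary", last = "Jane", sex = "F"),
  c(first = "Susan", last = "Smith", sex = "F", height = "168"),
  c(last = "Black", first = "Joseph", sex = "M"),
  c(first = "Jessica", last = "Jones", sex = "F")
)

# Union of all attribute names, in order of first appearance.
cols <- unique(unlist(lapply(attrs, names)))

# Indexing by the full column set reorders each vector and yields NA
# for attributes that a given <person> does not have.
rows <- lapply(attrs, function(a) unname(a[cols]))

# Bind the rows into a character matrix, then into a data frame.
people <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(people) <- cols

# Attribute values are strings, so convert height explicitly.
people$height <- as.numeric(people$height)
```

Indexing by name rather than by position is what keeps Joseph's row correct even though his `last` attribute comes before `first` in the XML.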