
XPath and R: creating a keyed table


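I am new to R's XML package and to XPath. I am parsing a very large XML file. I wrote some code using loops, but the loops take a long time, so I am rewriting it with XPath to make it more efficient. The XML looks like this: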

...
<person personId="1">
  <personNames>
    <personName nameId="1000">
      <first>Joe</first>
      <last>Jones</last>
    </personName>
    <personName nameId="1001">
      <first>Joseph</first>
      <last>Jones</last>
    </personName>
    <personName nameId="1002">
      <first>The One and only Joe</first>
    </personName>
  </personNames>
</person>
...
I want the result to be 1 (the personId). What is the most efficient way to add a personId column to the data frame?

Given the example above, I would like a data frame that looks like this:

nameId  first                  last                  personId
1000    Joe                    Jones                 1
1001    Joseph                 Jones                 1
1002    The One and only Joe   <NA>                  1

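Because the first and last names are unbalanced (not every personName has both), you need to match them up more carefully than just extracting them all at once.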
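
To illustrate the alignment problem, here is a minimal sketch. The three-name document and the variable names (`doc`, `last_by_name`) are made up for illustration and are not part of the original thread; it only assumes the XML package is installed.

library(XML)

# Hypothetical miniature of the input: the third personName has no <last>.
doc <- xmlParse('<people><person personId="1"><personNames>
  <personName nameId="1000"><first>Joe</first><last>Jones</last></personName>
  <personName nameId="1001"><first>Joseph</first><last>Jones</last></personName>
  <personName nameId="1002"><first>The One and only Joe</first></personName>
</personNames></person></people>', asText = TRUE)

# Extracting each field across the whole document loses the row alignment:
# the two vectors have different lengths, so they cannot be combined column-wise.
first <- xpathSApply(doc, "//personName/first", xmlValue)
last  <- xpathSApply(doc, "//personName/last", xmlValue)
c(length(first), length(last))

# Visiting one personName at a time keeps a missing child as NA in its own row.
last_by_name <- xpathSApply(doc, "//personName", function(node) {
    v <- xpathSApply(node, "./last", xmlValue)
    if (length(v)) v[[1]] else NA_character_
})
last_by_name

Querying relative to each personName node costs one extra XPath call per name, but it guarantees that every name contributes exactly one value per column.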

Here is some valid test data

library(XML)
dd <- xmlInternalTreeParse('<people><person personId="1">
<personNames>
<personName nameId="1000"><first>Joe</first><last>Jones</last></personName>
<personName nameId="1001"><first>Joseph</first><last>Jones</last></personName>
<personName nameId="1002"><first>The One and only Joe</first></personName>
</personNames>
</person></people>')
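Then I can do

library(plyr)  # provides rbind.fill

# getXmlValue is not defined anywhere in this page; this is an assumed helper that
# returns the text of the first matching child element, or NA if there is none.
getXmlValue <- function(node, name) {
    v <- xpathSApply(node, paste0("./", name), xmlValue)
    if (length(v)) v[[1]] else NA_character_
}

rbind.fill(xpathApply(dd, "//person", function(x) {
    # one single-row data.frame per personName under this person
    pn <- xpathApply(x, "./personNames/personName", function(x) {
        data.frame(
            nameId=xmlGetAttr(x, "nameId"),
            first=getXmlValue(x, "first"),
            last=getXmlValue(x, "last"))
    })
    cbind(personID=xmlGetAttr(x, "personId"), rbind.fill(pn))
}))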
The rbind.fill call above returns

  personID nameId                first  last
1        1   1000                  Joe Jones
2        1   1001               Joseph Jones
3        1   1002 The One and only Joe  <NA>

The following is a bit more complicated; it was motivated by the cost of creating many single-row data.frames and then binding them together. I don't know whether it is actually more efficient (feedback would be interesting...)

In a first pass, I record the "geometry" of the events as they occur

## first pass: element names in document order
geom <- xpathSApply(dd, "//person|//personName|//first|//last", xmlName)
## hack: implement XMLAttributeValue method for xmlValue
xmlValue.XMLAttributeValue <- as.character
## second pass: attribute and leaf values; relies on both queries
## returning their nodes in the same document order
nms <- xpathSApply(dd, 
    "//person/@personId|//personName/@nameId|//first|//last", xmlValue)
cols <- c(nameId="personName", first="first", last="last")
pidx <- geom == "person"               # entries that start a new person
ridx <- cumsum(geom == "personName")   # output row of each entry
cidx <- match(geom, cols, 0)           # output column (0 for person)

## fill matrix with leaf nodes
m <- matrix(character(), max(ridx), max(cidx), 
            dimnames=list(NULL, names(cols)))
m[cbind(ridx, cidx)] <- nms[!pidx]

## 'expand' parent elements and bind to matrix
times <- diff(c(ridx[pidx], max(ridx)))
m <- cbind(personId=rep(nms[pidx], times), m)

> m
     personId nameId first                  last   
[1,] "1"      "1000" "Joe"                  "Jones"
[2,] "1"      "1001" "Joseph"               "Jones"
[3,] "1"      "1002" "The One and only Joe" NA

Can you provide the final result you want for the sample input data? I'm not sure about the exact form of your desired output.

Edit done. Thanks for your time.
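
The matrix-filling step in the second approach relies on indexing a matrix with a two-column (row, column) matrix. As a minimal base-R sketch of that step alone, with a made-up event stream (`geom` and `values` below are hypothetical stand-ins for the two XPath results, not output from the real file):

# Hypothetical events in document order for one person with two names;
# the second name has no <last>.
geom   <- c("person", "personName", "first", "last", "personName", "first")
values <- c("1", "1000", "Joe", "Jones", "1001", "Joseph")

cols <- c(nameId = "personName", first = "first", last = "last")
ridx <- cumsum(geom == "personName")     # row: which personName we are inside
cidx <- match(geom, cols, nomatch = 0)   # column: 0 for "person", which has no column

# Start from an all-NA matrix so that missing children stay NA.
m <- matrix(NA_character_, max(ridx), length(cols),
            dimnames = list(NULL, names(cols)))

# Keep only entries that map to a column, then assign by (row, column) pairs.
keep <- cidx > 0
m[cbind(ridx[keep], cidx[keep])] <- values[keep]
m

The answer's code does the same thing without the explicit `keep` filter, because rows of an index matrix that contain a zero are skipped; filtering explicitly just makes the pairing between positions and values easier to check.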