R: How to search XML nodes
I have an XML file that contains individuals' names and the companies that currently employ them, modeled as follows:
<Indvl>
<Info lastNm="Smith" firstNm="John" midNm="Patrick"/>
<CrntEmps>
<CrntEmp orgNm="ABC Incorporated" str1="1000 Main Street" city="Helena" state="MT" cntry="UNITED STATES" >
</CrntEmp>
</CrntEmps>
</Indvl>
<Indvl>
<Info lastNm="Wesley" firstNm="Jackie" midNm="Jonas"/>
<CrntEmps>
<CrntEmp orgNm="XYZ Incorporated" str1="1000 Main Street" city="Helena" state="MT" cntry="UNITED STATES" >
</CrntEmp>
<CrntEmp orgNm="Sub Contractor1" str1="1000 Some Street" city="Lincoln" state="NB" cntry="UNITED STATES" >
</CrntEmp>
</CrntEmps>
</Indvl>
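I want to extract each individual's name information into a table, and I have been able to do that.
I would also like to extract the first employer listed for each individual (the first CrntEmp tag). Most individuals have only one, but some have two, and I only want the first one:
John Patrick Smith: ABC Incorporated
Jackie Jonas Wesley: XYZ Incorporated
Do you have any ideas on how to do this easily in R? Your input is much appreciated, thank you.

If it is XML, then using the XML library will be the simplest approach. The only problem with the sample data is that it has no root node. There should be a single root for the data to parse correctly. Here I add one named "root".
We then use standard XPath expressions to find the required data. The xpathApply call reproduced further down returns one list per individual, as shown here: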
[[1]]
[[1]]$indivInfo
lastNm firstNm midNm
"Smith" "John" "Patrick"
[[1]]$empInfo
orgNm str1 city state
"ABC Incorporated" "1000 Main Street" "Helena" "MT"
cntry
"UNITED STATES"
[[2]]
[[2]]$indivInfo
lastNm firstNm midNm
"Wesley" "Jackie" "Jonas"
[[2]]$empInfo
orgNm str1 city state
"XYZ Incorporated" "1000 Main Street" "Helena" "MT"
cntry
"UNITED STATES"
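So you can easily access the data as an R-friendly list rather than as XML.

You have probably been using the XML package already, so assuming doc is the parsed text: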
xp <- c("//Info", "//CrntEmps/CrntEmp[1]")
L <- lapply(xp, xpathSApply, doc = doc, fun = xmlAttrs) # list
t(do.call(rbind, L)) # reform into a matrix
## lastNm firstNm midNm orgNm str1 city state cntry
## [1,] "Smith" "John" "Patrick" "ABC Incorporated" "1000 Main Street" "Helena" "MT" "UNITED STATES"
## [2,] "Wesley" "Jackie" "Jonas" "XYZ Incorporated" "1000 Main Street" "Helena" "MT" "UNITED STATES"
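To get from the attribute matrix above to the "name: employer" pairs the question asks for, one option is to loop over the Indvl nodes and build a small data frame. This is only a sketch: the sample text is abbreviated from the question, the `root` wrapper is the one suggested in the other answer, and the names `xml_txt`, `rows`, and `result` are made up for illustration. Working per individual also keeps names and employers aligned if some Indvl has no CrntEmp at all, which the two-XPath matrix approach does not guarantee.

```r
library(XML)

# Abbreviated sample data from the question, wrapped in an assumed <root> node
xml_txt <- '<root>
<Indvl>
  <Info lastNm="Smith" firstNm="John" midNm="Patrick"/>
  <CrntEmps>
    <CrntEmp orgNm="ABC Incorporated" city="Helena" state="MT"/>
  </CrntEmps>
</Indvl>
<Indvl>
  <Info lastNm="Wesley" firstNm="Jackie" midNm="Jonas"/>
  <CrntEmps>
    <CrntEmp orgNm="XYZ Incorporated" city="Helena" state="MT"/>
    <CrntEmp orgNm="Sub Contractor1" city="Lincoln" state="NB"/>
  </CrntEmps>
</Indvl>
</root>'

doc <- xmlParse(xml_txt, asText = TRUE)

# One row per individual: full name plus the first listed employer (NA if none)
rows <- lapply(getNodeSet(doc, "/root/Indvl"), function(node) {
  info <- xmlAttrs(getNodeSet(node, "Info")[[1]])
  emp  <- getNodeSet(node, "CrntEmps/CrntEmp[1]")
  data.frame(
    name     = paste(info[["firstNm"]], info[["midNm"]], info[["lastNm"]]),
    employer = if (length(emp) > 0) xmlGetAttr(emp[[1]], "orgNm") else NA_character_,
    stringsAsFactors = FALSE
  )
})
result <- do.call(rbind, rows)

# Print in the "name: employer" form used in the question
cat(paste0(result$name, ": ", result$employer), sep = "\n")
```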
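Please make your data self-contained (it has no root node) and show the code you used, to make sure it works on the sample data.

The code from the first answer, which produces the per-individual list shown above: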
xpathApply(doc, "/root/Indvl", function(x) {
  list(
    indivInfo = xpathSApply(x, "Info/@*"),
    empInfo = xpathSApply(x, "CrntEmps/CrntEmp[1]/@*")
  )
})
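The step of adding a root node is described but not shown, so here is a minimal sketch of how it might look. It assumes the rootless XML is available as a character string (the variable names `fragment`, `wrapped`, and `res` are hypothetical, and the sample is abbreviated from the question); the fragment is wrapped in a `<root>` element before parsing, and the xpathApply call is then run on the result.

```r
library(XML)

# Rootless fragment, abbreviated from the question's sample data
fragment <- '
<Indvl>
  <Info lastNm="Smith" firstNm="John" midNm="Patrick"/>
  <CrntEmps>
    <CrntEmp orgNm="ABC Incorporated" city="Helena" state="MT"/>
  </CrntEmps>
</Indvl>
<Indvl>
  <Info lastNm="Wesley" firstNm="Jackie" midNm="Jonas"/>
  <CrntEmps>
    <CrntEmp orgNm="XYZ Incorporated" city="Helena" state="MT"/>
    <CrntEmp orgNm="Sub Contractor1" city="Lincoln" state="NB"/>
  </CrntEmps>
</Indvl>'

# Wrap the fragment in a single root element so it parses as one document
wrapped <- paste0("<root>", fragment, "</root>")
doc <- xmlParse(wrapped, asText = TRUE)

# Per individual: all Info attributes and all attributes of the first CrntEmp
res <- xpathApply(doc, "/root/Indvl", function(x) {
  list(
    indivInfo = xpathSApply(x, "Info/@*"),
    empInfo   = xpathSApply(x, "CrntEmps/CrntEmp[1]/@*")
  )
})
```

Individual values can then be read by name, for example `res[[2]]$empInfo[["orgNm"]]` for the second person's first employer.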