R: How to search XML nodes


I have an XML file containing individuals' names and the companies that currently employ them, modeled as follows:

 <Indvl>
      <Info lastNm="Smith" firstNm="John" midNm="Patrick"/>
      <CrntEmps>
        <CrntEmp orgNm="ABC Incorporated" str1="1000 Main Street" city="Helena" state="MT" cntry="UNITED STATES" >
        </CrntEmp>
      </CrntEmps>
</Indvl>
 <Indvl>
      <Info lastNm="Wesley" firstNm="Jackie" midNm="Jonas"/>
      <CrntEmps>
        <CrntEmp orgNm="XYZ Incorporated" str1="1000 Main Street" city="Helena" state="MT" cntry="UNITED STATES" >
        </CrntEmp>
        <CrntEmp orgNm="Sub Contractor1" str1="1000 Some Street" city="Lincoln" state="NB" cntry="UNITED STATES" >
        </CrntEmp>
      </CrntEmps>
</Indvl>

I want to extract each individual's name information into a table, and I have been able to do that.

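I also want to extract the first employer listed for each person (the first CrntEmp tag). Most individuals have only one employer, but some have two, and I only want the first: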

John Patrick Smith: ABC Incorporated
Jackie Jonas Wesley: XYZ Incorporated


Do you have any ideas on how to do this easily in R? Any input is much appreciated, thanks.

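Since this is XML, using the XML library is the easiest route. The only problem with the sample data is that it has no root node, and a document needs a single root element to parse correctly. Here I'll add one named "root".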
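As a minimal sketch of that step, the fragments can be wrapped in a root element as a string and then parsed. The variable names `records` and `xml_text` are illustrative assumptions, and the sample records are abbreviated to a few attributes with the closing tags written as `</Indvl>`.

library(XML)

# Sample records from the question (attributes abbreviated for brevity)
records <- '
<Indvl>
  <Info lastNm="Smith" firstNm="John" midNm="Patrick"/>
  <CrntEmps>
    <CrntEmp orgNm="ABC Incorporated" city="Helena" state="MT"/>
  </CrntEmps>
</Indvl>
<Indvl>
  <Info lastNm="Wesley" firstNm="Jackie" midNm="Jonas"/>
  <CrntEmps>
    <CrntEmp orgNm="XYZ Incorporated" city="Helena" state="MT"/>
    <CrntEmp orgNm="Sub Contractor1" city="Lincoln" state="NB"/>
  </CrntEmps>
</Indvl>'

# Wrap the fragments in a single <root> element so the parser accepts them
xml_text <- paste0("<root>", records, "</root>")

# asText = TRUE tells xmlParse the argument is XML content, not a file name
doc <- xmlParse(xml_text, asText = TRUE)

# Count the <Indvl> records now reachable under the root
n_individuals <- length(getNodeSet(doc, "/root/Indvl"))

If the data lives in a file, reading it with `readLines` and collapsing the lines before wrapping them should work the same way, though that variant is not shown in the original answer.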

We use standard XPath expressions here to find the data we need. I return one list per individual, as follows:

[[1]]
[[1]]$indivInfo
   lastNm   firstNm     midNm 
  "Smith"    "John" "Patrick" 

[[1]]$empInfo
             orgNm               str1               city              state 
"ABC Incorporated" "1000 Main Street"           "Helena"               "MT" 
             cntry 
   "UNITED STATES" 


[[2]]
[[2]]$indivInfo
  lastNm  firstNm    midNm 
"Wesley" "Jackie"  "Jonas" 

[[2]]$empInfo
             orgNm               str1               city              state 
"XYZ Incorporated" "1000 Main Street"           "Helena"               "MT" 
             cntry 
   "UNITED STATES" 

So you can easily access the data as an R-friendly list rather than as XML.

You have probably been using the XML package already, so assuming doc is the parsed text:

xp <- c("//Info", "//CrntEmps/CrntEmp[1]")
L <- lapply(xp, xpathSApply, doc = doc, fun = xmlAttrs) # list
t(do.call(rbind, L)) # reform into a matrix

##     lastNm   firstNm  midNm     orgNm              str1               city     state cntry          
## [1,] "Smith"  "John"   "Patrick" "ABC Incorporated" "1000 Main Street" "Helena" "MT"  "UNITED STATES"
## [2,] "Wesley" "Jackie" "Jonas"   "XYZ Incorporated" "1000 Main Street" "Helena" "MT"  "UNITED STATES"
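The matrix approach above lines up names and employers by position, so it relies on every individual having at least one CrntEmp. A per-individual variant, sketched below under that caveat, produces the "name: first employer" table from the question and fills in NA when no employer is listed. The helper `first_employer` and the third record (Jane Doe, with an empty CrntEmps) are hypothetical additions for illustration, not part of the original data.

library(XML)

xml_text <- '<root>
  <Indvl>
    <Info lastNm="Smith" firstNm="John" midNm="Patrick"/>
    <CrntEmps><CrntEmp orgNm="ABC Incorporated"/></CrntEmps>
  </Indvl>
  <Indvl>
    <Info lastNm="Wesley" firstNm="Jackie" midNm="Jonas"/>
    <CrntEmps>
      <CrntEmp orgNm="XYZ Incorporated"/>
      <CrntEmp orgNm="Sub Contractor1"/>
    </CrntEmps>
  </Indvl>
  <Indvl>
    <Info lastNm="Doe" firstNm="Jane" midNm="Q"/>
    <CrntEmps/>
  </Indvl>
</root>'
doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: one data frame row per <Indvl> node
first_employer <- function(indvl) {
  info <- xmlAttrs(getNodeSet(indvl, "Info")[[1]])
  # Relative XPath: only the first <CrntEmp> under this individual
  emp <- getNodeSet(indvl, "CrntEmps/CrntEmp[1]")
  data.frame(
    name = paste(info[["firstNm"]], info[["midNm"]], info[["lastNm"]]),
    employer = if (length(emp) > 0) xmlGetAttr(emp[[1]], "orgNm") else NA_character_,
    stringsAsFactors = FALSE
  )
}

# Apply the helper to every individual and stack the rows
result <- do.call(rbind, lapply(getNodeSet(doc, "/root/Indvl"), first_employer))

Working node by node keeps each name paired with its own employer, at the cost of being somewhat slower than the two document-wide XPath queries on large files.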

Please make your data self-contained (it has no root node) and show the code you used, so that it can be checked against the sample data.
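# Code for the list-returning approach: one list per <Indvl>,
# assuming doc was parsed from the data wrapped in a <root> element
xpathApply(doc, "/root/Indvl", function(x) {
    list(
        indivInfo = xpathSApply(x, "Info/@*"),               # all attributes of <Info>
        empInfo = xpathSApply(x, "CrntEmps/CrntEmp[1]/@*")   # attributes of the first employer only
    )
})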