Extracting data from XML into an R data frame


I am fairly new to the XML and xml2 packages in R, and I am struggling to extract data from XML into a data frame.

Sample data from the XML file:

<?xml version="1.0" encoding="utf-8"?>
<mod:ModificationSet xmlns:hci="http://riziv.fgov.be/szv/HealthCareInstitution" xmlns:per="http://riziv.fgov.be/szv/Person" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:pto="http://riziv.fgov.be/szv/PersonToOrganization" xmlns:org="http://riziv.fgov.be/szv/Organization" xmlns:hca="http://riziv.fgov.be/szv/HealthCareAppliance" xmlns:ati="http://riziv.fgov.be/szv/HcApplianceToHcInstitution" xmlns:p12="http://www.w3.org/2001/XMLSchema-instance" xmlns:szv="http://riziv.fgov.be/szv/BasicTypes" xmlns:hcw="http://riziv.fgov.be/szv/HealthCareWorker" xmlns:mod="http://riziv.fgov.be/szv/ModificationSet" xmlns:dev="http://riziv.fgov.be/szv/Device" xmlns:wti="http://riziv.fgov.be/szv/HcWorkerToHcInstitution">
  <mod:Payload>
    <mod:Modifications>
      <mod:Modification>
        <mod:Context>
          <szv:Origin>63080900</szv:Origin>
          <szv:CreationDate>2018-04-05</szv:CreationDate>
          <szv:OperationType>01</szv:OperationType>
          <szv:OperationDate>2018-04-05</szv:OperationDate>
        </mod:Context>
        <mod:HealthCareAppliance>
          <hca:Identification>
            <hca:RizivNumber>00000182</hca:RizivNumber>
          </hca:Identification>
          <hca:Device>
            <dev:DeviceNumber>30016</dev:DeviceNumber>
            <dev:DeviceType>PET-CT</dev:DeviceType>
            <dev:Model>Philips-Gemini TF Big Bore PET/CT</dev:Model>
            <dev:StartDateInvoicing>2016-06-01</dev:StartDateInvoicing>
            <dev:EndDateInvoicing p12:nil="true" />
            <dev:LocationIsAddress>false</dev:LocationIsAddress>
            <dev:IsFixedDevice>true</dev:IsFixedDevice>
            <dev:IsExtraMuros>false</dev:IsExtraMuros>
          </hca:Device>
        </mod:HealthCareAppliance>
      </mod:Modification>
      <mod:Modification>
        <mod:Context>
          <szv:Origin>63080900</szv:Origin>
          <szv:CreationDate>2018-04-05</szv:CreationDate>
          <szv:OperationType>01</szv:OperationType>
          <szv:OperationDate>2010-07-13</szv:OperationDate>
        </mod:Context>
        <mod:HealthCareAppliance>
          <hca:Identification>
            <hca:RizivNumber>00000182</hca:RizivNumber>
          </hca:Identification>
          <hca:Status>
            <hca:StatusCode>InUse</hca:StatusCode>
            <hca:StatusStartDate>2010-07-13</hca:StatusStartDate>
          </hca:Status>
        </mod:HealthCareAppliance>
      </mod:Modification>
      <mod:Modification>
        <mod:Context>
          <szv:Origin>63080900</szv:Origin>
          <szv:CreationDate>2018-04-05</szv:CreationDate>
          <szv:OperationType>01</szv:OperationType>
          <szv:OperationDate>2018-04-05</szv:OperationDate>
        </mod:Context>
        <mod:HcApplianceToHcInstitution>
          <ati:HealthCareInstitution>
            <ati:RizivNumber>71024388</ati:RizivNumber>
            <ati:InstitutionCode>710</ati:InstitutionCode>
          </ati:HealthCareInstitution>
          <ati:HealthCareAppliance>
            <ati:RizivNumber>00000182</ati:RizivNumber>
          </ati:HealthCareAppliance>
          <ati:Period>
            <szv:StartDate>2016-08-19</szv:StartDate>
            <szv:EndDate p12:nil="true" />
          </ati:Period>
        </mod:HcApplianceToHcInstitution>
      </mod:Modification>
    </mod:Modifications>
  </mod:Payload>
</mod:ModificationSet>

Thanks in advance for your help. Note: my apologies for posting a duplicate question earlier.

Is this what you are trying to achieve?

library(xml2)
library(dplyr)

xmldoc <- read_xml("./Desktop/test.xml", encoding = "utf-8", as_html = FALSE)

RizivNumber <- xmldoc %>% 
               xml_find_all(".//hca:RizivNumber") %>% 
               xml_text()
#> RizivNumber
#[1] "00000182" "00000182"

DeviceNumber <- xmldoc %>% 
                xml_find_all(".//dev:DeviceNumber") %>% 
                xml_text()
#> DeviceNumber
#[1] "30016"

DeviceType <- xmldoc %>% 
              xml_find_all(".//dev:DeviceType") %>% 
              xml_text()
#> DeviceType
#[1] "PET-CT"

... and so on for the remaining fields.
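The "and so on" step can be written as a loop over a named vector of XPath expressions instead of one block per field. This is only a sketch: the inline XML is a trimmed-down copy of the sample (so the code runs without `test.xml`), and the names `xml_txt`, `xpaths` and `values` are made up for illustration.

```r
library(xml2)

# Trimmed-down copy of the sample data, inlined so the sketch is self-contained.
xml_txt <- '<mod:ModificationSet
    xmlns:mod="http://riziv.fgov.be/szv/ModificationSet"
    xmlns:hca="http://riziv.fgov.be/szv/HealthCareAppliance"
    xmlns:dev="http://riziv.fgov.be/szv/Device">
  <mod:Modification>
    <mod:HealthCareAppliance>
      <hca:Identification><hca:RizivNumber>00000182</hca:RizivNumber></hca:Identification>
      <hca:Device>
        <dev:DeviceNumber>30016</dev:DeviceNumber>
        <dev:DeviceType>PET-CT</dev:DeviceType>
      </hca:Device>
    </mod:HealthCareAppliance>
  </mod:Modification>
  <mod:Modification>
    <mod:HealthCareAppliance>
      <hca:Identification><hca:RizivNumber>00000182</hca:RizivNumber></hca:Identification>
    </mod:HealthCareAppliance>
  </mod:Modification>
</mod:ModificationSet>'

doc <- read_xml(xml_txt)

# One XPath per field; the names become the names of the result list.
xpaths <- c(
  RizivNumber  = ".//hca:RizivNumber",
  DeviceNumber = ".//dev:DeviceNumber",
  DeviceType   = ".//dev:DeviceType"
)

# Same call as above, applied to every field in turn.
values <- lapply(xpaths, function(xp) xml_text(xml_find_all(doc, xp)))

# Number of matches per field; these are not guaranteed to be equal.
lengths(values)
```

Note that `xml_find_all()` collects matches from the whole document, so the resulting vectors can have different lengths (here two Riziv numbers but only one device number) and cannot be bound into a data frame as they are.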

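Actually, your code is quite messy. It would be great if you posted a reproducible example. Perhaps some regex applied directly to the XML string could solve your problem.

Thank you for your help! Could you extend your solution to cover values that are missing from the XML? Running the xml2 solution on the complete set of XML files, I noticed that some nodes are missing. I would like to fill those missing values with NA or NULL. I tried an if (is.null(...)) approach, but it did not give the desired output. I guess it has to be handled in the xml_find_all call, but I could not find such an option.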
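One possible way to get NA for missing nodes, sketched under a few assumptions: each `mod:Modification` is treated as one row, and `xml_find_first()` is applied to that node set instead of calling `xml_find_all()` on the whole document. `xml_find_first()` returns exactly one result per input node (a missing node when nothing matches), and `xml_text()` turns a missing node into NA. The inline XML is a trimmed-down copy of the sample, and the helper `field()` is a made-up name, not part of xml2.

```r
library(xml2)

# Trimmed-down copy of the sample data, inlined so the sketch is self-contained.
# The second record has no hca:Device block at all.
xml_txt <- '<mod:ModificationSet
    xmlns:mod="http://riziv.fgov.be/szv/ModificationSet"
    xmlns:hca="http://riziv.fgov.be/szv/HealthCareAppliance"
    xmlns:dev="http://riziv.fgov.be/szv/Device"
    xmlns:p12="http://www.w3.org/2001/XMLSchema-instance">
  <mod:Modification>
    <mod:HealthCareAppliance>
      <hca:Identification><hca:RizivNumber>00000182</hca:RizivNumber></hca:Identification>
      <hca:Device>
        <dev:DeviceNumber>30016</dev:DeviceNumber>
        <dev:DeviceType>PET-CT</dev:DeviceType>
        <dev:EndDateInvoicing p12:nil="true" />
      </hca:Device>
    </mod:HealthCareAppliance>
  </mod:Modification>
  <mod:Modification>
    <mod:HealthCareAppliance>
      <hca:Identification><hca:RizivNumber>00000182</hca:RizivNumber></hca:Identification>
    </mod:HealthCareAppliance>
  </mod:Modification>
</mod:ModificationSet>'

doc <- read_xml(xml_txt)
ns  <- xml_ns(doc)  # prefix -> URI mapping used to resolve the XPath prefixes

# One node per record (one future data frame row).
mods <- xml_find_all(doc, ".//mod:Modification", ns)

# Hypothetical helper: first match inside each record, NA when there is none.
field <- function(nodes, xpath) {
  xml_text(xml_find_first(nodes, xpath, ns))
}

df <- data.frame(
  RizivNumber      = field(mods, ".//hca:RizivNumber"),
  DeviceNumber     = field(mods, ".//dev:DeviceNumber"),
  DeviceType       = field(mods, ".//dev:DeviceType"),
  EndDateInvoicing = field(mods, ".//dev:EndDateInvoicing"),
  stringsAsFactors = FALSE
)

# Elements that exist but are empty (nil="true") come back as "", not NA.
# Recode them if both cases should be treated as missing.
df$EndDateInvoicing[df$EndDateInvoicing %in% ""] <- NA

print(df)
```

Because every column is computed from the same `mods` node set, all columns have one entry per record and line up row by row. If a record can contain the same element more than once, only the first match is kept, so that case would need separate handling.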