Extracting data from XML into an R data frame
I am fairly new to the XML and xml2 packages in R, and I am struggling to extract data from XML into a data frame.
Sample data from the XML file:
<?xml version="1.0" encoding="utf-8"?>
<mod:ModificationSet xmlns:hci="http://riziv.fgov.be/szv/HealthCareInstitution" xmlns:per="http://riziv.fgov.be/szv/Person" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:pto="http://riziv.fgov.be/szv/PersonToOrganization" xmlns:org="http://riziv.fgov.be/szv/Organization" xmlns:hca="http://riziv.fgov.be/szv/HealthCareAppliance" xmlns:ati="http://riziv.fgov.be/szv/HcApplianceToHcInstitution" xmlns:p12="http://www.w3.org/2001/XMLSchema-instance" xmlns:szv="http://riziv.fgov.be/szv/BasicTypes" xmlns:hcw="http://riziv.fgov.be/szv/HealthCareWorker" xmlns:mod="http://riziv.fgov.be/szv/ModificationSet" xmlns:dev="http://riziv.fgov.be/szv/Device" xmlns:wti="http://riziv.fgov.be/szv/HcWorkerToHcInstitution">
<mod:Payload>
<mod:Modifications>
<mod:Modification>
<mod:Context>
<szv:Origin>63080900</szv:Origin>
<szv:CreationDate>2018-04-05</szv:CreationDate>
<szv:OperationType>01</szv:OperationType>
<szv:OperationDate>2018-04-05</szv:OperationDate>
</mod:Context>
<mod:HealthCareAppliance>
<hca:Identification>
<hca:RizivNumber>00000182</hca:RizivNumber>
</hca:Identification>
<hca:Device>
<dev:DeviceNumber>30016</dev:DeviceNumber>
<dev:DeviceType>PET-CT</dev:DeviceType>
<dev:Model>Philips-Gemini TF Big Bore PET/CT</dev:Model>
<dev:StartDateInvoicing>2016-06-01</dev:StartDateInvoicing>
<dev:EndDateInvoicing p12:nil="true" />
<dev:LocationIsAddress>false</dev:LocationIsAddress>
<dev:IsFixedDevice>true</dev:IsFixedDevice>
<dev:IsExtraMuros>false</dev:IsExtraMuros>
</hca:Device>
</mod:HealthCareAppliance>
</mod:Modification>
<mod:Modification>
<mod:Context>
<szv:Origin>63080900</szv:Origin>
<szv:CreationDate>2018-04-05</szv:CreationDate>
<szv:OperationType>01</szv:OperationType>
<szv:OperationDate>2010-07-13</szv:OperationDate>
</mod:Context>
<mod:HealthCareAppliance>
<hca:Identification>
<hca:RizivNumber>00000182</hca:RizivNumber>
</hca:Identification>
<hca:Status>
<hca:StatusCode>InUse</hca:StatusCode>
<hca:StatusStartDate>2010-07-13</hca:StatusStartDate>
</hca:Status>
</mod:HealthCareAppliance>
</mod:Modification>
<mod:Modification>
<mod:Context>
<szv:Origin>63080900</szv:Origin>
<szv:CreationDate>2018-04-05</szv:CreationDate>
<szv:OperationType>01</szv:OperationType>
<szv:OperationDate>2018-04-05</szv:OperationDate>
</mod:Context>
<mod:HcApplianceToHcInstitution>
<ati:HealthCareInstitution>
<ati:RizivNumber>71024388</ati:RizivNumber>
<ati:InstitutionCode>710</ati:InstitutionCode>
</ati:HealthCareInstitution>
<ati:HealthCareAppliance>
<ati:RizivNumber>00000182</ati:RizivNumber>
</ati:HealthCareAppliance>
<ati:Period>
<szv:StartDate>2016-08-19</szv:StartDate>
<szv:EndDate p12:nil="true" />
</ati:Period>
</mod:HcApplianceToHcInstitution>
</mod:Modification>
</mod:Modifications>
</mod:Payload>
</mod:ModificationSet>
Thanks in advance for your help. Note: apologies for having posted a duplicate question earlier.
Is this what you are trying to achieve?
library(xml2)
library(dplyr)
xmldoc <- read_xml("./Desktop/test.xml", encoding = "utf-8", as_html = FALSE)
RizivNumber <- xmldoc %>%
  xml_find_all(".//hca:RizivNumber") %>%
  xml_text()
#> RizivNumber
#[1] "00000182" "00000182"
DeviceNumber <- xmldoc %>%
  xml_find_all(".//dev:DeviceNumber") %>%
  xml_text()
#> DeviceNumber
#[1] "30016"
DeviceType <- xmldoc %>%
  xml_find_all(".//dev:DeviceType") %>%
  xml_text()
#> DeviceType
#[1] "PET-CT"
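...and so on.
Honestly, your code is quite messy. It would be great if you posted a reproducible example. Perhaps a regex applied directly to the XML string would solve your problem.
Thanks for your help! Could you extend your solution to cover values that are missing from the XML? Running the xml2 solution over the full set of XML files, I noticed that some nodes are missing. I would like to fill those missing values with NA or NULL. I tried an if (is.null(...)) approach, but it did not give the desired output. I guess it has to be handled in the xml_find_all call, but I could not find such an option.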
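One possible way to get NA for absent nodes, offered as a sketch rather than a definitive answer: select one node per record with xml_find_all(), then query each field with xml_find_first(). On a node set, xml_find_first() returns one result per input node and yields a missing node where nothing matches, which xml_text() turns into NA. The inline XML below is a simplified stand-in for the sample file (flattened, with only three namespaces), and get_field is a hypothetical helper name introduced for illustration.

```r
library(xml2)

# Simplified stand-in for the sample file (assumption: flattened structure,
# only the namespaces needed for this illustration).
xml_txt <- '<mod:ModificationSet
    xmlns:mod="http://riziv.fgov.be/szv/ModificationSet"
    xmlns:hca="http://riziv.fgov.be/szv/HealthCareAppliance"
    xmlns:dev="http://riziv.fgov.be/szv/Device">
  <mod:Modification>
    <hca:RizivNumber>00000182</hca:RizivNumber>
    <dev:DeviceNumber>30016</dev:DeviceNumber>
    <dev:DeviceType>PET-CT</dev:DeviceType>
  </mod:Modification>
  <mod:Modification>
    <hca:RizivNumber>00000182</hca:RizivNumber>
  </mod:Modification>
</mod:ModificationSet>'

xmldoc <- read_xml(xml_txt)
ns <- xml_ns(xmldoc)  # prefix -> URI map taken from the document

# One node per record, so every column ends up with the same length.
mods <- xml_find_all(xmldoc, ".//mod:Modification", ns = ns)

# Hypothetical helper: text of the first match inside each record,
# NA where the record has no such node.
get_field <- function(nodes, xpath) {
  xml_text(xml_find_first(nodes, xpath, ns = ns))
}

df <- data.frame(
  RizivNumber  = get_field(mods, ".//hca:RizivNumber"),
  DeviceNumber = get_field(mods, ".//dev:DeviceNumber"),
  DeviceType   = get_field(mods, ".//dev:DeviceType"),
  stringsAsFactors = FALSE
)
print(df)
```

Here the second record has no device information, so its DeviceNumber and DeviceType come out as NA instead of being silently dropped. With the real file, the XPath expressions would need to match the actual nesting (for example the ati: prefix used in the third Modification).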