Using R to melt XML with repeated child nodes into a tidy data set

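I'm trying to build a data mashup in R from various security controls. I've had great success with devices that output CSV, JSON, and so on, but XML is really tripping me up. You'll quickly see that I'm not the boss R developer I'd like to be, but I would greatly appreciate any help that can be offered. Below is a simplified version of the XML I'm trying to parse: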

 <devices>
    <host id="169274" persistent_id="21741">
      <ip>some_IP_here</ip>
      <hostname>Some_DNS_name_here </hostname>
      <netbiosname>Some_NetBios_Name_here</netbiosname>
      <hscore>663</hscore>
      <howner>4</howner>
      <assetvalue>4</assetvalue>
      <os>Unix Variant</os>
      <nbtshares/>
      <fndvuln id="534" port="80" proto="tcp"/>
      <fndvuln id="1191" port="22" proto="tcp"/>
    </host>
    <host id="169275" persistent_id="21003">
      <ip>some_IP_here</ip>
      <hostname>Some_DNS_name_here </hostname>
      <netbiosname>Some_NetBios_Name_here</netbiosname>
      <hscore>0</hscore>
      <howner>4</howner>
      <assetvalue>4</assetvalue>
      <os>OS Undetermined</os>
      <nbtshares/>
      <fndvuln id="5452" port="ip" proto="ip"/>
      <fndvuln id="5092" port="123" proto="udp"/>
      <fndvuln id="16157" port="123" proto="udp"/>
    </host>
</devices>
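At the simplest level, I have no problem parsing the XML and extracting the data needed to build a basic data frame. However, I'm struggling to iterate through the parsed XML and create a separate row each time a fndvuln element appears in the parent host node.

So far, my guess is that it is best to load each element separately and bind them together at the end. I think this would let me use sapply to run through the various instances of fndvuln and create a separate entry for each. My basic structure so far is: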

library(XML)

setwd("My_file_location_here")

xmlfile <- "vuln.xml"
xmldoc <- xmlParse(xmlfile)
vuln <- getNodeSet(xmldoc, "//host")
x <- lapply(vuln, function(x)  data.frame(host = xpathSApply(x, "." , xmlGetAttr, "id"),
                                        ip = xpathSApply(x, ".//ip", xmlValue),
                                        hostname = xpathSApply(x, ".//hostname", xmlValue),
                                        netbiosname = xpathSApply(x, ".//netbiosname", xmlValue) ))

do.call("rbind", x)

I don't know how to do the rest. Also, because this device will produce a fairly large XML file, knowing how to do this efficiently is my ultimate goal.

When you add the fndvuln elements to the data.frame, the host, ip, hostname, and so on are repeated automatically to match (try data.frame("a", 1:3)). Your existing code returns one row per host:

do.call("rbind", x)
    host           ip            hostname            netbiosname
1 169274 some_IP_here Some_DNS_name_here  Some_NetBios_Name_here
2 169275 some_IP_here Some_DNS_name_here  Some_NetBios_Name_here

Adding the fndvuln attributes as extra columns gives one row per vulnerability:

# One data.frame per host: the single-valued host fields are recycled
# to match the number of fndvuln children.
x <- lapply(vuln, function(node) data.frame(
  host     = xpathSApply(node, ".", xmlGetAttr, "id"),
  ip       = xpathSApply(node, ".//ip", xmlValue),
  hostname = xpathSApply(node, ".//hostname", xmlValue),
  VulnID   = xpathSApply(node, ".//fndvuln", xmlGetAttr, "id"),
  port     = xpathSApply(node, ".//fndvuln", xmlGetAttr, "port")
))

do.call("rbind", x)
    host           ip            hostname VulnID port
1 169274 some_IP_here Some_DNS_name_here     534   80
2 169274 some_IP_here Some_DNS_name_here    1191   22
3 169275 some_IP_here Some_DNS_name_here    5452   ip
4 169275 some_IP_here Some_DNS_name_here    5092  123
5 169275 some_IP_here Some_DNS_name_here   16157  123

Would you mind mentioning the web link?

You need to create a second data.frame with the ip and port, and then merge it onto the data.frame you have already made.

Hi Chris, this all makes sense now. Thanks very much for your help.
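
One of the comments suggests building a second data.frame and merging it onto the first. The following is a minimal sketch of that idea, not the answerer's code. It assumes the XML package and uses an inline, trimmed copy of the sample XML. The third host with no fndvuln children, the proto column, and the names hosts, vulns, and tidy are hypothetical additions for illustration. A left join keeps hosts that have no findings, a case the one-step data.frame() version is not guaranteed to handle cleanly.

```r
library(XML)

# Trimmed copy of the sample XML from the question.
# The third host (no fndvuln children) is a hypothetical addition.
xml_text <- '<devices>
  <host id="169274" persistent_id="21741">
    <ip>some_IP_here</ip>
    <hostname>Some_DNS_name_here </hostname>
    <fndvuln id="534" port="80" proto="tcp"/>
    <fndvuln id="1191" port="22" proto="tcp"/>
  </host>
  <host id="169275" persistent_id="21003">
    <ip>some_IP_here</ip>
    <hostname>Some_DNS_name_here </hostname>
    <fndvuln id="5452" port="ip" proto="ip"/>
    <fndvuln id="5092" port="123" proto="udp"/>
    <fndvuln id="16157" port="123" proto="udp"/>
  </host>
  <host id="169276" persistent_id="21004">
    <ip>some_IP_here</ip>
    <hostname>Some_DNS_name_here </hostname>
  </host>
</devices>'

xmldoc <- xmlParse(xml_text, asText = TRUE)

# Table 1: one row per host.
host_nodes <- getNodeSet(xmldoc, "//host")
hosts <- data.frame(
  host     = sapply(host_nodes, xmlGetAttr, "id"),
  ip       = sapply(host_nodes, function(n) xpathSApply(n, "./ip", xmlValue)),
  hostname = sapply(host_nodes, function(n) xpathSApply(n, "./hostname", xmlValue)),
  stringsAsFactors = FALSE
)

# Table 2: one row per fndvuln, keyed by the id of its parent host.
vuln_nodes <- getNodeSet(xmldoc, "//host/fndvuln")
vulns <- data.frame(
  host   = sapply(vuln_nodes, function(n) xmlGetAttr(xmlParent(n), "id")),
  VulnID = sapply(vuln_nodes, xmlGetAttr, "id"),
  port   = sapply(vuln_nodes, xmlGetAttr, "port"),
  proto  = sapply(vuln_nodes, xmlGetAttr, "proto"),
  stringsAsFactors = FALSE
)

# Left join: all.x = TRUE keeps hosts without any fndvuln (filled with NA).
tidy <- merge(hosts, vulns, by = "host", all.x = TRUE)
tidy
```

merge() sorts the result by the key column, and hosts without findings get NA in the vulnerability columns. Because each node set is extracted once rather than building a data.frame per host, this may also scale better to a large file, although that has not been measured here.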