In R, how do I pair XML node values with their common parent node?

I have the following sample XML:

<body>
  <div class="row">
    <div class="column">
      <span class="title">Color</span>
    </div>
    <div class="column property">Blue</div>
  </div> 
  <div class="row">
    <div class="column">
      <span class="title">Shape</span>
    </div>
    <div class="column property">Square</div>
  </div> 
</body>
I tried the following script, but the titles come back wrapped in XML tags and the properties are missing:

library(XML)

getDetails <- function(id) {
  html <- htmlTreeParse( "exampleXML.html" ,useInternal = TRUE)
  xpathSApply( html , "//div[@class='row']" , function(row) { 
    print( xmlElementsByTagName(row, "span", recursive = TRUE) )
  })
}

getDetails()
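The tags most likely appear because xmlElementsByTagName() returns node objects, and printing a node shows its markup; the properties are missing because only span elements are requested, while the property values sit in div elements. A minimal sketch, assuming the XML package and a one-row copy of the sample inlined as a string (sample_html is a hypothetical name), that extracts text with xmlValue() and queries the property div explicitly:

```r
library(XML)

# Hypothetical one-row copy of the question's sample, inlined instead of read from exampleXML.html
sample_html <- paste0(
  '<body><div class="row">',
  '<div class="column"><span class="title">Color</span></div>',
  '<div class="column property">Blue</div>',
  '</div></body>'
)
doc <- htmlParse(sample_html, asText = TRUE)

# Take the first (and only) row node
row <- getNodeSet(doc, "//div[@class='row']")[[1]]

# xmlElementsByTagName() returns nodes; print(spans) would show the surrounding tags
spans <- xmlElementsByTagName(row, "span", recursive = TRUE)

# xmlValue() turns each node into its text content
title <- unname(vapply(spans, xmlValue, character(1)))

# The property lives in a div, not a span, so it needs its own row-relative query
property <- xpathSApply(row, ".//div[@class='column property']", xmlValue)

print(setNames(property, title))
```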
Also no luck:

library(XML)
getDetails … %>% extract_info
Error in UseMethod("xml_find_all") : no applicable method for 'xml_find_all' applied to an object of class "c('HTMLInternalDocument', 'HTMLInternalDocument', 'XMLInternalDocument', 'XMLAbstractDocument')"
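The error message lists the classes of the object that was passed in: htmlTreeParse(useInternal = TRUE) builds a document from the XML package, whereas xml_find_all() comes from xml2 and dispatches only on xml2's own classes, so objects from the two packages cannot be mixed. A minimal sketch, assuming both packages are installed and using a hypothetical inline snippet (sample_html), that compares the two document types:

```r
library(XML)
library(xml2)

# Hypothetical minimal snippet standing in for exampleXML.html
sample_html <- '<body><div class="row"><span class="title">Color</span></div></body>'

# XML package: an object of the "XMLInternalDocument" family (the classes named in the error)
xml_pkg_doc <- htmlTreeParse(sample_html, asText = TRUE, useInternalNodes = TRUE)

# xml2: an "xml_document", which is what xml_find_all() expects
xml2_doc <- read_xml(sample_html)

print(class(xml_pkg_doc))
print(class(xml2_doc))

# Works: an xml2 function applied to an xml2 document
titles <- xml_text(xml_find_all(xml2_doc, "//span[@class='title']"))
print(titles)

# Expected to fail: an xml2 function applied to an XML-package document.
# The error is captured as a message instead of stopping the script.
msg <- tryCatch(xml_find_all(xml_pkg_doc, "//span"),
                error = function(e) conditionMessage(e))
print(msg)
```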


Using xml2, you can do the following:

library(xml2)     #to install use: install.packages("xml2")
library(magrittr) #to install use: install.packages("magrittr")

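# For each row node, take the first title span and the first property div
# beneath it, and return the properties named by their titles
extract_info <- function(x){
  title <- x %>% xml_find_first(".//span[@class='title']") %>% xml_text
  property <- x %>% xml_find_first(".//div[@class='column property']") %>% xml_text
  setNames(property, title)
}

# read_xml() (not XML::htmlTreeParse()) returns an object that xml2 functions accept
html <- read_xml( "exampleXML.html" )
html %>% xml_find_all("//div[@class='row']") %>% extract_info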
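Because xml2 does not attach %>% on its own, a pipe-free variant of the answer above avoids the magrittr dependency entirely. A minimal sketch, assuming the question's sample is inlined as a string (sample_xml is a hypothetical name) rather than read from exampleXML.html; read_xml() accepts literal XML as well as a path:

```r
library(xml2)

# Hypothetical inline copy of the question's sample
sample_xml <- paste0(
  '<body>',
  '<div class="row"><div class="column"><span class="title">Color</span></div>',
  '<div class="column property">Blue</div></div>',
  '<div class="row"><div class="column"><span class="title">Shape</span></div>',
  '<div class="column property">Square</div></div>',
  '</body>'
)

doc <- read_xml(sample_xml)
rows <- xml_find_all(doc, "//div[@class='row']")

# One lookup per row, so titles[i] and properties[i] come from the same parent row
titles <- xml_text(xml_find_first(rows, ".//span[@class='title']"))
properties <- xml_text(xml_find_first(rows, ".//div[@class='column property']"))

result <- setNames(properties, titles)
print(result)
```

Applied to a nodeset, xml_find_first() returns a result of the same length as its input (a missing node where nothing matches), which is what keeps each title aligned with the property from its own row.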

If the XML is regular (i.e., the element order never changes), you can do the following:

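library(xml2)
library(purrr)
library(magrittr)  # provides %>%

# txt holds the sample XML from the question (a literal string or a file path)
doc <- read_xml(txt)

vals <- xml_text(xml_find_all(doc, ".//*[@class='title' or @class='column property']"))
# Walk the values two at a time: vals[i] is a title, vals[i + 1] its property
map_chr(seq(1, length(vals), by=2), ~sprintf("%s = %s", vals[.], vals[.+1])) %>% 
  cat(sep="\n")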
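The answer above relies on the matched nodes alternating title, property, title, property in document order. The same assumption can be expressed without purrr by folding the flat vector into a two-column matrix, one row per pair. A minimal sketch, assuming the question's sample is inlined as a string (sample_xml is a hypothetical name):

```r
library(xml2)

# Hypothetical inline copy of the question's sample
sample_xml <- paste0(
  '<body>',
  '<div class="row"><div class="column"><span class="title">Color</span></div>',
  '<div class="column property">Blue</div></div>',
  '<div class="row"><div class="column"><span class="title">Shape</span></div>',
  '<div class="column property">Square</div></div>',
  '</body>'
)
doc <- read_xml(sample_xml)

# Nodes come back in document order: title, property, title, property, ...
vals <- xml_text(xml_find_all(doc, ".//*[@class='title' or @class='column property']"))

# The pairing only makes sense if the values come in complete pairs
stopifnot(length(vals) %% 2 == 0)

# Fill row by row: column 1 holds the title, column 2 the property
pairs <- matrix(vals, ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("title", "property")))

cat(sprintf("%s = %s", pairs[, "title"], pairs[, "property"]), sep = "\n")
```

Like the purrr version, this silently mispairs values if a row lacks a title or a property; the even-length check only catches some of those cases.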
I get Error: could not find function "%>%". I tried installing it via RStudio > Tools > Install Packages… > xml2, and also added library(xml2) to the beginning of the script. No luck. I added an "Also no luck:" section to my question. Could you take a look and update your answer?

You figured it out yourself. Sorry, my answer was only half complete. The trick is to read the file with read_xml, and (unlike rvest, the more convenient wrapper around xml2) xml2 does not import magrittr::`%>%` by default.

Consider using a nested xpathSApply(), where the outer loop iterates over the rows and parses the corresponding title and property values of each row:

library(XML)

example_html <- paste0('<body>',
                   '  <div class="row">',
                   '    <div class="column">',
                   '       <span class="title">Color</span>',
                   '    </div>',
                   '    <div class="column property">Blue</div>',
                   '  </div>',
                   '  <div class="row">',
                   '    <div class="column">',
                   '       <span class="title">Shape</span>',
                   '    </div>',
                   '    <div class="column property">Square</div>',
                   '  </div>', 
                   '</body>')

doc <- htmlTreeParse(example_html, asText = TRUE, useInternalNodes = TRUE)

# Outer loop: one call per row; the inner queries are relative to that row
columns <- xpathSApply(doc, "//div[@class='row']", function(row){
   title <- xpathSApply(row, "div[@class='column']/span", xmlValue)
   property <- xpathSApply(row, "div[@class='column property']", xmlValue)
   setNames(gsub(" ", "", property), gsub(" ", "", title))    # gsub() strips whitespace
})

columns
#  Color    Shape 
#  "Blue" "Square" 

# Alternatively, without nesting: query titles and properties across the whole
# document and pair them by position
title <- xpathSApply(doc, "//div[@class='column']/span", xmlValue)
property <- xpathSApply(doc, "//div[@class='column property']", xmlValue)

columns <- setNames(property, title)
columns
#   Color    Shape 
#  "Blue" "Square"
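The non-nested version pairs two independent, document-wide lists by position, so it assumes every row has exactly one title and one property; the nested, per-row version does not. A minimal sketch of the difference, written with xml2 instead of XML (a swap made here because xml2's xml_find_first() yields NA for a missing match) and using a hypothetical variant of the sample in which one row has no property cell:

```r
library(xml2)

# Hypothetical variant of the sample: the "Shape" row has no property cell
sample_xml <- paste0(
  '<body>',
  '<div class="row"><div class="column"><span class="title">Color</span></div>',
  '<div class="column property">Blue</div></div>',
  '<div class="row"><div class="column"><span class="title">Shape</span></div></div>',
  '<div class="row"><div class="column"><span class="title">Size</span></div>',
  '<div class="column property">Large</div></div>',
  '</body>'
)
doc <- read_xml(sample_xml)

# Flat approach: two separate lists that no longer line up (3 titles, 2 properties)
flat_titles <- xml_text(xml_find_all(doc, "//div[@class='column']/span"))
flat_props  <- xml_text(xml_find_all(doc, "//div[@class='column property']"))

# Per-row approach: each lookup is anchored on its row, so a missing property becomes NA
rows <- xml_find_all(doc, "//div[@class='row']")
per_row <- setNames(
  xml_text(xml_find_first(rows, ".//div[@class='column property']")),
  xml_text(xml_find_first(rows, ".//span[@class='title']"))
)
print(per_row)
```

Here the flat lists differ in length, so setNames() would stop with an error; if one row lacked a title and another lacked a property, the lengths would match and the flat pairing would misalign without any error.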