R & XML2: Replacing missing XML elements with NA
I'm using XML2 to extract publication data from online XML documents, such as this one, with code like the following:

library(xml2)  # read_xml(), xml_find_all(), xml_text(), xml_ns()
xF <- read_xml(target, encoding = "UTF-8") ## target = above link
Peer.Rev <- xml_text(xml_find_all(xF, "//extensions-core:peerReviewed", xml_ns(xF)))

When an element is missing from a document, I'd like to get NA for it instead of an empty result.

Use xml2::xml_find_first().

Example:

Suppose we need the blog post categories from this XML RSS feed: https://eagereyes.org/feed. Some posts have one category, others have more than one. Searching for the first category works fine:

library(magrittr)  # provides the %>% pipe
feed <- "https://eagereyes.org/feed"
doc <- httr::GET(feed) %>% xml2::read_xml()
channel <- xml2::xml_find_all(doc, "channel")
site <- xml2::xml_find_all(channel, "item")  # one <item> node per blog post
categories <- tibble::tibble(
  category1 = xml2::xml_text(xml2::xml_find_all(site, "category[1]"))
)
> categories
# A tibble: 10 x 1
   category1
   <chr>
 1 Papers
 2 Blog 2017
 3 Links
 4 Blog 2017
 5 Blog 2017
 6 Talk
 7 ISOTYPE Books
 8 Techniques
 9 Basics
10 Blog 2017

Searching for a second category the same way fails, because xml_find_all() returns only the nodes that match and just 3 of the 10 posts have a second category:

categories <- tibble::tibble(
  category1 = xml2::xml_text(xml2::xml_find_all(site, "category[1]")),
  category2 = xml2::xml_text(xml2::xml_find_all(site, "category[2]"))
)
Error: Column `category2` must be length 1 or 10, not 3

xml_find_first() instead returns exactly one result per post (a missing node when nothing matches), and xml_text() turns those missing nodes into NA:

categories <- tibble::tibble(
  category1 = xml2::xml_text(xml2::xml_find_first(site, "category[1]")),
  category2 = xml2::xml_text(xml2::xml_find_first(site, "category[2]"))
)
> categories
# A tibble: 10 x 2
   category1     category2
   <chr>         <chr>
 1 Papers        paper
 2 Blog 2017     conference
 3 Links         <NA>
 4 Blog 2017     <NA>
 5 Blog 2017     <NA>
 6 Talk          <NA>
 7 ISOTYPE Books isotype
 8 Techniques    <NA>
 9 Basics        <NA>
10 Blog 2017     <NA>

Hope this helps.

Comments:
What is "XML2"? I have never heard of such a thing.
@DimitreNovatchev, it's an R package, available on CRAN.
@DimitreNovatchev, are you subtly saying (?) that I should go back to the tried-and-true XML package?
@J.M.S., no, I was just wondering whether XML 2.0 had come out as a W3C Recommendation. The latest version I know of is 1.1.
Thanks! This looks right, even if it's a bit late; I did this in Python long ago.
Ah, OK! Maybe it will help someone with a similar problem :-)
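The behaviour behind the answer can also be checked offline. The sketch below is an illustration, not part of the original answer: it assumes a small hypothetical RSS-like string in place of the live feed (so the item contents are made up) and, for the question's namespaced peerReviewed case, a hypothetical `<records>` document whose `ext` prefix and namespace URI are invented for the example. It uses only xml2 and base R.

```r
library(xml2)

# Hypothetical stand-in for the feed: three <item>s, only the first of which
# has a second <category>. Parsed from a string, so no network access is needed.
rss <- read_xml('<rss><channel>
  <item><category>Papers</category><category>paper</category></item>
  <item><category>Links</category></item>
  <item><category>Talk</category></item>
</channel></rss>')
items <- xml_find_all(rss, "//item")

# xml_find_all() keeps only the nodes that matched, so items without a
# second category disappear and the result is shorter than `items`.
all_second <- xml_text(xml_find_all(items, "category[2]"))    # "paper" only

# xml_find_first() returns exactly one node per item (a missing node when
# nothing matches), and xml_text() turns missing nodes into NA.
first_second <- xml_text(xml_find_first(items, "category[2]"))  # c("paper", NA, NA)

categories <- data.frame(
  category1 = xml_text(xml_find_first(items, "category[1]")),
  category2 = first_second,
  stringsAsFactors = FALSE  # keep character columns on older R versions too
)
print(categories)

# Same pattern for a namespaced element like the question's peerReviewed flag
# (element names and namespace URI here are hypothetical): search relative to
# each record node rather than from the document root with "//", so every
# record yields either a value or NA.
pubs <- read_xml('<records xmlns:ext="http://example.org/ext">
  <record><title>A</title><ext:peerReviewed>true</ext:peerReviewed></record>
  <record><title>B</title></record>
</records>')
records <- xml_find_all(pubs, "//record")
peer_rev <- xml_text(xml_find_first(records, "ext:peerReviewed", xml_ns(pubs)))  # c("true", NA)
```

The design point is that xml_find_first() is applied to a nodeset holding one node per record; called on the whole document with a "//" path, it would return a single value for the entire file rather than one value (or NA) per record.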