Parsing an RSS feed using the XML package


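I am trying to scrape and parse the RSS feed below. I have looked at other questions about R and XML but have not made any progress on my problem. The XML for each entry is: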

    <item>
        <title><![CDATA[Five Rockets Intercepted By Iron Drone Systems Over Be'er Sheva]]></title>
        <link>http://www.huffingtonpost.co.uk/2012/11/15/tel-aviv-gaza-rocket_n_2138159.html#2_five-rockets-intercepted-by-iron-drone-systems-over-beer-sheva</link>
        <description><![CDATA[<a href="http://www.haaretz.com/news/diplomacy-defense/live-blog-rockets-strike-tel-aviv-area-three-israelis-killed-in-attack-on-south-1.477960" target="_hplink">Haaretz reports</a> that five more rockets intercepted by Iron Dome systems over Be'er Sheva. In total, there have been 274 rockets fired and 105 intercepted. The IDF has attacked 250 targets in Gaza.]]></description>
        <guid>http://www.huffingtonpost.co.uk/2012/11/15/tel-aviv-gaza-rocket_n_2138159.html#2_five-rockets-intercepted-by-iron-drone-systems-over-beer-sheva</guid>
        <pubDate>2012-11-15T12:56:09-05:00</pubDate>
        <source url="http://huffingtonpost.com/rss/liveblog/liveblog-1213.xml">Huffingtonpost.com</source>
    </item>

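For each entry/post, I would like to record the date (pubDate), the title (title), and the description (the full text). I have tried using the XML package in R, but I admit I am something of a novice (little to no experience with XML, but some experience with R). This is the code I have been working on, which is getting me nowhere: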

    library(XML)

    xml.url <- "http://www.huffingtonpost.com/rss/liveblog/liveblog-1213.xml"

    # Use the xmlTreeParse function to parse the XML file directly from the web
    xmlfile <- xmlTreeParse(xml.url)

    # Use the xmlRoot function to access the top node
    xmltop <- xmlRoot(xmlfile)

    xmlName(xmltop)

    names(xmltop[[1]])

            title          link   description      language     copyright
          "title"        "link" "description"    "language"   "copyright"
         category     generator          docs          item          item
       "category"   "generator"        "docs"        "item"        "item"
I am using the excellent RCurl library together with xpathSApply.

This script produces three vectors (titles, publication dates, and descriptions), one element per item:

    library(RCurl)
    library(XML)

    xml.url <- "http://www.huffingtonpost.com/rss/liveblog/liveblog-1213.xml"

    # Download the feed as text, then parse it into an internal XML document
    script <- getURL(xml.url)
    doc    <- xmlParse(script)

    # Pull the text of each field out of every <item> with an XPath query
    titles       <- xpathSApply(doc, '//item/title', xmlValue)
    descriptions <- xpathSApply(doc, '//item/description', xmlValue)
    pubdates     <- xpathSApply(doc, '//item/pubDate', xmlValue)
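
Since the question asks for one record per post, the three vectors can be combined into a single data frame. The following is only a sketch under some assumptions: the two-item feed is made up and inlined so the example runs without network access, the names `rss`, `feed`, and `pubdates_clean` are illustrative, and every item is assumed to contain all three fields (otherwise the vectors would not line up). The colon is stripped from the UTC offset before parsing so that the `%z` format of `strptime` sees `-0500` rather than `-05:00`.

```r
library(XML)

# Hypothetical two-item feed, inlined so the sketch runs offline
rss <- '<rss version="2.0"><channel>
  <title>Example feed</title>
  <item>
    <title><![CDATA[First post]]></title>
    <description><![CDATA[Body of the <b>first</b> post.]]></description>
    <pubDate>2012-11-15T12:56:09-05:00</pubDate>
  </item>
  <item>
    <title><![CDATA[Second post]]></title>
    <description><![CDATA[Body of the second post.]]></description>
    <pubDate>2012-11-15T13:10:00-05:00</pubDate>
  </item>
</channel></rss>'

# asText = TRUE tells xmlParse that the argument is XML content, not a file name
doc <- xmlParse(rss, asText = TRUE)

titles       <- xpathSApply(doc, "//item/title", xmlValue)
descriptions <- xpathSApply(doc, "//item/description", xmlValue)
pubdates     <- xpathSApply(doc, "//item/pubDate", xmlValue)

# Turn an offset such as -05:00 into -0500 so that %z can read it
pubdates_clean <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", pubdates)

# One row per item; dates are stored as POSIXct and displayed in UTC
feed <- data.frame(
  date        = as.POSIXct(pubdates_clean,
                           format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC"),
  title       = titles,
  description = descriptions,
  stringsAsFactors = FALSE
)

print(feed[, c("date", "title")])
```

With the real feed, `rss` would be replaced by the text returned from `getURL(xml.url)`. Note that the descriptions still contain the embedded HTML markup from the CDATA sections.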