How to get information within <meta name...> tag in html using htmlParse and xpathSApply




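I have a bunch of webpages and I want to extract their publishing dates. For some webpages, the date is in an "abbr" tag (like: <abbr class="published" title="2012-03-14T07:13:39+00:00">2012-03-14, 7:13</abbr>), and I can get the date using:

doc = htmlParse(URL, asText=T)
xpathSApply(doc, "//abbr", xmlValue)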
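For the abbr case, note that xmlValue returns only the short display text, while the full timestamp sits in the title attribute. A minimal sketch, assuming the XML package and a made-up HTML snippet parsed with asText = TRUE instead of a real page, that reads both:

```r
library(XML)

# Made-up page fragment modelled on the abbr example above
html <- '<html><body>
  <abbr class="published" title="2012-03-14T07:13:39+00:00">2012-03-14, 7:13</abbr>
</body></html>'
doc <- htmlParse(html, asText = TRUE)

# xmlValue returns the element text, i.e. the short display form
shown <- xpathSApply(doc, "//abbr[@class='published']", xmlValue)

# Extra arguments to xpathSApply are passed on to the function,
# so xmlGetAttr(node, "title") reads the full timestamp attribute
stamp <- xpathSApply(doc, "//abbr[@class='published']", xmlGetAttr, "title")

print(shown)
print(stamp)
```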

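But for other webpages, the date is in a "meta" tag, for example:
<meta name="created" content="2011-12-29T11:49:23+00:00">
<meta name="OriginalPublicationDate" content="2012/11/14 10:56:58">

I tried xpathSApply(doc, "//meta", xmlValue), but it didn't work.

So what pattern should I use instead of "//meta"?

Thanks!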
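Since some pages use the abbr layout and others use meta tags, one option is to try each location in turn. The sketch below uses a hypothetical helper (extract_pub_date is not part of the XML package), assumes the two meta names from the question are the ones that matter, and parses inline HTML in place of real pages:

```r
library(XML)

# Hypothetical helper: return the first publishing date found on a parsed page,
# trying the <meta> names from the question before the <abbr class="published"> layout
extract_pub_date <- function(doc,
                             meta_names = c("created", "OriginalPublicationDate")) {
  for (nm in meta_names) {
    # Only match <meta> tags that carry both the wanted name and a content attribute
    hit <- xpathSApply(doc, sprintf("//meta[@name='%s' and @content]", nm),
                       xmlGetAttr, "content")
    if (length(hit) > 0) return(hit[[1]])
  }
  hit <- xpathSApply(doc, "//abbr[@class='published' and @title]", xmlGetAttr, "title")
  if (length(hit) > 0) return(hit[[1]])
  NA_character_  # nothing found in either layout
}

# Made-up pages standing in for the two layouts described in the question
page1 <- htmlParse('<html><head><meta name="created" content="2011-12-29T11:49:23+00:00"></head><body></body></html>',
                   asText = TRUE)
page2 <- htmlParse('<html><body><abbr class="published" title="2012-03-14T07:13:39+00:00">2012-03-14, 7:13</abbr></body></html>',
                   asText = TRUE)

extract_pub_date(page1)
extract_pub_date(page2)
```

The lookup order is a design choice: meta tags are checked first because they usually hold a machine-readable value, and the abbr title is the fallback.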

Using this page as an example:

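library(XML)
url <- "http://stackoverflow.com/questions/22342501/"
doc <- htmlParse(url, useInternalNodes=TRUE)
# "@name" and "@content" select attributes of <meta>, not its (empty) text
names   <- doc["//meta/@name"]
content <- doc["//meta/@content"]
# Pairs the two lists by position; assumes every <meta> has both attributes
cbind(names,content)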
#      names            content                                                                                                           
# [1,] "twitter:card"   "summary"                                                                                                         
# [2,] "twitter:domain" "stackoverflow.com"                                                                                               
# [3,] "og:type"        "website"                                                                                                         
# [4,] "og:image"       "http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon@2.png?v=fde65a5a78c6"                                  
# [5,] "og:title"       "how to get information within <meta name...> tag in html using htmlParse and xpathSApply"                        
# [6,] "og:description" "I have a bunch of webpages and I want to extract their publishing dates. \nFor some webpages, the da" [truncated]
# [7,] "og:url"         "http://stackoverflow.com/questions/22342501/how-to-get-information-within-meta-name-tag-in-html-usi" [truncated]
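cbind(names, content) lines the two lists up by position, so it only pairs correctly when every <meta> tag has both attributes; a tag such as <meta http-equiv=... content=...> has a content but no name and would shift the rows. A sketch, assuming the XML package and a made-up head section rather than the live page, that reads both attributes from the same node:

```r
library(XML)

# Made-up head: the http-equiv tag has content but no name
html <- '<html><head>
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="created" content="2011-12-29T11:49:23+00:00">
  <meta name="OriginalPublicationDate" content="2012/11/14 10:56:58">
</head><body></body></html>'
doc <- htmlParse(html, asText = TRUE)

# Keep only <meta> elements that have both attributes,
# then read name and content from the same node so they cannot drift apart
nodes <- getNodeSet(doc, "//meta[@name and @content]")
meta  <- setNames(sapply(nodes, xmlGetAttr, "content"),
                  sapply(nodes, xmlGetAttr, "name"))

meta["created"]
```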

Isn't the issue that xmlValue(...) returns the element's content (that is, the text part of the element), while a <meta> tag has no text?
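To see the difference on a single node, a minimal sketch (made-up HTML, XML package assumed) comparing xmlValue with xmlAttrs, which returns all attributes of an element as a named character vector:

```r
library(XML)

doc  <- htmlParse('<html><head><meta name="created" content="2011-12-29T11:49:23+00:00"></head><body></body></html>',
                  asText = TRUE)
node <- getNodeSet(doc, "//meta")[[1]]

# The element has no child text, so xmlValue has nothing useful to return
xmlValue(node)

# The data lives in the attributes
attrs <- xmlAttrs(node)
attrs[["content"]]
```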

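In what way does it not work? What happens?

@jlhoward There are 8 meta tags in total on the page, and I'm interested in one of them. The xpathSApply function gives me 8 NAs: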

xpathSApply(doc, "//meta",xmlValue)
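Applied to the call above, the fix is to pass an attribute accessor instead of xmlValue, and to narrow the XPath with a predicate on @name. A sketch assuming the XML package and a small made-up page rather than the asker's:

```r
library(XML)

# Made-up stand-in for a page with several <meta> tags
html <- '<html><head>
  <meta name="viewport" content="width=device-width">
  <meta name="created" content="2011-12-29T11:49:23+00:00">
</head><body></body></html>'
doc <- htmlParse(html, asText = TRUE)

# Same call as above, but reading the content attribute instead of the text;
# default = NA_character_ is forwarded to xmlGetAttr for tags without content
all_content <- xpathSApply(doc, "//meta", xmlGetAttr, "content",
                           default = NA_character_)

# A predicate on @name selects only the tag of interest
created <- xpathSApply(doc, "//meta[@name='created']", xmlGetAttr, "content")

print(all_content)
print(created)
```

Without the default argument, xmlGetAttr returns NULL for a tag that lacks the attribute, which turns the xpathSApply result into a list instead of a character vector.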