Parse an HTML document and use XPath to get all matches of two patterns


html r xpath


I parsed the HTML of the FIFA World Cup website and want to get all the matches:

library(XML)
wcup <- htmlTreeParse("http://www.fifa.com/worldcup/matches/", useInternalNodes = TRUE)

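So, is there a way to search for both class values, 't-nText ' and 't-nText kern', at the same time? Or do you have another solution? I want to keep the matches in their original order.

XPath does not seem to support a logical OR, at least not when written as ||: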

xpathSApply(wcup, "//span[@class='t-nText ' || 't-nText kern']", xmlValue)
XPath error : Invalid expression
//span[@class='t-nText ' || 't-nText kern']
                          ^
XPath error : Invalid expression
//span[@class='t-nText ' || 't-nText kern']
                                          ^
Error in xpathApply.XMLInternalDocument(doc, path, fun, ..., namespaces = namespaces,  : 
  error evaluating xpath expression //span[@class='t-nText ' || 't-nText kern']

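Why not append the results of the two searches together:

c( xpathSApply(wcup, "//span[@class='t-nText kern']", xmlValue), 
   xpathSApply(wcup, "//span[@class='t-nText ']", xmlValue)
  )

I originally posted that, then noticed that the order needed to be preserved, so I searched for "XPath or". Lo and behold, I came up with: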

xpathSApply(wcup, "//*[starts-with(@class,'t-nText')]", xmlValue)

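This is very similar to Martin Morgan's solution. I had not realized that XPath is a language in its own right. I guess I am at least 10 years behind the times.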
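The difference in ordering between the two approaches can be checked on a small inline document, without depending on the live FIFA page. The snippet below is only a sketch: the HTML string and the team names in it are made-up stand-ins, not the real page markup, and it assumes the XML package is installed.

```r
library(XML)

# Hypothetical stand-in for the FIFA page: three spans in document order.
html <- paste0(
  "<html><body>",
  "<span class='t-nText '>Brazil</span>",
  "<span class='t-nText kern'>Bosnia and Herzegovina</span>",
  "<span class='t-nText '>Croatia</span>",
  "</body></html>"
)
doc <- htmlParse(html, asText = TRUE)

# Two separate queries concatenated: results are grouped by class,
# so the order in which the spans appear on the page is lost.
appended <- c(
  xpathSApply(doc, "//span[@class='t-nText kern']", xmlValue),
  xpathSApply(doc, "//span[@class='t-nText ']", xmlValue)
)

# A single query with starts-with(): nodes come back in document order.
in_order <- xpathSApply(doc, "//*[starts-with(@class, 't-nText')]", xmlValue)

print(appended)
print(in_order)
```

With this input, the appended vector lists the 'kern' span first, while the single starts-with() query keeps the spans in the order they appear in the document.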

If all you really need is the match data, there is an Excel file with the matches available (no XML parsing required :-)

Use "or" or "starts-with()":

wcup["//span[@class='t-nText kern' or @class='t-nText ']"]
wcup["//span[starts-with(@class, 't-nText ')]"]
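The "or" form can also be tried on a small inline document. In XPath 1.0 the logical OR is the keyword "or", and each side has to be a complete comparison, which is why the "||" attempt in the question is rejected. The snippet below is a sketch on a made-up HTML fragment (the markup and team names are assumptions, not taken from the FIFA page); it uses getNodeSet() and sapply() to pull the text out of the matched nodes.

```r
library(XML)

# Hypothetical fragment, not the real FIFA markup.
doc <- htmlParse(
  paste0(
    "<div>",
    "<span class='t-nText '>Spain</span>",
    "<span class='other'>skip me</span>",
    "<span class='t-nText kern'>Netherlands</span>",
    "</div>"
  ),
  asText = TRUE
)

# Logical OR inside one predicate: each side repeats the full comparison.
nodes <- getNodeSet(doc, "//span[@class='t-nText kern' or @class='t-nText ']")

# getNodeSet() returns nodes; xmlValue() extracts their text content.
teams <- sapply(nodes, xmlValue)
print(teams)
```

Because this is a single query, the matched spans are returned in document order and the span with the unrelated class is skipped.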