R: extracting a specific table using XPath

I want to extract a table from a web page.

This does not work:

xmltable <- xpathApply(xmltext, "//table[//tbody//tr//th//a[@title='CONCACAF Gold Cup']]")

You have to use ".." to get the parent element in XPath:
//table[@class='wikitable']//th//a[@title='CONCACAF Gold Cup']/../../..

To get the table, you can use XML::readHTMLTable:

library(XML)
baseURL <- "http://en.wikipedia.org/wiki/Brazil_national_football_team"
xmltext <- htmlParse(baseURL)

## grep correct table
tableNode <- xpathApply(xmltext, "//table[@class='wikitable']//th//a[@title='CONCACAF Gold Cup']/../../..")[[1]]

## convert XMLNode into data.frame
concacafTable <- readHTMLTable(tableNode, header=FALSE, stringsAsFactors=FALSE)

## format table: remove the useless "Gold Cup" header (row 1) and use row 2 as the column names
colnames(concacafTable) <- unlist(concacafTable[2, ])
concacafTable <- concacafTable[-c(1,2),]
concacafTable
#   Year       Round GP W D L GF GA
#3  1996  Runners-up  4 3 0 1 10  3
#4  1998 Third Place  5 2 2 1  6  2
#5  2003  Runners-up  5 3 0 2  6  4
#6 Total        3/11 14 8 2 4 22  9
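
The parent steps used above can be tried offline before pointing them at the live page. The following is a minimal sketch, assuming a made-up two-table document (the `html` string and its contents are hypothetical, not the Wikipedia markup). It shows that three ".." steps lead from the link to its th, then to the tr, then to the table, provided the parsed document has no tbody level in between.

```r
library(XML)

# Hypothetical miniature page: two "wikitable" tables, only the second
# one contains a header link titled 'CONCACAF Gold Cup'.
html <- "
<html><body>
<table class='wikitable'>
<tr><th><a title='Copa America'>Copa America</a></th><th>record</th></tr>
<tr><td>1919</td><td>Champions</td></tr>
</table>
<table class='wikitable'>
<tr><th><a title='CONCACAF Gold Cup'>Gold Cup</a></th><th>record</th></tr>
<tr><th>Year</th><th>Round</th></tr>
<tr><td>1996</td><td>Runners-up</td></tr>
<tr><td>1998</td><td>Third Place</td></tr>
</table>
</body></html>"

doc <- htmlParse(html, asText = TRUE)

# a -> th -> tr -> table: one ".." per level
nodes <- xpathApply(
  doc,
  "//table[@class='wikitable']//th//a[@title='CONCACAF Gold Cup']/../../.."
)
tableNode <- nodes[[1]]

# Confirm that we really landed on a <table> element
print(xmlName(tableNode))

# Convert the selected node into a data.frame
tab <- readHTMLTable(tableNode, header = FALSE, stringsAsFactors = FALSE)
print(tab)
```

If the page you parse does contain tbody elements, one more ".." step would be needed, so it is worth checking xmlName() on the result as done here.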
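I also tried the following while parsing the web page, and it does not work either:

tableNode <- xpathApply(xmltext, "//tbody")

Possible duplicate.

Yes, that is a good example and I want to understand it well, but my case is not the same as that one.
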
Putting a predicate on the table element, instead of climbing back up with "..", can work too:

tableNode <- xpathApply(xmltext, "//table[@class='wikitable'][./tr/th/a[@title='CONCACAF Gold Cup']]")[[1]]
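
The predicate form depends on the path inside the brackets being relative to each table. A path that starts with "//" inside a predicate is evaluated from the document root, so it is either true for every table or for none. The sketch below illustrates this on a made-up two-table document (the `html` string and the `target` helper are hypothetical, introduced only for this illustration).

```r
library(XML)

# Hypothetical miniature page: only the second table has the Gold Cup link.
html <- "
<html><body>
<table class='wikitable'>
<tr><th><a title='Copa America'>Copa America</a></th></tr>
<tr><td>1919</td></tr>
</table>
<table class='wikitable'>
<tr><th><a title='CONCACAF Gold Cup'>Gold Cup</a></th></tr>
<tr><td>1996</td></tr>
</table>
</body></html>"

doc <- htmlParse(html, asText = TRUE)
target <- "a[@title='CONCACAF Gold Cup']"

# "//" inside the predicate restarts at the document root, so the test
# is true for every table as soon as the link exists anywhere.
absolute <- getNodeSet(doc, paste0("//table[//", target, "]"))

# ".//" keeps the search inside the table currently being tested.
relative <- getNodeSet(doc, paste0("//table[.//", target, "]"))

# The fixed path from the answer: table -> tr -> th -> a.
fixed <- getNodeSet(
  doc,
  "//table[@class='wikitable'][./tr/th/a[@title='CONCACAF Gold Cup']]"
)

print(length(absolute))  # 2: both tables match
print(length(relative))  # 1: only the Gold Cup table
print(length(fixed))     # 1: only the Gold Cup table
```

The "./tr/th/a" variant is stricter than ".//a": it only matches when the rows are direct children of the table, so it would need "./tbody/tr/th/a" for a document that contains tbody elements.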