R: extracting a specific table with XPath
I want to extract one specific table from a web page, but this attempt does not give me that table:
xmltable <- xpathApply(xmltext, "//table[//tbody//tr//th//a[@title='CONCACAF Gold Cup']]")
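One likely reason the expression above fails is that a path starting with "//" inside a predicate is evaluated from the document root, not from the table being tested. The sketch below illustrates this on a small inline page; the markup and the id values are hypothetical stand-ins, not the real Wikipedia source.

```r
library(XML)

# Hypothetical two-table page: only the second table contains the link.
html <- '<html><body>
<table id="t1"><tr><th>Other</th></tr></table>
<table id="t2"><tr><th><a title="CONCACAF Gold Cup">Gold Cup</a></th></tr></table>
</body></html>'
doc <- htmlParse(html, asText = TRUE)

# "//" at the start of a predicate searches from the document root,
# so the condition is true for every table once the link exists anywhere.
absolute <- xpathSApply(doc, "//table[//a[@title='CONCACAF Gold Cup']]",
                        xmlGetAttr, "id")

# ".//" searches only below the table currently being tested.
relative <- xpathSApply(doc, "//table[.//a[@title='CONCACAF Gold Cup']]",
                        xmlGetAttr, "id")

print(absolute)
print(relative)
```

With the relative form only the table that actually contains the link is selected.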
You have to use ".." to get the parent element in XPath: //table[@class='wikitable']//th//a[@title='CONCACAF Gold Cup']/../../..
To get the table as a data frame, you can use XML::readHTMLTable:
library(XML)
baseURL <- "http://en.wikipedia.org/wiki/Brazil_national_football_team"
xmltext <- htmlParse(baseURL)
## grep correct table
tableNode <- xpathApply(xmltext, "//table[@class='wikitable']//th//a[@title='CONCACAF Gold Cup']/../../..")[[1]]
## convert XMLNode into data.frame
concacafTable <- readHTMLTable(tableNode, header=FALSE, stringsAsFactors=FALSE)
## format table: remove the useless "Gold Cup" header (row 1) and use row 2 as the header
colnames(concacafTable) <- concacafTable[2, ]
concacafTable <- concacafTable[-c(1,2),]
concacafTable
#   Year       Round GP W D L GF GA
#3  1996  Runners-up  4 3 0 1 10  3
#4  1998 Third Place  5 2 2 1  6  2
#5  2003  Runners-up  5 3 0 2  6  4
#6 Total        3/11 14 8 2 4 22  9
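The expression in the code above climbs three levels with ".." after matching the link. A minimal sketch of why three steps reach the table, assuming the nesting is table > tr > th > a (the inline markup is a hypothetical stand-in for the real page):

```r
library(XML)

# Hypothetical minimal markup with the nesting table > tr > th > a.
html <- '<html><body>
<table class="wikitable">
<tr><th><a title="CONCACAF Gold Cup">Gold Cup</a></th></tr>
<tr><td>1996</td></tr>
</table>
</body></html>'
doc <- htmlParse(html, asText = TRUE)

link <- "//table[@class='wikitable']//th//a[@title='CONCACAF Gold Cup']"

# Name of the element reached after one, two and three ".." steps.
one_up   <- xpathSApply(doc, paste0(link, "/.."), xmlName)
two_up   <- xpathSApply(doc, paste0(link, "/../.."), xmlName)
three_up <- xpathSApply(doc, paste0(link, "/../../.."), xmlName)

print(c(one_up, two_up, three_up))
```

If the markup wrapped its rows in an explicit tbody element, one more ".." step would be needed to reach the table.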
I also found two quirks when parsing the web page.
1. tbody does not seem to be recognized:
tableNode <- xpathApply(xmltext, "//tbody")
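One plausible explanation, offered as an assumption: browsers insert a tbody element into the DOM they display, while htmlParse only sees elements that are written in the page source. The sketch below uses hypothetical inline markup to show that only an explicit tbody is found, and that "//table//tr" matches rows either way.

```r
library(XML)

# Hypothetical markup: the first table has no <tbody>, the second has one.
html <- '<html><body>
<table id="plain"><tr><td>1</td></tr></table>
<table id="explicit"><tbody><tr><td>2</td></tr></tbody></table>
</body></html>'
doc <- htmlParse(html, asText = TRUE)

# Only the <tbody> written in the markup is found.
n_tbody <- length(getNodeSet(doc, "//tbody"))

# table//tr matches rows with or without an intermediate <tbody>.
n_rows <- length(getNodeSet(doc, "//table//tr"))

# table/tr matches only rows that are direct children of <table>.
n_direct <- length(getNodeSet(doc, "//table/tr"))

print(c(tbody = n_tbody, rows = n_rows, direct = n_direct))
```

Writing the path with "//" between table and tr avoids depending on whether tbody is present in the source.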
2. Putting the condition in a predicate on the table itself can work too, so no ".." steps are needed:
tableNode <- xpathApply(xmltext, "//table[@class='wikitable'][./tr/th/a[@title='CONCACAF Gold Cup']]")

Comments:
Possible duplicate.
Yes, that is a good example and I would like to understand it properly, but what I am after is different from that.