How to isolate a single element from a scraped web page in R

Tags: xml, r, web-scraping, rcurl

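I want to use R to scrape this page (the URL is in the code below), and others like it, to get the goal scorers and the times of their goals.

This is what I have so far: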

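library(RCurl)
library(XML)

theURL <- "http://www.fifa.com/worldcup/archive/germany2006/results/matches/match=97410001/report.html"
# Download the raw HTML as a single string
webpage <- getURL(theURL, header = FALSE, verbose = TRUE)
# Split that string into a character vector of lines
webpagecont <- readLines(tc <- textConnection(webpage)); close(tc)

# Parse into an internal document tree that XPath queries can run against
pagetree <- htmlTreeParse(webpagecont, error = function(...) {}, useInternalNodes = TRUE)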

When dealing with web scraping and XML in R, the existing questions on those topics are very helpful.

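  • Regarding your specific example: I'm not sure what you want the output to look like, but this gets the "Goals scored" entry as a character vector: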

    theURL <-"http://www.fifa.com/worldcup/archive/germany2006/results/matches/match=97410001/report.html"
    fifa.doc <- htmlParse(theURL)
    fifa <- xpathSApply(fifa.doc, "//*/div[@class='cont']", xmlValue)
    goals.scored <- grep("Goals scored", fifa, value=TRUE)
    

    Be careful when doing things like this. In most cases, organizations such as FIFA, FIBA, or the NBA do not allow their data to be used. Simply put: their data is their property! So next time, provide some pseudo-HTML or just point to some harmless site.

    Very nice. I was looking for something like this a long time ago, but ended up doing it in Python! Now I can run littler scripts and populate my dataset. Cool!

    Splitting goals.scored on ", " and stripping the "Goals scored" label then gives one element per goal:

    > gsub("Goals scored", "", strsplit(goals.scored, ", ")[[1]])
    [1] "Philipp LAHM (GER) 6'"    "Paulo WANCHOPE (CRC) 12'" "Miroslav KLOSE (GER) 17'" "Miroslav KLOSE (GER) 61'" "Paulo WANCHOPE (CRC) 73'"
    [6] "Torsten FRINGS (GER) 87'"
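
Since the question asks for scorers and times, the character vector above could be split further into one column per field. The following is only a sketch in base R: the function name parse_goals, the regular expression, and the column names are my own, and it assumes every entry looks like "NAME (TEAM) MINUTE'" with a three-letter team code. Entries that do not fit that layout (for example, stoppage-time notation) are silently dropped, so check the row count against the input.

```r
# The six strings printed above, copied in so the sketch runs without network access.
goals <- c("Philipp LAHM (GER) 6'", "Paulo WANCHOPE (CRC) 12'",
           "Miroslav KLOSE (GER) 17'", "Miroslav KLOSE (GER) 61'",
           "Paulo WANCHOPE (CRC) 73'", "Torsten FRINGS (GER) 87'")

# Assumed layout: player name, three-letter team code in parentheses, minute, apostrophe.
pattern <- "^(.*) \\(([A-Z]{3})\\) ([0-9]+)'$"

# Hypothetical helper: turn the strings into a data frame of player, team, minute.
parse_goals <- function(x) {
  x <- trimws(x)
  hits <- regmatches(x, regexec(pattern, x))
  ok <- lengths(hits) == 4  # full match plus three capture groups
  if (!any(ok)) {
    return(data.frame(player = character(0), team = character(0),
                      minute = integer(0), stringsAsFactors = FALSE))
  }
  parts <- do.call(rbind, hits[ok])  # one row per goal that matched
  data.frame(player = parts[, 2],
             team   = parts[, 3],
             minute = as.integer(parts[, 4]),
             stringsAsFactors = FALSE)
}

scorers <- parse_goals(goals)
```

With the data in this shape, table(scorers$player) counts goals per player, and the same helper can be applied to the result of the gsub() call for each match page.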