NoSuchElement exception when scraping ESPN with RSelenium

css, r, xpath, web-scraping, rselenium

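I am using R (and RSelenium) to scrape data from ESPN. This is not the first time I have used it, but in this case I get an error that I cannot resolve.

Consider the match page whose URL is built in the code below.

Let's try to scrape the timeline. If I inspect the page, I get the css selector

#liveLeft
As usual, I go with: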

library(RSelenium)
checkForServer()  # older RSelenium helper: fetches the Selenium server binary if absent
remDr <- remoteDriver()
remDr$open()

matchId <- "142562"
leagueString <- "premiership"
seasonString <- "2011-12"


url <- paste0("http://en.espn.co.uk/",leagueString,"-",seasonString,"/rugby/match/",matchId,".html")

remDr$navigate(url)
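Looking up #liveLeft then fails with a NoSuchElement exception. I am puzzled. I also tried XPath, without success, and I tried fetching other elements of the page, with no luck. The only selector that returns something is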

#scrumContent
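As worked out in the comments:

The element is inside an iframe, so it cannot be selected from the top-level page. Running document.getElementById('liveLeft') in the Chrome console shows this: against the full page it returns null, i.e. the element does not exist there, even though it is clearly visible. To get around this, load the iframe directly.

If you inspect the page, you will see that the iframe src is /premiership-2011-12/rugby/current/match/142562.html?view=scorecard. Navigating to this page instead of the "full" page makes the element "visible", so it can be selected with RSelenium.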

checkForServer()
remDr <- remoteDriver()
remDr$open()

matchId <- "142562"
leagueString <- "premiership"
seasonString <- "2011-12"

# Amended URL: point directly at the iframe's document instead of the full page
url <- paste0("http://en.espn.co.uk/", leagueString, "-", seasonString, "/rugby/current/match/", matchId, ".html?view=scorecard")

remDr$navigate(url)

div <- remDr$findElement(using = 'css selector', '#liveLeft')
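
The two URLs in this thread differ only in the path segment before the match id, so they can be built by one function. The sketch below is not part of the original answer: the name espn_match_url and its iframe argument are hypothetical, and the URL patterns are simply the ones quoted in the question and the answer.

```r
# Hypothetical helper (not from the original answer): builds either the full
# match page URL or the URL of the iframe document that holds #liveLeft.
espn_match_url <- function(matchId, leagueString, seasonString, iframe = FALSE) {
  base <- paste0("http://en.espn.co.uk/", leagueString, "-", seasonString, "/rugby/")
  if (iframe) {
    # Pattern taken from the iframe src quoted in the answer
    paste0(base, "current/match/", matchId, ".html?view=scorecard")
  } else {
    # Pattern taken from the question's code
    paste0(base, "match/", matchId, ".html")
  }
}

full_url   <- espn_match_url("142562", "premiership", "2011-12")
iframe_url <- espn_match_url("142562", "premiership", "2011-12", iframe = TRUE)
```

Passing iframe_url to remDr$navigate() corresponds to the amended code above. Whether other matches on the site follow the same pattern is an assumption that would need checking.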

Generally with Selenium, when a web page contains frames/iframes, you need to use the switchToFrame method of the remoteDriver class:

library(RSelenium)
library(XML)  # provides htmlParse, xmlGetAttr and readHTMLTable used below
selServ <- startServer()
remDr <- remoteDriver()
remDr$open()
matchId <- "142562"
leagueString <- "premiership"
seasonString <- "2011-12"
url <- paste0("http://en.espn.co.uk/",leagueString,"-",seasonString,"/rugby/match/",matchId,".html")
remDr$navigate(url)
# check the iframes
iframes <- htmlParse(remDr$getPageSource()[[1]])["//iframe", fun = function(x){xmlGetAttr(x, "id")}]
# iframes[[3]] == "win_old" holds the data, so switch to that frame
remDr$switchToFrame(iframes[[3]])
# check you can access the element
div<- remDr$findElement(using = 'css selector','#liveLeft')
div$highlightElement()
# get data
ifSource <- htmlParse(remDr$getPageSource()[[1]])
out <- readHTMLTable(ifSource["//div[@id = 'liveLeft']"][[1]], header = TRUE)
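
The code above picks the frame by its position in the list of iframes. A variation is to look the iframe up by its id and to hand focus back to the top-level page afterwards. This is only a sketch: text_in_frame is a hypothetical name, remDr is assumed to be an open RSelenium remoteDriver, and "win_old" is the frame id reported above. It uses findElement, switchToFrame (which accepts a web element or NULL) and getElementText.

```r
# Sketch only (hypothetical helper): returns the text of the element matching
# `css` inside the iframe with id `frame_id`, or NA if it is not found there.
text_in_frame <- function(remDr, frame_id, css) {
  # Locate the iframe element by id rather than by its position on the page
  frame <- remDr$findElement(using = "id", value = frame_id)
  remDr$switchToFrame(frame)
  # Always return focus to the top-level page, even if the lookup fails
  on.exit(remDr$switchToFrame(NULL), add = TRUE)
  tryCatch(
    # Read the text while the frame still has focus
    remDr$findElement(using = "css selector", value = css)$getElementText()[[1]],
    error = function(e) NA_character_  # e.g. a NoSuchElement error
  )
}

# Assumed usage, after remDr$navigate(url) on the full match page:
# timeline <- text_in_frame(remDr, "win_old", "#liveLeft")
```

The text is read before focus returns to the top-level page because an element handle obtained inside a frame is generally not usable once the driver has switched away from that frame.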
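Comments:

This is what I see when I inspect the page:
code……
If I hover over the div "liveLeft", the table I want lights up.

Yes, I've just noticed. Interestingly, if you load the page and enter document.getElementById('liveLeft') in the developer console, it returns null. However, once you inspect the element and re-run document.getElementById('liveLeft'), it is returned. I'm no expert in js, but there may be some AJAX going on, meaning the element isn't available in the original tree, which is why it can't be found until the node tree is re-evaluated.

It's in an iframe. Don't load the current page; load the iframe instead. If you look at the source, you will see a reference to /premiership-2011-12/rugby/current/match/142562.html?view=scorecard. If you load that and then look for the element, it should work. I haven't tested it in RSelenium, so I won't post it as an answer for now, but it works in the developer tools in Chrome: http://en.espn.co.uk/premiership-2011-12/rugby/current/match/142562.html?view=scorecard

This works great! Thanks. Please add it as an answer :)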
In the browser console, the separate DOM of the iframe can be seen directly:

document.getElementById('liveLeft'); // returns null, because the iframe has a separate DOM

var doc = document.getElementById('win_old').contentDocument; // the iframe's own document
doc.getElementById('liveLeft'); // now returns the desired element
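
The same console check can be run from R, which is a quick way to tell whether a NoSuchElement error comes from a frame boundary. This is a rough sketch, not something from the thread: in_current_document is a hypothetical name, remDr is assumed to be an open remoteDriver, and the result of executeScript is unwrapped defensively because its exact shape may differ between RSelenium versions.

```r
# Sketch only (hypothetical helper): TRUE if an element with this id exists in
# the document that currently has focus (top-level page or the switched-to frame).
in_current_document <- function(remDr, id) {
  script <- "return document.getElementById(arguments[0]) !== null;"
  res <- remDr$executeScript(script, args = list(id))
  # The result usually arrives as a one-element list; unwrap it defensively
  isTRUE(unlist(res)[1])
}

# Assumed usage on the full match page:
# in_current_document(remDr, "liveLeft")  # expected FALSE before switching frame
# in_current_document(remDr, "win_old")   # expected TRUE: the iframe element itself
```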