R: I get blocked when I try to scrape with Selenium
I tried to scrape a website using the following code:
library(RSelenium)
library(dplyr)
library(rvest)
rD <- rsDriver(browser = 'firefox', port = 4875L)
remDr <- rD$client
input_galaxus <- c('https://www.galaxus.ch/8606656', 'https://www.galaxus.ch/9796481', 'https://www.galaxus.ch/10592688')
vec_galaxus <- vector()
i <- 0
for (j in input_galaxus) {
  remDr$navigate(j)
  i <- i + 1
  try(vec_galaxus[i] <- read_html(remDr$getPageSource()[[1]]) %>%
        html_nodes('div strong') %>%
        html_text() %>%
        nth(5))
  Sys.sleep(runif(1, min = 5, max = 10))
}
I got it to work with rvest sessions, no Selenium needed. Just remove the RSelenium lines and replace the for loop with:
sess <- session(input_galaxus[1]) # start the session
for (j in input_galaxus) {
  sess <- sess %>% session_jump_to(j) # jump to the next URL
  i <- i + 1
  try(vec_galaxus[i] <- read_html(sess) %>% # can read directly from sess
        html_nodes('div strong') %>%
        html_text() %>%
        nth(5))
  Sys.sleep(runif(1, min = 5, max = 10))
}
vec_galaxus
[1] " 399.–" " 660.–" " 931.–"
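For what it's worth, the session loop above can be wrapped up a bit more defensively. The following is only a sketch, assuming rvest >= 1.0 and dplyr are installed. The helper names `extract_nth_strong` and `scrape_prices` are made up for illustration, and the demo page is invented so the extraction step can be checked offline. A failed page yields `NA` instead of leaving a gap in the result.

```r
library(rvest)
library(dplyr)

# Hypothetical helper: text of the n-th <strong> inside a <div>,
# or NA if the page has fewer than n matches.
extract_nth_strong <- function(page, n = 5) {
  page %>%
    html_elements("div strong") %>%
    html_text() %>%
    nth(n, default = NA_character_)
}

# Hypothetical wrapper around the session loop from the answer.
# Not called here because it needs network access.
scrape_prices <- function(urls, n = 5) {
  sess <- session(urls[1])                 # start the session
  out <- rep(NA_character_, length(urls))  # preallocate, NA marks a failure
  for (k in seq_along(urls)) {
    out[k] <- tryCatch({
      sess <- session_jump_to(sess, urls[k])
      extract_nth_strong(read_html(sess), n)
    }, error = function(e) NA_character_)
    Sys.sleep(runif(1, min = 5, max = 10)) # same polite delay as above
  }
  out
}

# Offline check on an invented page (not the real site's markup)
demo <- minimal_html("<div><strong>a</strong><strong>b</strong></div>")
extract_nth_strong(demo, 2) # "b"
extract_nth_strong(demo, 5) # NA
```

Usage would be `vec_galaxus <- scrape_prices(input_galaxus)`. Whether the fifth `div strong` keeps holding the price depends on the site's markup, so treat the default `n = 5` as taken from the code above, not as something stable.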