Getting URLs from a search using rvest

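I am trying to get the URLs that result from searching the Western Union website for each province of Nigeria. Specifically, I want to search for each province in the vector below, keep the URL that corresponds to each search, and then web-scrape every link obtained. I know how to do the second step, but not the first. My code so far is: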

#install.packages("selectr")
#install.packages("xml2")
library(selectr)
library(xml2)
library(rvest)
library(xlsx)
provinces = as.vector(read.xlsx("provinces.xls", 1)[,1])

URL <- "https://locations.westernunion.com/search/nigeria/"
webpage <- read_html(URL)

We can get the href attribute of every a element inside a div with class info:

library(rvest)
library(dplyr)

URL <- "https://locations.westernunion.com/search/nigeria/"

URL %>%
  read_html() %>%
  html_nodes("div.info a") %>%
  html_attr("href") %>%
  grep("Nigeria$", ., value = TRUE)

#[1] "/ng/ebonyi/onueke/47908be48d424b6fba108b020c60b517?loc=+Nigeria"      
#[2] "/ng/plateau/plateau/393aa00a34ded9201b3c0c2fd70c02b3?loc=+Nigeria"    
#[3] "/ng/bayelsa/otuoke/046d3ae90f58169a7cc896b96e9ccfac?loc=+Nigeria"     
#[4] "/ng/ogun/abeokuta/fab00c55961bc48312029f13e7b75277?loc=+Nigeria"      
#[5] "/ng/ogun/idi-iroko/63803a3c50d4cb4b44f473cfd8cb96b1?loc=+Nigeria"     
#[6] "/ng/-/akwaibom/4c1dd6c2953a0d396500157d97ddf0ca?loc=+Nigeria"  
#....
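To see what the selector and the grep filter each contribute without requesting the live site, here is a minimal sketch on a made-up HTML fragment. The fragment, its link targets, and the variable names are assumptions for illustration only; the real page may be structured differently.

```r
library(rvest)

# Made-up HTML fragment mimicking the assumed structure:
# the links of interest sit inside <div class="info">.
html <- '
<div class="info"><a href="/ng/ogun/abeokuta/abc123?loc=+Nigeria">Agent A</a></div>
<div class="info"><a href="/ng/ogun/abeokuta/abc123/directions">Directions</a></div>
<div class="other"><a href="/help?loc=+Nigeria">Help</a></div>
'

page <- read_html(html)

# "div.info a" selects <a> elements that are descendants of <div class="info">,
# so the link inside <div class="other"> is not selected.
hrefs <- html_attr(html_nodes(page, "div.info a"), "href")

# Keep only the hrefs that end in "Nigeria"; this drops the "directions" link.
links <- grep("Nigeria$", hrefs, value = TRUE)
print(links)
```

The CSS selector narrows the search to the right part of the page, and the regular expression then removes any remaining links that are not location pages.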
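The hrefs returned by the page are relative paths. Prepending the site's base URL gives absolute URLs that can be used for step 2 of the process: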

URL %>%
  read_html() %>%
  html_nodes("div.info a") %>%
  html_attr("href") %>%
  grep("Nigeria$", ., value = TRUE) %>%
  paste0("https://locations.westernunion.com", .)

#[1] "https://locations.westernunion.com/ng/ebonyi/onueke/47908be48d424b6fba108b020c60b517?loc=+Nigeria"      
#[2] "https://locations.westernunion.com/ng/plateau/plateau/393aa00a34ded9201b3c0c2fd70c02b3?loc=+Nigeria"    
#[3] "https://locations.westernunion.com/ng/bayelsa/otuoke/046d3ae90f58169a7cc896b96e9ccfac?loc=+Nigeria"     
#[4] "https://locations.westernunion.com/ng/ogun/abeokuta/fab00c55961bc48312029f13e7b75277?loc=+Nigeria"      
#[5] "https://locations.westernunion.com/ng/ogun/idi-iroko/63803a3c50d4cb4b44f473cfd8cb96b1?loc=+Nigeria"     
#[6] "https://locations.westernunion.com/ng/-/akwaibom/4c1dd6c2953a0d396500157d97ddf0ca?loc=+Nigeria" 
#....
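Since the question asks for the URLs belonging to each province, one possible follow-up is to group the links by the path segment that comes right after /ng/. This is only a sketch: it assumes that segment names the state, which is inferred from the sample output above and not confirmed by the site. The helper state_of and the provinces vector here are hypothetical stand-ins, and the three URLs are copied from the output above.

```r
# Sample absolute URLs copied from the output above
urls <- c(
  "https://locations.westernunion.com/ng/ebonyi/onueke/47908be48d424b6fba108b020c60b517?loc=+Nigeria",
  "https://locations.westernunion.com/ng/ogun/abeokuta/fab00c55961bc48312029f13e7b75277?loc=+Nigeria",
  "https://locations.westernunion.com/ng/ogun/idi-iroko/63803a3c50d4cb4b44f473cfd8cb96b1?loc=+Nigeria"
)

# Hypothetical helper: extract the path segment right after "/ng/",
# assuming it identifies the state/province.
state_of <- function(u) sub("^.*/ng/([^/]+)/.*$", "\\1", u)

# Stand-in for the provinces vector read from provinces.xls in the question
provinces <- c("Ogun", "Ebonyi")

# Group the URLs by state, then keep only the provinces of interest
by_state <- split(urls, state_of(urls))
wanted <- by_state[tolower(provinces)]
print(wanted)
```

Each element of wanted can then be passed to read_html() for step 2. Some links, such as the /ng/-/akwaibom/ entry above, have "-" in the state position, so they would need separate handling under this assumption.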