Running rvest over a vector of web pages with different CSS selectors


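I am trying to scrape contact details from several different websites, so each site needs its own CSS selector.

Data:

I have read every post I could find about scraping multiple web pages, but they all deal with similar URLs and similar CSS selectors/XPath expressions.

Here is what I tried: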

library(rvest)
i <- str_replace_all(file$website, "http://www.[.]+", "")
urls <- "http://www."
cssr <- as.vector(file$cssr)
for (i in urls){
  a01 <- paste0("http://www.", i, sep="")
  text <- read_html(a01) %>%
    html_nodes(cssr) %>%
    html_text()
}

Taking @Spacedman's comment into account, maybe this is what you want:

file <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
'website  status  email  phone   Fax  cssr
http://www.saudiacatering.com/en/home NA info@noorinvestment.com "+966 12-686-0011" "+966 12-686-1864" ".w-icon li"
http://www.laithllc.com/contact.html NA  info@laithllc.com  "+971 2-553-7571" "+971 2-353-7579" "p+ p , section:nth-child(1) p"')

library(dplyr)
library(purrr)
library(rvest)
mutate(file, text = map2(website, cssr, ~ read_html(.x) %>% html_nodes(.y) %>% html_text()))
#                                 website status                   email            phone              Fax                          cssr                                                               text
# 1 http://www.saudiacatering.com/en/home     NA info@noorinvestment.com +966 12-686-0011 +966 12-686-1864                    .w-icon li +966 (12) 686-0011, +966 (12) 686-1864, careers@saudiacatering.com
# 2  http://www.laithllc.com/contact.html     NA       info@laithllc.com  +971 2-553-7571  +971 2-353-7579 p+ p , section:nth-child(1) p 
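
Row 2 in the output above has an empty `text` column, which may simply mean that the selector matched nothing on that page. A minimal sketch of how one might flag such rows, assuming made-up inline HTML in place of the real sites so that it runs offline (the `pages`, `selectors`, `texts`, and `n_matches` names are illustrative, not part of the original answer):

```r
library(purrr)
library(rvest)

# Made-up pages standing in for real sites; read_html() parses a string
# containing markup directly, so no network access is needed here.
pages <- c(
  "<html><body><ul class='w-icon'><li>+966 000</li></ul></body></html>",
  "<html><body><div><p>Tel 123</p></div></body></html>"
)
# One selector per page, paired by position
selectors <- c(".w-icon li", "section p")

# Same pairing idea as map2(website, cssr, ...) in the answer
texts <- map2(pages, selectors,
              ~ read_html(.x) %>% html_nodes(.y) %>% html_text())

# Number of matched nodes per page; 0 means the selector found nothing
n_matches <- lengths(texts)
```

A zero in `n_matches` points at a selector that needs to be rechecked against the page it is paired with.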

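You've got `urls` …

Also, if you say something was "not successful", you should explain what kind of not-success happened: an error message, blank output, the computer catching fire? There is usually only one way for things to work and hundreds of ways for them to fail, and knowing which one it was helps us help you.

@Spacedman, you are right. I am new to R and this is my first time using loops. I did some more reading and added the updated code, which still gives me an error. The posted answer works for me, but for the sake of learning I would like to know what is wrong with my code.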
library(stringr)
library(rvest)
library(magrittr)
i <- str_replace_all(url, "http://www.", "")
urls <- "http://www."
cssr <- as.vector(file$cssr)
for (x in i){
  a01 <- paste0("http://www.", x, sep="")
  read_html(a01) %>%
    for(m in cssr){html_nodes(m) %>% html_text()}}

Error in for (. in m) file$cssr : 
  4 arguments passed to 'for' which requires 3
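
A likely reason for this error: `%>%` inserts its left-hand side as the first argument of the call on its right, and in R a `for` loop is itself a call that takes exactly three arguments (variable, sequence, body), so piping into it produces four. The loop also tries every selector on every page instead of pairing them, and it never stores a result. A minimal sketch of a loop-based version, assuming made-up inline HTML in place of the real sites so that it runs offline (`scrape_each` and the sample pages are hypothetical names introduced here):

```r
library(rvest)

# Hypothetical helper: apply the k-th selector to the k-th page.
scrape_each <- function(pages, selectors) {
  stopifnot(length(pages) == length(selectors))
  results <- vector("list", length(pages))
  for (k in seq_along(pages)) {
    doc <- read_html(pages[k])               # parse before the loop body uses it
    nodes <- html_nodes(doc, selectors[k])   # only the selector for this page
    results[[k]] <- html_text(nodes)         # store the result instead of dropping it
  }
  results
}

# Made-up stand-ins for two sites with different markup
pages <- c(
  "<html><body><ul class='w-icon'><li>+966 000</li><li>a@example.com</li></ul></body></html>",
  "<html><body><section><p>Tel 123</p></section></body></html>"
)
selectors <- c(".w-icon li", "section p")

out <- scrape_each(pages, selectors)
```

With real data, `pages` would be the `website` column and `selectors` the `cssr` column; `read_html()` then fetches each URL instead of parsing a literal string.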