Loop through several URLs and import data from each one (R)


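I'm trying to figure out how to loop through several URLs. This is just a learning exercise for myself. I thought I basically knew how to do it, but I've been stuck on one problem for several hours now and I'm not making any progress. I believe my code below is close, but for some reason the page number isn't incrementing.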

library(rvest)
URL <- "https://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw=mens%27s+shoes+size+11&rt=nc"
WS <- read_html(URL)
URLs <- WS %>% html_nodes("ResultSetItems") %>% html_attr("href") %>% as.character()

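I think the URLs should look like the ones below, and the loop after them is my attempt to generate and read them: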

'https://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw=mens%27s+shoes+size+11&_pgn=1&_skc=0&rt=nc'              

'https://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw=mens%27s+shoes+size+11&_pgn=2&_skc=0&rt=nc'

'https://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw=mens%27s+shoes+size+11&_pgn=3&_skc=0&rt=nc'

'https://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw=mens%27s+shoes+size+11&_pgn=4&_skc=0&rt=nc'

'https://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw=mens%27s+shoes+size+11&_pgn=5&_skc=0&rt=nc'

for(i in 1:5) 
{

   site <- paste("https://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw=mens%27s+shoes+size+11&_pgn=",i,"&_skc=0&rt=nc", jump, sep="")
   dfList <- lapply(site, function(i) {
       WS <- read_html(i)
       URLs <- WS %>% html_nodes("ResultSetItems") %>% html_attr("href") %>% as.character()
   })
}
finaldf <- do.call(rbind, webpage)

I can't seem to get it working. I may be oversimplifying this; I'm not sure. Can I get a little help here? TIA.

Here is how to proceed. In my example, given a vector of URLs called read_urls, you only need to apply read_html to it with the map function.

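You will get a list of objects, and you can apply the same kind of function to that list to extract the data you need. Once that is done, just put the resulting lists into a data frame.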
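A minimal sketch of that idea, assuming the rvest and purrr packages are installed. Because `paste0()` is vectorised, the five page URLs can be built without a loop. The helper name `get_links` and the `"a"` selector are illustrative assumptions, not eBay's actual markup, and the two inline HTML strings stand in for downloaded pages so the sketch runs offline.

```r
library(rvest)
library(purrr)

# paste0() is vectorised: one call builds all five page URLs.
base_url  <- "https://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw=mens%27s+shoes+size+11&_pgn="
read_urls <- paste0(base_url, 1:5, "&_skc=0&rt=nc")

# Hypothetical helper: return every link target found in a parsed page.
get_links <- function(page) {
  page %>% html_nodes("a") %>% html_attr("href")
}

# Stand-ins for downloaded pages. With real, scrape-friendly URLs this
# would be: pages <- map(read_urls, read_html)
pages <- map(
  c('<html><body><a href="/item/1">one</a><a href="/item/2">two</a></body></html>',
    '<html><body><a href="/item/3">three</a></body></html>'),
  read_html
)

# One data frame per page, then stack them into a single data frame.
per_page <- map(pages, function(p) {
  data.frame(href = get_links(p), stringsAsFactors = FALSE)
})
links_df <- do.call(rbind, per_page)
```

The list returned by `map()` keeps one element per page, which is why a single `do.call(rbind, ...)` at the end is enough to combine everything.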

But scraping does not seem to be allowed on the eBay domain. Maybe you should pick another example to try.
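Setting the eBay restriction aside, the loop in the question has problems of its own: `jump` and `webpage` are never defined, `dfList` is overwritten on every pass, and `html_nodes("ResultSetItems")` looks for an element with that tag name rather than an id or a class. A hedged sketch of a repaired loop follows. `fetch_page` is a hypothetical stand-in that returns canned HTML so the example runs without touching any site, and the `a.item` selector is an assumption about the markup.

```r
library(rvest)

# Hypothetical stand-in for read_html(site): returns a small canned page.
fetch_page <- function(i) {
  read_html(sprintf(
    '<html><body><a class="item" href="/item/%d">shoe %d</a></body></html>', i, i
  ))
}

pages   <- 1:5
results <- vector("list", length(pages))  # one slot per page

for (i in pages) {
  # i is the only part of the URL that changes between pages.
  site <- paste0(
    "https://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw=mens%27s+shoes+size+11&_pgn=",
    i, "&_skc=0&rt=nc"
  )
  page  <- fetch_page(i)                  # real code would call read_html(site)
  hrefs <- page %>% html_nodes("a.item") %>% html_attr("href")
  # Store into slot i instead of overwriting a single variable.
  results[[i]] <- data.frame(url = site, href = hrefs, stringsAsFactors = FALSE)
}

# Combine the list that was actually filled in the loop.
finaldf <- do.call(rbind, results)
```

Collecting each page's result in its own list slot is what lets the final `do.call(rbind, ...)` see all five pages rather than only the last one.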

Edit

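Your eBay example cannot give results because scraping is forbidden there. To make things clearer, I will use an example site that allows web scraping. This is how I would do it, using map so that the apply family of functions is not needed. First, we generate the list of pages we want to get information from: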

library(rvest)
library(tidyverse)

urls <- "http://books.toscrape.com/catalogue/page-"

pag <- 1:5

read_urls <- paste0(urls, pag, ".html")

read_urls %>% 
  map(read_html) -> p

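Extracting the titles and prices from these pages (the extraction code is at the end of this post) gives: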

# A tibble: 100 x 2
                                 titles prices
                                  <chr>  <chr>
1                  A Light in the Attic £51.77
2                    Tipping the Velvet £53.74
3                            Soumission £50.10
4                         Sharp Objects £47.82
5 Sapiens: A Brief History of Humankind £54.23
6                       The Requiem Red £22.65

Now it is possible to turn all of this into a function, but I will leave that to you.
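One possible way to wrap the steps into a function, offered as a sketch rather than the answerer's own solution. The name `scrape_books` is hypothetical, and the inline HTML only imitates the `article`, `h3/a` title attribute and `.price_color` structure targeted by the selectors in this answer, so the example runs offline.

```r
library(rvest)
library(purrr)
library(tibble)
library(dplyr)

# Hypothetical helper: turn one parsed catalogue page into a tibble.
scrape_books <- function(page) {
  articles <- html_nodes(page, "article")
  tibble(
    titles = articles %>% html_nodes(xpath = "./h3/a") %>% html_attr("title"),
    prices = articles %>% html_nodes(".price_color") %>% html_text()
  )
}

# Small stand-in page with the same shape as the selectors expect.
sample_page <- read_html('
  <html><body>
    <article><h3><a title="A Light in the Attic">A Light in the ...</a></h3>
      <p class="price_color">£51.77</p></article>
    <article><h3><a title="Tipping the Velvet">Tipping the Velvet</a></h3>
      <p class="price_color">£53.74</p></article>
  </body></html>')

# With real pages this would be:
# books <- map(read_urls, read_html) %>% map(scrape_books) %>% bind_rows()
books <- list(sample_page, sample_page) %>%
  map(scrape_books) %>%
  bind_rows()
```

Note that this sketch assumes every `article` has both a title and a price; if one were missing, the two columns would have different lengths and `tibble()` would raise an error.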

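Sorry, what do you mean? Your code doesn't actually do anything. I see your point, though... I don't necessarily have to scrape eBay links. I just wanted to learn the concept, which I thought I knew, but as I can see now, I really don't understand how this stuff works. Thanks.

The answer has been edited to provide a more accurate example. Hope it helps! Happy web scraping :)

Awesome!! Thanks so much for sharing this!!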

#Extract titles from the pages
p %>%  
  map(html_nodes, "article") %>% 
  map(html_nodes, xpath = "./h3/a") %>% 
  map(html_attr, "title") %>% 
  unlist() -> titles

#Extract price from the pages
p %>% 
  map(html_nodes, "article") %>% 
  map(html_nodes, ".price_color") %>% 
  map(html_text) %>% 
  unlist() -> prices

r <- tibble(titles, prices)
