R: I want to scrape multiple pages


I am trying to scrape reviews for games from multiple pages of the same website.

I tried to run and adapt code I found here, following one of the answers:

library(tidyverse)
library(rvest)

url_base <- "https://www.metacritic.com/browse/games/score/metascore/all/ps4?sort=desc&page=0"

map_df(1:17, function(i) {

  cat(".")

  pg <- read_html(sprintf(url_base, i))

  data.frame(Name = html_text(html_nodes(pg, "#main .product_title a")),
             MetaRating = as.numeric(html_text(html_nodes(pg, "#main .positive"))),
             UserRating = as.numeric(html_text(html_nodes(pg, "#main .textscore"))),
             stringsAsFactors = FALSE)

}) -> ps4games_metacritic

I made three changes to your code:

  • Since Metacritic's page numbers start at 0,
    map_df(1:17, ...)
    should be
    map_df(0:16, ...)
  • As per BigDataScientist's suggestion, url_base needs a %d placeholder. If you look at the linked answer, you will see that the page number is substituted via %d; as written, your code scrapes page 0 seventeen times. So set it like this (a short sprintf() demonstration follows this list):
    url_base <- "https://www.metacritic.com/browse/games/score/metascore/all/ps4?sort=desc&page=%d"
  • The MetaRating selector is changed from "#main .positive" to "#main .game" (the third change, visible in the corrected script below), presumably so that games without a positive Metascore are also captured.
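To see what the placeholder does, you can run sprintf() by hand; it replaces %d with the integer you pass in:

        sprintf("https://www.metacritic.com/browse/games/score/metascore/all/ps4?sort=desc&page=%d", 0)
        #> [1] "https://www.metacritic.com/browse/games/score/metascore/all/ps4?sort=desc&page=0"
        sprintf("https://www.metacritic.com/browse/games/score/metascore/all/ps4?sort=desc&page=%d", 16)
        #> [1] "https://www.metacritic.com/browse/games/score/metascore/all/ps4?sort=desc&page=16"

With those changes, the full script looks like this: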
    
        library(tidyverse)
        library(rvest)

        # %d is the placeholder that sprintf() fills in with each page number
        url_base <- "https://www.metacritic.com/browse/games/score/metascore/all/ps4?sort=desc&page=%d"

        # Metacritic's pages are numbered from 0, so 0:16 covers all 17 pages
        map_df(0:16, function(i) {

          cat(".")  # simple progress indicator
          pg <- read_html(sprintf(url_base, i))

          data.frame(Name = html_text(html_nodes(pg, "#main .product_title a")),
                     MetaRating = as.numeric(html_text(html_nodes(pg, "#main .game"))),
                     UserRating = as.numeric(html_text(html_nodes(pg, "#main .textscore"))),
                     stringsAsFactors = FALSE)

        }) -> ps4games_metacritic
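
A quick way to check the combined result once the loop finishes (assuming the page structure has not changed and the selectors still match):

        # Quick sanity check on the combined result
        glimpse(ps4games_metacritic)   # expect columns Name, MetaRating and UserRating
        nrow(ps4games_metacritic)      # total number of games collected across the 17 pages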