R: I'm trying to scrape multiple pages
I'm trying to scrape reviews from multiple pages of the same gaming website. I tried running and adapting code I found here, taken from one of the answers:
library(tidyverse)
library(rvest)
url_base <- "https://www.metacritic.com/browse/games/score/metascore/all/ps4?sort=desc&page=0"
map_df(1:17, function(i) {
  cat(".")
  pg <- read_html(sprintf(url_base, i))
  data.frame(Name = html_text(html_nodes(pg, "#main .product_title a")),
             MetaRating = as.numeric(html_text(html_nodes(pg, "#main .positive"))),
             UserRating = as.numeric(html_text(html_nodes(pg, "#main .textscore"))),
             stringsAsFactors = FALSE)
}) -> ps4games_metacritic
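I made three changes to your code:
Since their page numbers start at 0, map_df(1:17… should be map_df(0:16…
As BigDataScientist suggested, url_base should end in page=%d rather than page=0, as in the code below.
The MetaRating selector is "#main .game" instead of "#main .positive".
If you look at the linked answer, you will see that the page number is replaced by %d. sprintf() has nothing to substitute in your url_base, so in your case you are scraping page 0 seventeen times.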
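library(tidyverse)
library(rvest)
# %d is the placeholder that sprintf() fills with the page number
url_base <- "https://www.metacritic.com/browse/games/score/metascore/all/ps4?sort=desc&page=%d"
# Pages are numbered from 0, so 0:16 covers all 17 pages
map_df(0:16, function(i) {
  cat(".")  # print a dot per page as a progress indicator
  pg <- read_html(sprintf(url_base, i))
  # One row per game; map_df() row-binds the per-page data frames
  data.frame(Name = html_text(html_nodes(pg, "#main .product_title a")),
             MetaRating = as.numeric(html_text(html_nodes(pg, "#main .game"))),
             UserRating = as.numeric(html_text(html_nodes(pg, "#main .textscore"))),
             stringsAsFactors = FALSE)
}) -> ps4games_metacritic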
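
To check the %d fix without hitting the website, you can build the URLs first and inspect them. This is only a small sketch using base R; the variable name urls is introduced here for illustration and is not part of the original code.

```r
# Same url_base as above, with %d as the page-number placeholder
url_base <- "https://www.metacritic.com/browse/games/score/metascore/all/ps4?sort=desc&page=%d"

# sprintf() is vectorised, so this yields one URL per page number
urls <- sprintf(url_base, 0:16)

length(urls)              # number of distinct page URLs generated
head(urls, 2)             # first two URLs, for a visual check
anyDuplicated(urls) == 0  # TRUE when every page URL is different
```

If url_base ended in page=0 instead, every call to sprintf() would return the same string, which is why the original loop kept fetching the first page.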