Scraping in a loop with rvest


I've reviewed several answers to similar questions on this topic, but none of them seem to work in my case.

I have a list of URLs, and I want to grab the table from each URL and append it to a master data frame.

## get all urls into one list
page<- (0:2)
urls <- list()
for (i in 1:length(page)) {
  url<- paste0("https://www.mlssoccer.com/stats/season?page=",page[i])
  urls[[i]] <- url
}


### loop over the urls and get the table from each page
table<- data.frame()
for (j in urls) {
  tbl<- urls[j] %>% 
    read_html() %>% 
    html_node("table") %>%
    html_table()
  table[[j]] <- tbl
}
Any suggestions on how to fix this error and loop the three tables into a single data frame? I'd appreciate any tips or pointers.

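Here is your problem:

for (j in urls) {
  tbl<- urls[j] %>% 

Inside `for (j in urls)`, `j` already holds each URL string, not its position. `urls[j]` therefore looks the string up as a name in an unnamed list and returns a `NULL` element instead of the URL, and `table[[j]]` likewise uses the URL as a name rather than a position. Loop over positions instead with `seq_along(urls)` and index with `urls[[j]]`.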
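A minimal illustration of the difference between looping over values and looping over positions. The two URLs are hypothetical placeholders and nothing is downloaded; `toupper()` merely stands in for the `read_html()` / `html_table()` pipeline.

```r
# Two placeholder URLs (hypothetical); nothing is fetched here
urls <- list("https://example.com/a", "https://example.com/b")

# Looping over values: j is each URL string itself, not a position
seen_values <- character(0)
for (j in urls) {
  seen_values <- c(seen_values, j)
}

# What the original loop body does with such a j: the string is treated as a
# name, the unnamed list has no such name, so the single element is NULL
bad <- urls["https://example.com/a"]

# Looping over positions: j is 1, 2, ..., so urls[[j]] is the URL
# and out[[j]] is the matching slot in the result list
out <- vector("list", length(urls))
for (j in seq_along(urls)) {
  out[[j]] <- toupper(urls[[j]])  # stand-in for read_html() %>% html_table()
}

print(seen_values)
print(out)
```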

Try this:

library(tidyverse)
library(rvest)

page<- (0:2)
urls <- list()
for (i in 1:length(page)) {
  url<- paste0("https://www.mlssoccer.com/stats/season?page=",page[i])
  urls[[i]] <- url
}

### loop over the urls and get the table from each page
tbl <- list()
for (j in seq_along(urls)) {
  tbl[[j]] <- urls[[j]] %>%   # j is the position 1, 2, 3; tbl[[j]] stores the table from urls[[j]] as element j of the tbl list
    read_html() %>% 
    html_node("table") %>%
    html_table()
}

# stack the list of tables into one data frame (assumes every page has the same columns)
tbl <- do.call(rbind, tbl)
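The same result can be sketched without manual index bookkeeping. `paste0()` is vectorised, so it builds all three URLs at once, and `lapply()` returns one table per URL for `do.call(rbind, ...)` to stack. This is a minimal sketch assuming every page's first `<table>` has the same columns. The helper name `scrape_first_table()` and the inline HTML strings are hypothetical; because `read_html()` also accepts literal HTML, the demo runs offline.

```r
library(rvest)  # read_html(), html_node(), html_table(); also re-exports %>%

# Hypothetical helper: parse one page (a URL or a literal HTML string)
# and return its first <table> as a data frame
scrape_first_table <- function(src) {
  src %>%
    read_html() %>%
    html_node("table") %>%
    html_table()
}

# paste0() is vectorised, so the three page URLs need no loop
urls <- paste0("https://www.mlssoccer.com/stats/season?page=", 0:2)

# Live version (downloads three pages; commented out here):
# all_pages <- do.call(rbind, lapply(urls, scrape_first_table))

# Offline demo: two inline HTML pages that share the same columns
demo_pages <- c(
  "<table><tr><th>Player</th><th>Goals</th></tr><tr><td>A</td><td>3</td></tr></table>",
  "<table><tr><th>Player</th><th>Goals</th></tr><tr><td>B</td><td>5</td></tr></table>"
)
demo <- do.call(rbind, lapply(demo_pages, scrape_first_table))
print(demo)
```

`lapply()` always returns a list of the same length as its input, so there is no output list to pre-create and no counter to maintain.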

`table[[j]]`: have you tried assigning `j`?

This seems to solve the original problem, but it produces a new error that I don't understand: [[

Thanks @on_an_island, that's exactly the output I wanted. I know this isn't the best way to ask, but how can I add a column to the table showing which `j` was used? I'm adapting the solution.

@AdilsonVCasula Just add `tbl[[j]]$j`
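The reply `tbl[[j]]$j` is cut off in this copy. Assuming it meant an assignment such as `tbl[[j]]$j <- j` inside the loop, a sketch could look like this. The inline HTML strings again stand in for downloaded pages and are hypothetical.

```r
library(rvest)

# Two inline HTML stand-ins for downloaded pages (hypothetical data)
pages <- c(
  "<table><tr><th>Player</th><th>Goals</th></tr><tr><td>A</td><td>3</td></tr></table>",
  "<table><tr><th>Player</th><th>Goals</th></tr><tr><td>B</td><td>5</td></tr></table>"
)

tbl <- list()
for (j in seq_along(pages)) {
  tbl[[j]] <- pages[[j]] %>%
    read_html() %>%
    html_node("table") %>%
    html_table()
  # Assumed completion of the comment: tag every row with the loop index
  tbl[[j]]$j <- j
}

# Stack the tagged tables; column j shows which page each row came from
all_pages <- do.call(rbind, tbl)
print(all_pages)
```

With `page <- 0:2` as in the question, `j` is the list position (1 to 3), not the `page=` value in the URL. Store `page[j]` instead if the URL's page number is what you want.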