Scraping in a loop with rvest
I have reviewed several answers to similar questions on this topic, but none of them seem to work for me. I have a list of URLs, and I want to get the table from each URL and append it to a master data frame.
## get all urls into one list
page<- (0:2)
urls <- list()
for (i in 1:length(page)) {
url<- paste0("https://www.mlssoccer.com/stats/season?page=",page[i])
urls[[i]] <- url
}
### loop over the urls and get the table from each page
table<- data.frame()
for (j in urls) {
tbl<- urls[j] %>%
read_html() %>%
html_node("table") %>%
html_table()
table[[j]] <- tbl
}
Any suggestions on how to correct this error and loop the 3 tables into a single DF? I appreciate any hints or pointers.
This is your problem:
for (j in urls) {
tbl<- urls[j] %>%
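In for (j in urls), j takes the value of each URL string in turn rather than its position in the list, so urls[j] does not return the URL you expect. Index the list by position instead. You can also use seq_: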
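To see why urls[j] fails, it can help to inspect what the loop variable actually holds. The following is a small offline sketch (no pages are fetched) using only base R; the names by_value, bad and by_index are illustrative and not part of the original code.

```r
# Build the same list of URLs as in the question (no network access needed)
page <- 0:2
urls <- as.list(paste0("https://www.mlssoccer.com/stats/season?page=", page))

# Iterating over the list itself: j is each URL string, not a position
by_value <- character(0)
for (j in urls) {
  by_value <- c(by_value, class(j))
}

# urls[j] with a character j looks up an element *named* j; the list has
# no names, so the result is a one-element list holding NULL
j <- urls[[1]]
bad <- urls[j]

# Iterating over positions: j is 1, 2, 3 and urls[[j]] is a single string
by_index <- character(0)
for (j in seq_along(urls)) {
  by_index <- c(by_index, urls[[j]])
}
```

Here by_value records that j is a character string on every pass, while by_index collects the three URLs in order, which is what read_html() needs as input.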
Try this:
library(tidyverse)
library(rvest)
page <- 0:2
urls <- list()
for (i in seq_along(page)) {
  url <- paste0("https://www.mlssoccer.com/stats/season?page=", page[i])
  urls[[i]] <- url
}
### loop over the urls and get the table from each page
tbl <- list()
for (j in seq_along(urls)) {
  # store the table scraped from the j-th url as the j-th element of the tbl list
  tbl[[j]] <- urls[[j]] %>%
    read_html() %>%
    html_node("table") %>%
    html_table()
  # no manual j <- j + 1 is needed: the for loop advances j by itself
}
# combine the list of tables into one data frame
tbl <- do.call(rbind, tbl)
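The same collect-then-combine pattern can also be written without an explicit loop. The sketch below runs offline: fetch_table is a hypothetical stand-in that returns a small made-up data frame instead of scraping, and in real use its body would presumably be the read_html() %>% html_node("table") %>% html_table() pipeline from the answer. The column names player and goals are placeholders as well.

```r
# Vectorised URL construction: paste0() recycles the prefix over 0:2
urls <- paste0("https://www.mlssoccer.com/stats/season?page=", 0:2)

# Hypothetical stand-in for the scraping step, so the sketch runs offline.
# The returned values are placeholders, not real MLS statistics.
fetch_table <- function(url) {
  data.frame(player = c("A", "B"), goals = c(1L, 2L), stringsAsFactors = FALSE)
}

# One data frame per URL, collected in a list
tables <- lapply(urls, fetch_table)

# Stack them; rbind() needs every table to have the same column names
combined <- do.call(rbind, tables)
```

If the scraped pages ever return tables with differing columns, do.call(rbind, ...) will stop with an error, so it may be worth checking names() on each element first.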
table[[j]]: have you tried assigning j?
This seems to solve the original problem, but it produces a new error that I don't understand. [[
Thanks @on_an_island. This is exactly the output I wanted. I know this isn't the best way to ask, but how can I add a column to the table showing which j was used? I'm adapting the solution.
@AdilsonVCasula Just add tbl[[j]]$j
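One way to read that last suggestion is to store the loop index in a new column before the tables are combined. This is a minimal offline sketch, assuming each element of tbl is a data frame; the two small data frames are made-up placeholders for the scraped tables, and the column name j simply follows the comment.

```r
# Placeholder tables standing in for the scraped results
tbl <- list(
  data.frame(player = c("A", "B"), goals = c(1L, 2L)),
  data.frame(player = c("C", "D"), goals = c(3L, 4L))
)

for (j in seq_along(tbl)) {
  # record which list position (and so which URL) these rows came from
  tbl[[j]]$j <- j
}

# Combine after tagging, so every row keeps its source index
combined <- do.call(rbind, tbl)
```

The tagging has to happen before do.call(rbind, tbl), because after combining there is no longer any record of which table a row belonged to.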