Extracting data from a scrolling table with rvest


r web-scraping


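I would like to extract all the records from the table at https://thearcfooty.com/2017/01/28/a-complete-history-of-the-afl/.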

The challenge is that it is a scrolling table (the text at the bottom of the table shows that it contains 31,228 records):

Showing 1 to 10 of 31,228 entries
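That footer reports the table's total record count, so it is a convenient sanity check for whatever scraping approach ends up being used. A minimal base-R sketch that pulls the number out of the footer string; the string is typed in here rather than scraped, and the variable names `footer`, `total_txt` and `total` are illustrative only:

```r
# Footer text as displayed under the table (typed in, not scraped).
footer <- "Showing 1 to 10 of 31,228 entries"

# Capture the number that follows " of ", then drop the thousands separator.
total_txt <- sub(".* of ([0-9,]+) entries.*", "\\1", footer)
total <- as.integer(gsub(",", "", total_txt, fixed = TRUE))
total
#> [1] 31228
```

Comparing the row count of the scraped result against `total` (for example `nrow(TableNew) == total`) shows whether every page of the table was captured, not just the first 10 rows.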

I'm new to rvest, and after inspecting the table in Google Chrome I tried the following:

library(rvest)
url <- "https://thearcfooty.com/2017/01/28/a-complete-history-of-the-afl/"

Table  <- url %>%
  read_html() %>%
  html_nodes(xpath= '//*[@id="table_1"]') %>%
  html_table()
TableNew <- Table[[1]]
TableNew

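I suspect some of the code in html_table is rather slow, which is why it seems to run forever. Instead, you can read in all the text and reshape it into a data frame. I haven't checked whether the result is fully correct, but based on the few examples I looked at, it should be.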
library(rvest)
#> Loading required package: xml2
library(data.table)
url <- ...
# ...
#>  $ season           : chr  "1897" "1897" "1897" "1897" ...
#>  $ round            : chr  "1" "1" "1" ...
#>  $ home_away        : chr  "A" "A" "A" "A" "A" ...
#>  $ team             : chr  "CA" "SK" "ME" "ES" ...
#>  $ opponent         : chr  "FI" "CW" "SY" "GE" ...
#>  $ margin_pred      : chr  "0.00" "0.00" "0.00" "2.99" ...
#>  $ margin_actual    : chr  "-33.00" "25.00" "17.00" "23.00" ...
#>  $ win_prob         : chr  "0.50" "0.50" "0.50" "0.47" ...
#>  $ result           : chr  "0.18" "0.24" "0.69" "0.74" ...
#>  $ team_elo_pre     : chr  "1500" "1500" "1500" "1500" ...
#>  $ opponent_elo_pre : chr  "1500" "1500" "1500" ...
#>  $ team_elo_post    : chr  "1473" "1478" "1515" "1522" ...
#>  $ opponent_elo_post: chr  "1526" "1521" "1484" "1477" ...
#>  - attr(*, ".internal.selfref")=<externalptr>

Created on 2020-07-27 by the reprex package (v0.3.0)
Accepted: this answer was the most helpful in solving my problem.
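The answer's idea is to take the table's cell text as one long character vector and fold it into rows, skipping html_table entirely. Below is a minimal base-R sketch of that reshaping step under stated assumptions. It illustrates the technique and is not the answer's original code. The cell vector is typed in from the first two records of the output above instead of being scraped, and the XPath shown in the comment is an assumption about the page's markup that has not been verified:

```r
# Cell texts in document order, as html_text() would return them for the
# table's <td> elements. Typed in by hand from the first two records in
# the str() output above, so the sketch runs offline.
# On the live page, something like this is ASSUMED (unverified XPath):
#   cells <- read_html(url) %>%
#     html_nodes(xpath = '//*[@id="table_1"]//tbody//td') %>%
#     html_text()
cells <- c("1897", "1", "A", "CA", "FI", "0.00", "-33.00", "0.50", "0.18",
           "1500", "1500", "1473", "1526",
           "1897", "1", "A", "SK", "CW", "0.00", "25.00", "0.50", "0.24",
           "1500", "1500", "1478", "1521")

# Column names as they appear in the str() output above.
col_names <- c("season", "round", "home_away", "team", "opponent",
               "margin_pred", "margin_actual", "win_prob", "result",
               "team_elo_pre", "opponent_elo_pre",
               "team_elo_post", "opponent_elo_post")

# Every record must contribute exactly one cell per column; otherwise the
# row-wise reshape below would silently shift values into wrong columns.
stopifnot(length(cells) %% length(col_names) == 0)

# Fill a matrix row by row: each run of 13 cells becomes one record.
m <- matrix(cells, ncol = length(col_names), byrow = TRUE,
            dimnames = list(NULL, col_names))
df <- as.data.frame(m, stringsAsFactors = FALSE)

# Every column is still character; convert the numeric-looking ones.
df <- type.convert(df, as.is = TRUE)
str(df)
```

The `stopifnot()` guard is there because a single missing or extra cell would misalign every later row. On the full table, `nrow(df)` should also match the 31,228 entries reported in the footer.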