Scraping a table made of tables in R


r web-scraping


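I downloaded a table from a website that seems to be made up of a bunch of tables. Right now I use rvest to bring the table in as text, but it also pulls in a bunch of other tables I'm not interested in. I then force the data into a better format, but this is not a repeatable process. Here is my code: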

library(rvest)
library(tidyr)
library(data.table) #needed for data.table()

#Auto Download Data
#reads the url of the race
race_url <- read_html("http://racing-reference.info/race/2016_Folds_of_Honor_QuikTrip_500/W")
#reads in the tables, in this code there are too many
race_results <- race_url %>%
        html_nodes(".col") %>%
        html_text()
race_results <- data.table(race_results) #wraps the character vector in a one-column data.table
f <- nrow(race_results) #counts the number of rows in the data
#keeps only the first 495 (11 * 45) values, since there are never more than 43 racers
race_results <- race_results[-c(496:f)]
#puts the data into a format with 1:11 numbering for unstacking
testDT <- data.frame(X = race_results$race_results, ind = rep(1:11, nrow(race_results)/11))
testDT <- unstack(testDT, X~ind) #unstacking data into 11 columns
colnames(testDT) <- unlist(testDT[1, ]) #turns the top row into the header
testDT <- testDT[-1, ] #drops that header row from the data

Try using:

race_results <- race_url %>% html_nodes("table")

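Now race_results is a list of all the tables on the page. Figure out which one you are interested in and you're all set. Please be sure to respect this site's terms of service.

Using @Dave2e's suggestion, you'll find that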

race_results[7] %>% html_table()

returns what you want.

Thanks! I updated the question with the corrected code:

library(rvest)
library(tidyr)

#Auto Download Data
race_url <- read_html("http://racing-reference.info/race/2016_Folds_of_Honor_QuikTrip_500/W") #reads the url of the race
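race_results <- race_url %>% html_nodes("table") #returns a node set containing every <table> on the page
race_results <- race_results[7] %>% html_table() #parses the 7th table; html_table() on a node set returns a list of data frames
race_results <- data.frame(race_results) #turns that one-element list into a single data frame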
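The hard-coded index `[7]` only works as long as the page keeps exactly the same layout; if a table is added or removed above the results, a different table comes back. One way to make the scrape more repeatable is to pick the table by its header cells instead of its position. The sketch below is only an illustration under assumptions: `find_table_by_header()` is a hypothetical helper (not part of rvest), the inline HTML is a made-up stand-in for the race page so the example runs offline, and the column names "Pos" and "Driver" are assumed, so check the real headers on the page before relying on them. Because the page is a table made of tables, the helper only looks at header cells that belong directly to each table, so an outer layout table that merely contains the results table is skipped.

```r
library(rvest)

# Hypothetical helper (not part of rvest): return the first <table> whose own
# header cells include every name in `required`, parsed with html_table().
find_table_by_header <- function(page, required) {
  tables <- html_nodes(page, "table")
  for (i in seq_along(tables)) {
    tbl <- tables[[i]]
    # Only <th> cells in this table's own rows, not in nested tables.
    headers <- html_text(
      html_nodes(tbl, xpath = "./tr/th | ./thead/tr/th | ./tbody/tr/th"),
      trim = TRUE
    )
    if (all(required %in% headers)) {
      return(html_table(tbl))
    }
  }
  stop("no table with the required header cells was found")
}

# Offline stand-in for the race page (assumed structure): the results table
# is nested inside an outer layout table.
page <- read_html('
<html><body>
<table>
  <tr>
    <td>Menu</td>
    <td>
      <table>
        <tr><th>Pos</th><th>Driver</th><th>Laps</th></tr>
        <tr><td>1</td><td>Driver A</td><td>325</td></tr>
        <tr><td>2</td><td>Driver B</td><td>325</td></tr>
      </table>
    </td>
  </tr>
</table>
</body></html>')

results <- find_table_by_header(page, c("Pos", "Driver"))
print(results)
```

On the live page the same call would take `race_url` in place of `page`, with the header names replaced by whatever the results table actually uses. If no table matches, the helper stops with an error rather than silently returning the wrong table.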