R rvest: not sure how to proceed

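As a side project, I'm trying to collect stats for NFL players that are relevant to fantasy football. I found a URL that has the data I want:

I'm trying to scrape it into R, but I'm not having much luck. I've tried a lot of things, and the closest I've gotten is: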

Test1 <- read_html("https://www.cbssports.com/fantasy/football/stats/QB/2020/season/projections/ppr/") %>% html_nodes('.TableBase-bodyTr')
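This just returns a mess with the relevant information buried inside it. I also tried calling html_table() on it, but that only gave me an error.

If I use the View function on Test1, I can drill down through many layers of the data and find what I'm looking for, but what I'm trying to figure out is how to get at that data directly.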


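I really don't know where to go from here. If anyone could give me some pointers, I'd be very grateful. My familiarity with HTML is very low. I'm trying to read more about it and understand it, but from what I gathered by inspecting the page, the data is stored in the class "TableBase-bodyTr", which is why I pointed html_nodes() at that class.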

The table is formatted a bit oddly, which causes the error in html_table(). I don't know how to correct that.
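
I have not pinned down exactly what html_table() objects to on this page, so the following is only a diagnostic sketch. It counts the cells in each row; rows whose count differs from the rest (for example grouped header rows that use colspan) are one common reason a table does not come out as a clean rectangle. The HTML is a made-up stand-in, not the real CBS markup, and the player names and numbers are invented.

```r
library(rvest)

# Made-up miniature table (NOT the real CBS markup): a grouped header row
# with colspan, a normal header row, and two body rows.
toy_html <- '<table>
  <tr><th colspan="2">Player</th><th colspan="2">Passing</th></tr>
  <tr><th>Name</th><th>Team</th><th>ATT</th><th>YDS</th></tr>
  <tr class="TableBase-bodyTr"><td>A. Alpha</td><td>AAA</td><td>500</td><td>4000</td></tr>
  <tr class="TableBase-bodyTr"><td>B. Beta</td><td>BBB</td><td>450</td><td>3500</td></tr>
</table>'

toy_page <- read_html(toy_html)
toy_rows <- html_nodes(toy_page, "tr")

# Number of header/data cells found in each row
cells_per_row <- vapply(
  toy_rows,
  function(r) length(html_nodes(r, "th, td")),
  integer(1)
)
print(cells_per_row)
```

Running the same count on the real page would show whether every body row really has the same number of cells, which matters if the cell text is later reshaped with a fixed number of columns.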

Here is an alternative approach that scrapes the contents of the rows and then builds a data frame from them.

library(rvest)
page <- read_html("https://www.cbssports.com/fantasy/football/stats/QB/2020/season/projections/ppr/") 

# find the rows of the table
rows <- page %>% html_nodes('tr')

# the first 2 rows are header information, so skip those
# get the player name (both the short and the long version)
playername <- rows[-c(1, 2)] %>% html_nodes('td span span a') %>% html_text() %>% trimws()
# each player yields two strings (short, long), so fill a 2-column matrix row by row
playername <- matrix(playername, ncol = 2, byrow = TRUE)

# get the team and position
position <- rows[-c(1, 2)] %>% html_nodes('span.CellPlayerName-position') %>% html_text() %>% trimws()
team <- rows[-c(1, 2)] %>% html_nodes('span.CellPlayerName-team') %>% html_text() %>% trimws()

# get the stats from the table (16 cells per row)
cols <- rows[-c(1, 2)] %>% html_nodes('td') %>% html_text() %>% trimws()
stats <- matrix(cols, ncol = 16, byrow = TRUE)

# make the final answer; the first stats column is the player cell, already handled above
answer <- data.frame(playername, position, team, stats[, -1])
# still need to rename the columns (note that ATT, YDS and TD each appear twice)
statnames <- c("Name_s", "Name_l", "position", "team", 'GP', 'ATT', 'CMP', 'YDS', 'YDS/G', "TD", 'INT', 'RATE', 'ATT', 'YDS', 'AVG', 'TD', 'FL', 'FPTS', "FPPG")
names(answer) <- statnames
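
The name vector repeats ATT, YDS and TD, and every scraped column is still character. Below is a minimal sketch of one way to tidy that up, shown on a small made-up data frame rather than the scraped result. make.unique() disambiguates the repeated names and as.numeric() converts the stat columns. The values, the thousands separators, and the choice of which columns count as stats are illustrative assumptions.

```r
# Made-up stand-in for the scraped result: every column is character.
toy <- data.frame(
  c("A. Alpha", "B. Beta"),
  c("500", "450"),
  c("4,000", "3,500"),
  c("40", "35"),
  c("300", "120"),
  stringsAsFactors = FALSE
)

# Names containing a repeat, mimicking the repeated YDS in statnames
toy_names <- c("Name_s", "ATT", "YDS", "TD", "YDS")

# make.unique() appends .1, .2, ... to repeated names
names(toy) <- make.unique(toy_names)

# Convert every column except the name to numeric,
# stripping thousands separators first (an assumption about the data)
stat_cols <- setdiff(names(toy), "Name_s")
toy[stat_cols] <- lapply(toy[stat_cols], function(x) as.numeric(gsub(",", "", x)))

str(toy)
```

More descriptive names than the .1 suffix would probably read better in practice; make.unique() is just the shortest way to avoid ambiguous column lookups.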
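Thank you! I'm going through the code and trying to understand everything. There is still some hairy data that needs more work. Thanks for the effort, I really appreciate it. HTML really trips me up; I'm never sure which node to look at.

@MSCRN, yes, this page isn't an easy one. Hopefully the comments above give enough guidance to get to the final result.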
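
On the question of which node to look at, one approach that may help is to take a single row, list the tag name and class of everything inside it, and pick selectors from that list. A rough sketch follows. The HTML is a made-up fragment loosely modelled on the selectors used in the answer, not a copy of the real page, and the class CellPlayerName-name is purely hypothetical.

```r
library(rvest)

# Made-up fragment loosely modelled on the selectors in the answer
# (NOT copied from the real page; "CellPlayerName-name" is hypothetical).
toy_html <- '<table><tr class="TableBase-bodyTr">
  <td><span class="CellPlayerName-name"><a href="#">A. Alpha</a></span>
      <span class="CellPlayerName-position"> QB </span>
      <span class="CellPlayerName-team"> AAA </span></td>
  <td> 500 </td>
</tr></table>'

toy_row <- html_node(read_html(toy_html), "tr")

# Every descendant element of the row, with its tag name and class
inside <- html_nodes(toy_row, "*")
overview <- data.frame(
  tag   = html_name(inside),
  class = html_attr(inside, "class"),
  stringsAsFactors = FALSE
)
print(overview)

# Once a class looks promising, pull just that piece
toy_team <- trimws(html_text(html_nodes(toy_row, "span.CellPlayerName-team")))
print(toy_team)
```

The same listing on one of the real rows should show which classes wrap the name, position and team, which is easier than drilling through the nested structure in View().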
[65] "\n                    \n                        \n                        \n            \n                                                                                                    \n            J. Eason\n    \n                                        \n                                    \n                        QB\n                    \n                    \n                                    \n                        IND\n                    \n                                \n                \n                \n                            \n        \n        \n            