
Accessing an HTML table with rvest


So I want to scrape some NBA data. Here's what I have so far, and it works fine:

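install.packages('rvest')  # only needed once
library(rvest)

url = "https://www.basketball-reference.com/boxscores/201710180BOS.html"
webpage = read_html(url)
table = html_nodes(webpage, 'table')
data = html_table(table)  # a list with one data frame per <table> found

away = data[[1]]  # away team's box score (first table on the page)
home = data[[3]]  # home team's box score (third table on the page)

# set appropriate column names from the first row
colnames(away) = as.character(unlist(away[1,]))
colnames(home) = as.character(unlist(home[1,]))

away = away[away$MP != "MP",] #remove rows that are just column names
home = home[home$MP != "MP",]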
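The problem is that these tables don't include the team names, which I need. To get them, I thought I would scrape the Four Factors table on the page; however, rvest doesn't seem to recognize it as a table. The div that contains the Four Factors table is: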

<div class="overthrow table_container" id="div_four_factors">
<table class="suppress_all sortable stats_table now_sortable" id="four_factors" data-cols-to-freeze="1"><thead><tr class="over_header thead">

But this doesn't seem to work: all I get is an empty list. How can I access the Four Factors table?

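I'm by no means an HTML expert, but the table you're interested in appears to be commented out in the page source, and the comment markers are removed at some point before the page is rendered. Because `read_html()` only parses the raw HTML and doesn't run any of the page's scripts, the table is still inside an HTML comment when rvest sees it, which is why selecting it comes back empty.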
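One way to reach such a table is to pull the text out of the page's HTML comments (XPath `//comment()`) and parse that text as HTML a second time. The sketch below assumes the table keeps the `id="four_factors"` shown in the question; `read_commented_table()` is a hypothetical helper name, and the toy page is made up so the example runs offline.

```r
library(rvest)

# Hypothetical helper: return the <table> with the given id, even when the page
# ships it inside an HTML comment. read_html() runs no JavaScript, so such a
# table is invisible to ordinary CSS selectors.
read_commented_table <- function(doc, table_id) {
  selector <- paste0("table#", table_id)

  # Ordinary route first, in case the table is not commented out.
  direct <- html_nodes(doc, selector)
  if (length(direct) > 0) {
    return(html_table(direct[[1]]))
  }

  # Text of every comment node on the page, via the XPath comment() test.
  comments <- html_text(html_nodes(doc, xpath = "//comment()"))

  # Keep only comments that mention the table id; they contain markup,
  # so read_html() treats them as literal HTML rather than a file path.
  candidates <- comments[grepl(paste0('id="', table_id, '"'), comments, fixed = TRUE)]

  for (txt in candidates) {
    found <- html_nodes(read_html(txt), selector)
    if (length(found) > 0) {
      return(html_table(found[[1]]))
    }
  }
  NULL  # no table with that id, commented out or not
}

# Toy page (made up, not basketball-reference data): the table sits inside a comment.
toy <- read_html('<html><body><div id="all_four_factors"><!--
<table id="four_factors">
  <tr><th>Team</th><th>Value</th></tr>
  <tr><td>Away</td><td>1</td></tr>
  <tr><td>Home</td><td>2</td></tr>
</table>
--></div></body></html>')

length(html_nodes(toy, "table#four_factors"))  # the plain selector finds nothing
read_commented_table(toy, "four_factors")
```

On the real page the call would be `read_commented_table(webpage, "four_factors")`. The Four Factors table also has an over-header row, so its column names may need the same first-row fix as the box score tables.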

If we assume the home team is always listed second, we can rely on position and scrape a different element on the page instead:

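# the bottom navigation block mentions both teams
nav <- html_nodes(webpage, '#bottom_nav_container')

# split its text on "Schedule\n"; the first two pieces are assumed
# to hold the away and home team names (requires the stringr package)
teams <- html_text(nav[1]) %>%
  stringr::str_split("Schedule\n")

away$team <- trimws(teams[[1]][1])
home$team <- trimws(teams[[1]][2])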
Obviously this isn't the cleanest solution, but that's life in the world of web scraping.
