R: html_table loses information

Tags: r, web-scraping, rvest

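I want to scrape the third table from the page at the URL below and store it as a data frame. Below is a reproducible example.

The third table is the one with "Isiah YOUNG" in the first row, third column.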

library(rvest)
library(dplyr)

target_url <-
  "https://flashresults.com/2017_Meets/Outdoor/06-22_USATF/004-2-02.htm"

table <- target_url %>%
  read_html(options = c("DTDLOAD")) %>%
  html_nodes("[id^=splitevents]") # this is the correct node
However, passing it to html_table results in an empty data frame:

table[[1]] %>%
  html_table(fill = TRUE)
[1] Pl          Ln                      Athlete                 Affiliation Time                   
<0 rows> (or 0-length row.names)

How can I get the contents of table[[1]] (which clearly does exist) as a data frame?

The HTML is full of errors that trip up the parser, and I haven't found any simple way to fix them.

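In this particular scenario, an alternative is to use the header (th) count as the column count, then derive the row count by dividing the total td count by the number of columns. Use these to convert the cell text to a matrix, and then to a data frame: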

library(rvest)
library(dplyr)

target_url <- "https://flashresults.com/2017_Meets/Outdoor/06-22_USATF/004-2-02.htm"

table <- read_html(target_url) %>%
  html_node("#splitevents")

tds <- table %>% html_nodes('td') %>% html_text()
ths <- table %>% html_nodes("th") %>% html_text()
num_col <- length(ths)
num_row <- length(tds) / num_col
  
df <- tds %>%
  matrix(nrow = num_row, ncol = num_col, byrow = TRUE) %>%
  data.frame() %>%
  setNames(ths)
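
The same reshaping idea can be tried offline. The following is a minimal sketch, not the page's actual markup: the inline HTML string is a made-up stand-in, and cells_to_df() is a hypothetical helper name that is not part of rvest. It also assumes every row has exactly one td per th, and stops with an error when the cell count is not a multiple of the header count.

```r
library(rvest)

# Hypothetical stand-in for the real page: a tiny, well-formed table,
# used only so the example runs without network access.
html <- '
<table id="splitevents">
  <tr><th>Pl</th><th>Ln</th><th>Athlete</th></tr>
  <tr><td>1</td><td>4</td><td>Runner A</td></tr>
  <tr><td>2</td><td>5</td><td>Runner B</td></tr>
</table>'

# cells_to_df() is a hypothetical helper, not an rvest function.
# It rebuilds a data frame from the flat th and td text of one table node.
cells_to_df <- function(node) {
  ths <- html_text(html_nodes(node, "th"), trim = TRUE)
  tds <- html_text(html_nodes(node, "td"), trim = TRUE)
  num_col <- length(ths)

  # Fail early rather than let matrix() recycle values to fill the last row
  if (num_col == 0 || length(tds) %% num_col != 0) {
    stop("cell count is not a multiple of the header count")
  }

  # Fill row by row, so each run of num_col cells becomes one row
  m <- matrix(tds, ncol = num_col, byrow = TRUE)
  df <- as.data.frame(m, stringsAsFactors = FALSE)
  setNames(df, ths)
}

node <- html_node(read_html(html), "#splitevents")
df <- cells_to_df(node)
```

All columns come back as character vectors; convert them afterwards (for example with as.numeric or type.convert) if numeric columns are needed.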