How do I coerce this data into a data.frame?


r web-scraping


I am trying to parse this data into a meaningful format, but I cannot get rid of the \t\n\t\t\t sequences. Please help.

# Load the rvest package
library('rvest')

# Define the URL once.
URL <- "https://rotogrinders.com/pages/pga-course-history-743469"

# Download and parse the page
tablescrape_html <- read_html(URL)
tablescrape_html

# Look at the <table> nodes on the page
tablescrape_html %>%
  html_nodes("table") %>%
  head()

tablescrape_html %>%
  html_nodes("tr") %>%              # grab the <tr> (table row) nodes
  html_text() %>%                   # isolate the text from the HTML tags
  gsub("^\\s+|\\s+$", "", .) %>%    # strip whitespace from the beginning and end of each string
  head(n = 100)                     # take a peek at the first 100 records

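You need to split each element of the vector on \t\n. To turn the result into a data frame, then force all of the vectors to the same length and bind the rows together into one table.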

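library(tidyverse)
tablescrape_html %>%
  html_nodes("tr") %>%                  # grab the <tr> (table row) nodes
  html_text() %>%                       # isolate the text from the HTML tags
  gsub("^\\s+|\\s+$", "", .) %>%        # trim leading and trailing whitespace
  str_split("\\t\\n\\t+[ \t]*") %>%     # split each row into cells on the tab/newline runs
  map(`length<-`, 7) %>%                # pad or truncate every row to 7 cells (NA fills the gaps)
  do.call(rbind, .)                     # bind the rows into one character matrix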
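The split-pad-bind idea can be tried offline on a small example. The sketch below uses base R only, and the `rows` vector is made up to imitate what `html_text()` returns for `<tr>` nodes (the golfer names and numbers are hypothetical, not taken from the page). It also assumes the first row is the header, and it goes one step further than the pipeline above by converting the matrix into a data.frame.

```r
# Hypothetical row strings imitating html_text() output on <tr> nodes.
rows <- c(
  "Golfer\t\n\t\t\tRounds\t\n\t\t\tAvg Score",
  "Adam Scott\t\n\t\t\t12\t\n\t\t\t69.5",
  "Aaron Baddeley\t\n\t\t\t8"
)

# Split each row on the tab/newline/tab run that separates cells.
cells <- strsplit(rows, "\t\n\t+[ \t]*")

# Pad every row to the width of the longest one (missing cells become NA).
width  <- max(lengths(cells))
padded <- lapply(cells, `length<-`, width)

# Bind the rows into a character matrix.
mat <- do.call(rbind, padded)

# Assumption: the first row holds the column names.
df <- as.data.frame(mat[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- mat[1, ]

# Convert columns that look numeric; text columns stay character.
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```

Taking the width from the longest row avoids hard-coding 7, at the cost of trusting that no row contains stray extra separators.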

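Replace the separators made up of tabs, line endings, and spaces with a single space, then pass the result to read.table with header and fill set: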

input <- tablescrape_html %>%
  html_nodes("tr") %>%
  html_text() %>%
  gsub("[\t\n ]+", " ", .) %>%
  read.table(text = ., fill = TRUE, header = TRUE)
str(input)
> str(input)
'data.frame':   133 obs. of  15 variables:
 $ Golfer  : Factor w/ 107 levels "Aaron","Adam",..: 40 79 71 20 2 14 69 77 100 102 ...
 $ Rounds  : Factor w/ 129 levels "Armour","Baddeley",..: 60 12 73 111 44 47 78 11 48 110 ...
 $ Avg     : Factor w/ 18 levels "0","10","11",..: 1 10 6 12 2 5 2 8 8 4 ...
 $ Score   : Factor w/ 67 levels "","0","10","14",..: 1 37 11 12 25 18 44 7 15 35 ...
 $ Avg.1   : Factor w/ 74 levels "","13.00","14.00",..: 1 35 56 46 36 34 36 33 51 59 ...
 $ Fairways: Factor w/ 104 levels "","134.15","202.68",..: 1 41 44 64 49 42 46 69 35 32 ...
 $ Hit     : num  NA 33.5 47.8 42 37.7 ...
 $ Avg.2   : num  NA 28.4 27.4 26.5 28.1 ...
 $ Drive   : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Yards   : logi  NA NA NA NA NA NA ...
 $ Avg.3   : logi  NA NA NA NA NA NA ...
 $ Greens  : logi  NA NA NA NA NA NA ...
 $ Hit.1   : logi  NA NA NA NA NA NA ...
 $ Avg.4   : logi  NA NA NA NA NA NA ...
 $ Putts   : logi  NA NA NA NA NA NA ...
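In the output above, Golfer holds first names and Rounds holds surnames: collapsing everything to single spaces makes read.table split multi-word cells such as player names and headers like "Avg Score". One possible workaround, sketched here on made-up rows (the names and numbers are hypothetical), is to turn each tab/newline run into a delimiter that is assumed not to occur in the data, such as "|", and pass it as `sep`.

```r
# Hypothetical row strings imitating html_text() output on <tr> nodes.
rows <- c(
  "Golfer\t\n\t\t\tRounds\t\n\t\t\tAvg Score",
  "Adam Scott\t\n\t\t\t12\t\n\t\t\t69.5",
  "Aaron Baddeley\t\n\t\t\t8\t\n\t\t\t70.25"
)

# Replace each run of tabs/newlines with one "|" so spaces inside cells survive.
# Assumption: "|" never appears in the cell text itself.
delimited <- gsub("[\t\n]+", "|", rows)

golf <- read.table(
  text = delimited,
  sep = "|",
  header = TRUE,
  fill = TRUE,               # tolerate short rows
  quote = "",                # treat quotes and apostrophes in names literally
  strip.white = TRUE,        # drop stray spaces around each cell
  stringsAsFactors = FALSE,
  check.names = FALSE        # keep "Avg Score" instead of "Avg.Score"
)
str(golf)
```

With an explicit separator, each multi-word cell stays in one column, so the header and the data line up.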

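Your title says you want to coerce this into a data.frame, but the description is about removing \t\n. If the final output is meant to be a data.frame, it is not clear what the columns should be.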