How do I coerce this data into a data.frame?
I am trying to parse this data into a meaningful format, but I can't get rid of the \t\n\t\t\t runs between the cells. Please help.
#Loading the rvest package
library('rvest')
# Define the url once.
URL <- "https://rotogrinders.com/pages/pga-course-history-743469"
tablescrape_html <- read_html(URL)
tablescrape_html
tablescrape_html %>%
html_nodes("table") %>%
head()
tablescrape_html %>%
html_nodes("tr") %>% # grab the <tr> (table row) nodes
html_text() %>% # isolate the text from the HTML tags
gsub("^\\s+|\\s+$", "", .) %>% #strip the white space from the beginning and end of a string.
head(n=100) # take a peek at the first 100 records
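The reason the \t\n\t\t\t runs survive can be seen on a single string: the pattern "^\\s+|\\s+$" is anchored to the start and end of each string, so whitespace between cells is never matched. A minimal sketch, assuming a made-up string `x` that only imitates the shape of the scraped row text:

```r
# Hypothetical row text; the real page content is not reproduced here.
x <- "  Golfer\t\n\t\t\tRounds\t\n"

# Same pattern as in the question: strips leading and trailing whitespace only.
trimmed <- gsub("^\\s+|\\s+$", "", x)

# The separator between the two cells is still present.
still_has_separator <- grepl("\t\n", trimmed, fixed = TRUE)
print(trimmed)
print(still_has_separator)
```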
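You need to split each string on the \t\n runs. To turn the result into a data frame, force all of the resulting vectors to the same length and then bind the rows into a single table: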
library(tidyverse)
tablescrape_html %>%
html_nodes("tr") %>% # grab the <tr> (table row) nodes
html_text() %>% # isolate the text from the HTML tags
gsub("^\\s+|\\s+$", "", .) %>%
str_split("\\t\\n\\t+[ \t]*") %>%
map(`length<-` ,7) %>%
do.call(rbind,.)
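The same split, pad, and bind steps can be tried offline in base R. This is only a sketch: the `rows` vector is invented sample data standing in for the html_text() output, and the column count is taken from the longest row rather than fixed at 7.

```r
# Hypothetical sample rows imitating html_text() output; not the real page data.
rows <- c(
  "Golfer\t\n\t\t\tRounds\t\n\t\t\tAvg Score",
  "Player A\t\n\t\t\t12\t\n\t\t\t70.5",
  "Player B\t\n\t\t\t8"
)

# Split on each run that starts with a tab or newline; plain spaces inside
# a cell (e.g. "Avg Score") are left alone.
cells <- strsplit(trimws(rows), "[\t\n][\t\n ]*")

# Pad every row with NA up to the length of the longest row.
n_col <- max(lengths(cells))
padded <- lapply(cells, `length<-`, n_col)

# Bind the rows into a character matrix, then promote the first row to headers.
mat <- do.call(rbind, padded)
df <- as.data.frame(mat[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- mat[1, ]

# Convert columns that look numeric; character columns stay as they are.
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```

Deriving `n_col` from the data avoids hard-coding the width, at the cost of silently accepting a malformed row that is longer than the header.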
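Replace the separators made up of tabs, line endings, and spaces with a single space, then pass the result to read.table with header and fill set: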
input <- tablescrape_html %>%
html_nodes("tr") %>%
html_text() %>%
gsub("[\t\n ]+", " ", .) %>%
read.table(text=., fill=TRUE, header=TRUE)
str(input)
> str(input)
'data.frame': 133 obs. of 15 variables:
$ Golfer : Factor w/ 107 levels "Aaron","Adam",..: 40 79 71 20 2 14 69 77 100 102 ...
$ Rounds : Factor w/ 129 levels "Armour","Baddeley",..: 60 12 73 111 44 47 78 11 48 110 ...
$ Avg : Factor w/ 18 levels "0","10","11",..: 1 10 6 12 2 5 2 8 8 4 ...
$ Score : Factor w/ 67 levels "","0","10","14",..: 1 37 11 12 25 18 44 7 15 35 ...
$ Avg.1 : Factor w/ 74 levels "","13.00","14.00",..: 1 35 56 46 36 34 36 33 51 59 ...
$ Fairways: Factor w/ 104 levels "","134.15","202.68",..: 1 41 44 64 49 42 46 69 35 32 ...
$ Hit : num NA 33.5 47.8 42 37.7 ...
$ Avg.2 : num NA 28.4 27.4 26.5 28.1 ...
$ Drive : num NA NA NA NA NA NA NA NA NA NA ...
$ Yards : logi NA NA NA NA NA NA ...
$ Avg.3 : logi NA NA NA NA NA NA ...
$ Greens : logi NA NA NA NA NA NA ...
$ Hit.1 : logi NA NA NA NA NA NA ...
$ Avg.4 : logi NA NA NA NA NA NA ...
$ Putts : logi NA NA NA NA NA NA ...
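In the str() output above, header names that contain spaces appear to have been split into separate columns (Avg, Score, Avg.1, Fairways, Hit, ...), and names with spaces would be split the same way, so the values do not line up with their headers. One possible variation, sketched here on invented sample rows rather than the real page, is to collapse only the tab/newline runs into a separator that is assumed not to occur in the data (here "|") and pass that separator to read.table:

```r
# Hypothetical sample rows imitating html_text() output; not the real page data.
rows <- c(
  "Golfer\t\n\t\t\tRounds\t\n\t\t\tAvg Score",
  "Player A\t\n\t\t\t12\t\n\t\t\t70.5",
  "Player B\t\n\t\t\t8"
)

# Collapse each tab/newline run into "|" (assumed absent from the cell text);
# single spaces inside a cell are preserved.
delimited <- gsub("[\t\n][\t\n ]*", "|", trimws(rows))

parsed <- read.table(
  text = delimited,
  sep = "|",
  header = TRUE,
  fill = TRUE,              # pad short rows with NA
  check.names = FALSE,      # keep "Avg Score" instead of "Avg.Score"
  stringsAsFactors = FALSE,
  quote = "",
  comment.char = ""
)
str(parsed)
```

If the real text can contain "|", a different separator would be needed.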
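Your title says you want to coerce this into a data.frame, while the description says you want to remove the \t\n. If the final output is supposed to be a data.frame, it is not clear what the columns should be.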