R: convert every 4 rows into 4 separate columns
I am trying to get the date, title, and review from IMDB with the following loop:
library(rvest)
library(dplyr)
library(stringr)
library(tidyverse)

ID <- 4633694

# Read each URL and collect the text of the date, rating, title, and review nodes
data <- lapply(
  paste0('http://www.imdb.com/title/tt', ID, '/reviews?filter=prolific', 1:20),
  function(url) {
    url %>%
      read_html() %>%
      html_nodes(".review-date,.rating-other-user-rating,.title,.show-more__control") %>%
      html_text() %>%
      gsub('[\r\n\t]', '', .)  # strip line breaks and tabs
  }
)
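I would like to know whether there is a way to convert every 4 rows into separate columns, so that each attribute lines up in the appropriate column, like this: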
   Date              Rating  Title                Review
1. 14 December 2018  10/10   If this was..        I have to...
2. 17 December 2018  10/10   Stan Lee Is...       A movie worthy...
3. 20 December 2018  10/10   the most visually..  There's hardly anything...
Here is one approach.
Data:
x
Why not data.frame(matrix(trimws(x$col1), ncol = 6, byrow = TRUE), stringsAsFactors = FALSE)?
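The matrix idea can be sketched as follows for the four-field layout in the question. This is only an illustration: `flat` is a hypothetical vector standing in for one scraped page, and the field order (date, rating, title, review) is an assumption, so check it against what the scrape actually returns. The comment's `ncol = 6` refers to its own sample data; `ncol = 4` is used here because the question asks for four columns.

```r
# Hypothetical flattened scrape result: four entries per review,
# assumed to arrive in the order date, rating, title, review.
flat <- c(
  "14 December 2018", "10/10", "If this was..", "I have to...",
  "17 December 2018", "10/10", "Stan Lee Is...", "A movie worthy..."
)

# The reshape is only valid when the length is a multiple of 4.
stopifnot(length(flat) %% 4 == 0)

# Fill a 4-column matrix row by row, then convert it to a data frame.
reviews <- data.frame(
  matrix(trimws(flat), ncol = 4, byrow = TRUE),
  stringsAsFactors = FALSE
)
names(reviews) <- c("Date", "Rating", "Title", "Review")
reviews
```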
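This works if the number of rows per block is always the same. That is not the case for the data provided in the question. (I'm happy to concede that the sample data is unlikely to be "real" and is therefore not representative.)
The OP says the pattern repeats. I think this might make more sense: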
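# Keep non-empty lines, join them with ":", and start a new line before each rating such as 10/10
text_data <- gsub('\\b(\\d+/\\d+)\\b', '\n\\1', paste(grep('\\w', x$col1, value = TRUE), collapse = ':'))
# Parse the result as ":"-separated text; fill = TRUE pads short rows with NA
read.csv(text = text_data, header = FALSE, sep = ":", strip.white = TRUE, fill = TRUE, stringsAsFactors = FALSE)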
  V1     V2                                                 V3                V4                          V5
1 10/10  If this was..                                      14 December 2018  I have to say, and no...    NA
2 10/10  Stan Lee Is Smiling Right Now...                   17 December 2018  A movie worthy of...        NA
3 10/10  the most visually stunning film I've ever seen...  20 December 2018  There's hardly anything...  NA
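To get from this output to the layout requested in the question, the all-NA column can be dropped and the rest reordered and renamed. A minimal sketch, assuming the result has the shape shown above: `res` is a hypothetical data frame built by hand to mirror that output, not the actual return value of the `read.csv` call.

```r
# Hypothetical data frame mirroring the V1..V5 output above.
res <- data.frame(
  V1 = c("10/10", "10/10"),
  V2 = c("If this was..", "Stan Lee Is Smiling Right Now..."),
  V3 = c("14 December 2018", "17 December 2018"),
  V4 = c("I have to say, and no...", "A movie worthy of..."),
  V5 = c(NA, NA),
  stringsAsFactors = FALSE
)

# Drop columns that contain only NA (here V5).
res <- res[, colSums(!is.na(res)) > 0, drop = FALSE]

# Reorder to date, rating, title, review and give the columns readable names.
res <- res[, c("V3", "V1", "V2", "V4")]
names(res) <- c("Date", "Rating", "Title", "Review")
res
```

Selecting columns by name rather than by position keeps the reordering readable, but it does rely on the default `V1`, `V2`, ... names that `read.csv` assigns when `header = FALSE`.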