R: Scraping across multiple pages has stopped working (r, rvest)


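I am scraping all the reviews of a hotel from Tripadvisor, and my code produces the following error: Error in data.frame(textoComentario, fechaComentario) : arguments imply differing number of rows: 6, 5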

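I used the code below to scrape another hotel and it worked, but I cannot work out where the error is. I have tried different CSS selectors, with no effect. I was able to run the code to completion once, but the same reviews were repeated over and over. I do not know how to solve this. I attach my code to make the problem easier to understand.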

library(dplyr)
library(rvest)
#Link
web <- read_html("https://www.tripadvisor.es/Hotel_Review-g187499-d239247-Reviews-Melia_Girona-Girona_Province_of_Girona_Catalonia.html")
# Dataset to download the review sections
# 1. Texto comentarios
textoComentario<-web%>%
  html_nodes(".location-review-review-list-parts-ExpandableReview__reviewText--gOmRC span")%>%
  html_text()
textoComentario

# 2. Fecha comentario
fechaComentario<-web%>%
  html_nodes(".location-review-review-list-parts-EventDate__event_date--1epHa")%>%
  html_text()
fechaComentario <- strsplit(fechaComentario, ": ")
fechaComentario <- unlist(lapply(fechaComentario, FUN = function(x) {x[2]}))
fechaComentario


datos<-data.frame(textoComentario,fechaComentario)

# To go through all the review pages
for(i in 1:174){
  # 1. url
  
  url<-paste0("https://www.tripadvisor.es/Hotel_Review-g187499-d239247-Reviews-or",i*10,"-Melia_Girona-Girona_Province_of_Girona_Catalonia.htm")
  
  
  pagina<-read_html(url)
  
 
  textoComentario<-pagina%>%
    html_nodes(".location-review-review-list-parts-ExpandableReview__reviewText--gOmRC span")%>%
    html_text()
  textoComentario

  fechaComentario<-pagina%>%
    html_nodes(".location-review-review-list-parts-EventDate__event_date--1epHa")%>%
    html_text()
  fechaComentario <- strsplit(fechaComentario, ": ")
  fechaComentario <- unlist(lapply(fechaComentario, FUN = function(x) {x[2]}))
  fechaComentario


  
  nuevosDatos<-data.frame(textoComentario,fechaComentario)
  
 
  datos<-rbind(datos,nuevosDatos)
  
  print(paste0("Página ",i))
}

df<- datos

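textoComentario and fechaComentario have different lengths, so they cannot be combined in a data frame the way you do it. The following code solves this by adding NAs to the shorter variable before combining the variables into a data frame: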
library(dplyr)
library(rvest)

#Link
web <- read_html("https://www.tripadvisor.es/Hotel_Review-g187499-d239247-Reviews-Melia_Girona-Girona_Province_of_Girona_Catalonia.html")
# Dataset to download the review sections
# 1. Texto comentarios
textoComentario<-web%>%
  html_nodes(".location-review-review-list-parts-ExpandableReview__reviewText--gOmRC span")%>%
  html_text()
textoComentario

# 2. Fecha comentario
fechaComentario<-web%>%
  html_nodes(".location-review-review-list-parts-EventDate__event_date--1epHa")%>%
  html_text()
fechaComentario <- strsplit(fechaComentario, ": ")
fechaComentario <- unlist(lapply(fechaComentario, FUN = function(x) {x[2]}))
fechaComentario


datos<-data.frame(textoComentario,fechaComentario)

# To go through all the review pages
for(i in 1:174){
  # 1. url

  url<-paste0("https://www.tripadvisor.es/Hotel_Review-g187499-d239247-Reviews-or",i*10,"-Melia_Girona-Girona_Province_of_Girona_Catalonia.htm")
  print(i)

  pagina<-read_html(url)


  textoComentario<-pagina%>%
    html_nodes(".location-review-review-list-parts-ExpandableReview__reviewText--gOmRC span")%>%
    html_text()


  fechaComentario<-pagina%>%
    html_nodes(".location-review-review-list-parts-EventDate__event_date--1epHa")%>%
    html_text()
  fechaComentario <- strsplit(fechaComentario, ": ")
  fechaComentario <- unlist(lapply(fechaComentario, FUN = function(x) {x[2]}))
  fechaComentario

  # Make sure both variables have equal length; if not, append NAs to the shorter one.
  # Indexing starts at length + 1 so that the last scraped value is not overwritten.
  if (length(textoComentario) < length(fechaComentario)) {textoComentario[(length(textoComentario) + 1):length(fechaComentario)] <- NA}
  if (length(fechaComentario) < length(textoComentario)) {fechaComentario[(length(fechaComentario) + 1):length(textoComentario)] <- NA}

  nuevosDatos<-data.frame(textoComentario,fechaComentario)

  datos<-rbind(datos,nuevosDatos)

  print(paste0("Página ",i))
}


Does this solve your problem?
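
The padding step can also be pulled out into a small helper so it is easy to try without hitting the website. This is only a minimal sketch in base R: `pad_to_same_length` is a hypothetical name (not part of rvest), the sample vectors are made up, and it assumes both inputs are ordinary atomic vectors, for which assigning a larger `length` fills the new positions with NA.

```r
# Hypothetical helper: pad the shorter of two atomic vectors with NA so
# that both have the same length and can be passed to data.frame().
pad_to_same_length <- function(a, b) {
  n <- max(length(a), length(b))
  length(a) <- n  # extending an atomic vector this way fills with NA
  length(b) <- n
  data.frame(textoComentario = a, fechaComentario = b,
             stringsAsFactors = FALSE)
}

# Made-up sample data: six review texts but only five dates
textos <- paste("review", 1:6)
fechas <- c("enero de 2020", "enero de 2020", "febrero de 2020",
            "marzo de 2020", "marzo de 2020")

resultado <- pad_to_same_length(textos, fechas)
print(resultado)
```

Padding only makes the lengths match. It does not guarantee that each date ends up in the same row as the review it belongs to, so rows from pages where an NA was added are worth checking by hand.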
I think the problem is that at least one iteration of the loop returns 6 elements in textoComentario instead of 5.

Great! Could you mark the question as solved (tick on the left)?
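
To find out which iteration returns a different number of elements, one option is to keep each page's results in a list and compare the lengths afterwards. The sketch below is an assumption about how that could look, not code from the thread: `find_mismatched_pages` is a hypothetical helper, and the two lists are made-up stand-ins for what three scraped pages might return.

```r
# Hypothetical check: given per-page results stored in two lists,
# return the positions of pages whose text and date vectors differ
# in length.
find_mismatched_pages <- function(textos_por_pagina, fechas_por_pagina) {
  n_textos <- lengths(textos_por_pagina)
  n_fechas <- lengths(fechas_por_pagina)
  which(n_textos != n_fechas)
}

# Made-up stand-ins for three pages; the second has 6 texts and 5 dates
textos_por_pagina <- list(paste("a", 1:5), paste("b", 1:6), paste("c", 1:5))
fechas_por_pagina <- list(rep("enero de 2020", 5),
                          rep("enero de 2020", 5),
                          rep("enero de 2020", 5))

paginas_con_problema <- find_mismatched_pages(textos_por_pagina, fechas_por_pagina)
print(paginas_con_problema)
```

Once the affected page is known, its HTML can be inspected to see which extra element the text selector is matching.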