R dplyr mutate with an HTML (or XML) document, node, or node set (r, dplyr, rvest)


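I have a file containing a number of HTML links, and I want to use dplyr and rvest to get the image link for each link in each row.

When I do this manually it works fine and returns the rows, but when the same code is called inside a function it fails with the following error:

Error: no applicable method for 'xml_find_all' applied to an object of class "factor"

I don't know what I am doing wrong. Any help is appreciated. To make my question clearer, I have added some example rows (in comments) and shown the manual approach.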

library(rvest)
library(dplyr)
library(httr) # contains function stop_for_status()

#get html links from file
#EXAMPLE

# "_id",url

# 560fc55c65818bee0b77ec33,http://www.seriouseats.com/recipes/2011/01/sriracha-ceviche-recipe.html
# 560fc57e65818bee0b78d8b7,http://www.seriouseats.com/recipes/2008/07/pasta-arugula-tomatoes-recipe.html
# 560fc57e65818bee0b78dcde,http://www.seriouseats.com/recipes/2007/08/cook-the-book-minty-boozy-chic.html
# 560fc57e65818bee0b78de93,http://www.seriouseats.com/recipes/2010/02/chipped-beef-gravy-on-toast-stew-on-a-shingle-recipe.html
# 560fc57e65818bee0b78dfe6,http://www.seriouseats.com/recipes/2011/05/dinner-tonight-quinoa-salad-with-lemon-cream.html
# 560fc58165818bee0b78e65e,http://www.seriouseats.com/recipes/2010/10/dinner-tonight-spicy-quinoa-salad-recipe.html

#
#load into SE
#
SE <- read.csv("~/Desktop/SeriousEats.csv")

#
#function to retrieve imgPath per URL
#using rvest
#      
getImgPath <- function(x) {

  imgPath <- x %>% html_nodes(".photo") %>% html_attr("src")
  stop_for_status(res)
  return(imgPath)
}

#This works fine
#UrlPage <- read_html ("http://www.seriouseats.com/recipes/2011/01/sriracha-ceviche-recipe.html")
#imgPath <- UrlPage %>% html_nodes(".photo") %>% html_attr("src")

#
#This throws an error msg
#
S <- mutate(SE, imgPath = getImgPath(SE$url))
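The class named in the error message can be checked directly. The following is a minimal sketch, assuming the CSV has the two columns shown in the example rows; `csv_text` is a hypothetical in-memory stand-in for the file. In R versions before 4.0.0, `read.csv()` converted strings to factors by default, which is what `stringsAsFactors = TRUE` reproduces here.

```r
# Hypothetical in-memory copy of one row of the CSV, so no file is needed
csv_text <- paste0(
  "\"_id\",url\n",
  "560fc55c65818bee0b77ec33,",
  "http://www.seriouseats.com/recipes/2011/01/sriracha-ceviche-recipe.html\n"
)

# Old default behaviour (R < 4.0.0): strings become factors
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Strings stay as character vectors
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$url)  # "factor"
class(as_char$url)    # "character"
```

Even with a character column, `html_nodes()` still needs a parsed document rather than a URL string, so the URL has to go through `read_html()` first.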

This works:
library(rvest)
library(dplyr)

# SE <- data_frame(url = c(
#    "http://www.seriouseats.com/recipes/2011/01/sriracha-ceviche-recipe.html",
#    "http://www.seriouseats.com/recipes/2008/07/pasta-arugula-tomatoes-recipe.html"
# ))

SE <- read.csv('/path/to/SeriousEats.csv', stringsAsFactors = FALSE)

getImgPath <- function(x) {
    # x must be "a document, a node set or a single node" per rvest documentation; cannot be a factor or character
    imgPath <- read_html(x) %>% html_nodes(".photo") %>% html_attr("src")
    # httr::stop_for_status(res) OP said this is not necessary, so I removed
    return(imgPath)
}

S <- SE %>% 
    rowwise() %>%
    mutate(imgPath = getImgPath(url))
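As an alternative to `rowwise()`, the per-row call can be made explicit with `vapply()`. This is only a sketch under assumptions: `pages` holds literal HTML strings as hypothetical stand-ins for the downloaded pages (so nothing is fetched), and `first_photo` is a hypothetical helper that keeps only the first match so that every row gets exactly one value.

```r
library(rvest)
library(dplyr)

# Hypothetical stand-ins for real pages: literal HTML, so no network access is needed
pages <- c(
  "<html><body><img class='photo' src='a.jpg'></body></html>",
  "<html><body><p>no image here</p></body></html>"
)

# Return the first .photo src for a single page, or NA when there is none
first_photo <- function(x) {
  src <- read_html(x) %>% html_nodes(".photo") %>% html_attr("src")
  if (length(src) == 0) NA_character_ else src[[1]]
}

df <- data.frame(page = pages, stringsAsFactors = FALSE)

# vapply() calls first_photo() once per element and guarantees one string per row
res <- df %>%
  mutate(imgPath = vapply(page, first_photo, character(1), USE.NAMES = FALSE))
```

With real data the `page` column would hold URLs instead of literal HTML; `read_html()` accepts both.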

Thanks for your help and patience, @Jubbles. For the benefit of others, here is the complete answer:
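library(rvest)
library(dplyr)
library(RCurl) # provides url.exists()

SE <- read.csv("~/Desktop/FILE.txt", stringsAsFactors = FALSE)

# Return the image path for one URL, or NA if the URL cannot be reached
getImgPath <- function(x) {
  # isTRUE() guards against try() returning an error object instead of TRUE/FALSE
  if (isTRUE(try(url.exists(x), silent = TRUE))) {
    # read_html() replaces the deprecated html()
    imgPath <- read_html(x) %>%
      html_nodes(".photo") %>%
      html_attr("src")
  } else {
    imgPath <- NA_character_
  }
  return(imgPath)
}

SE1 <- SE %>%
  rowwise() %>%
  mutate(imgPath = getImgPath(url))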
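Unreachable pages can also be handled by catching the error from `read_html()` itself rather than checking the URL beforehand. The sketch below is one possible variant, not the approach used above; `safe_img_path` is a hypothetical name, and a non-existent file name plus a literal HTML string stand in for a dead and a live URL.

```r
library(rvest)

# Hypothetical helper: any failure while reading or parsing yields NA
safe_img_path <- function(x) {
  tryCatch({
    src <- read_html(x) %>% html_nodes(".photo") %>% html_attr("src")
    # Keep only the first match so the result is always a single string
    if (length(src) == 0) NA_character_ else src[[1]]
  }, error = function(e) NA_character_)
}

# read_html() fails on a path that does not exist, so the handler returns NA
missing <- safe_img_path("no-such-file-for-this-example.html")

# A literal HTML string parses normally
found <- safe_img_path("<html><body><img class='photo' src='a.jpg'></body></html>")
```

This avoids a separate request just to test whether the URL exists, at the cost of hiding the reason for a failure.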

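Try mutate(SE, imgPath = getImgPath(url)). I think that by using $ you are referencing the whole column, whereas mutate wants to operate row by row.

Please dput your SE object (or at least part of it). Are the URLs being treated as factors? Try stringsAsFactors = F in your read.csv command.

No. Error: no applicable method for 'xml_find_all' applied to an object of class "character"

In the getImgPath() function, what is res? I don't see it assigned anywhere in your code.

Thanks :) Caught it.

@DirkLX I updated it with rowwise() so that it works for each row in your file.

@Jubbles Sorry, I am still confused. Your first line makes no sense to me, because I need to get the data from my file. I changed it to this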