R: Scraping Yelp restaurant information (error when looping over multiple restaurants)

r web-scraping

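I am trying to scrape restaurant information from Yelp: price range ($$), price description, alcohol, phone, website, and health score. The code works well for two restaurants (Dirty French and Uncle Boons), but when I run the same code for the restaurant Legacy Records, it starts throwing errors. The cause is that the XPath I use for Alcohol (and for Website, which is not shown in the code) is different for Legacy Records than for Dirty French and Uncle Boons. In addition, Legacy Records has no price range, yet a value still appears in the output.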

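Is there a way to loop over different restaurants and still get the required information, whether the XPath stays the same or changes from one restaurant to the next? I am collecting data for more than 1,000 restaurants, so editing the code by hand for each one is not an option.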

Am I heading in the right direction? Is there a better way?

The code below should be reproducible on your system.

library(rvest)  # read_html(), html_node(), html_nodes(), html_text(), html_attr()
library(purrr)  # map_df()

actual_name <- data.frame(actual_name = c("dirty-french-new-york",
                                          "uncle-boons-new-york",
                                          "legacy-records-new-york"))

# NOTE: `initial` was not defined in the original post;
# this base URL is an assumption.
initial <- "https://www.yelp.com/biz/"

urls <- paste(initial, actual_name$actual_name, sep = "")

titles <- map_df(urls, function(i) {
  url <- read_html(i)

  data.frame(
    Title = url %>% html_node("title") %>% html_text(),
    HealthScore = url %>% html_node(".health-score-description") %>% html_text(),
    Rating = url %>%
      html_node(xpath = "//*[@id='wrap']/div[2]/div/div[1]/div/div[3]/div[1]/div[2]/div[1]/div[1]/div") %>%
      html_attr("title"),
    Phone = url %>% html_node(".biz-phone") %>% html_text(),
    Price = url %>% html_node(".price-range") %>% html_text(),
    PriceDescription = url %>% html_node(".price-description") %>% html_text(),
    # html_nodes() returns a zero-length result when this positional XPath
    # matches nothing, so data.frame() fails on pages with a different layout.
    Alcohol = url %>%
      html_nodes(xpath = "//*[@id='wrap']/div[2]/div/div[1]/div/div[4]/div[1]/div/div[2]/ul/li[3]/span[2]/a") %>%
      html_text()
  )
})
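
One direction that might address both problems is sketched below. This is not a tested solution for Yelp. It rests on two ideas: `html_node()` (singular) returns a missing node when nothing matches, and `html_text()` turns that into `NA`, so every page yields exactly one row. Looking a field up by its label text is also less brittle than a positional XPath. The two inline pages, the `<dl>/<dt>/<dd>` markup, and the helper name `scrape_one` are assumptions made up for illustration; the real Yelp markup would have to be inspected.

```r
library(rvest)

# Two made-up pages standing in for Yelp listings (markup is an assumption).
page_full <- read_html('<div><span class="price-range">$$$</span>
  <dl><dt>Alcohol</dt><dd>Full Bar</dd></dl></div>')
page_sparse <- read_html('<div><dl><dt>Wi-Fi</dt><dd>Free</dd></dl></div>')

# Hypothetical helper: always returns a one-row data frame.
scrape_one <- function(page) {
  data.frame(
    # html_node() yields a missing node when nothing matches,
    # and html_text() converts that to NA.
    Price = page %>% html_node(".price-range") %>% html_text(),
    # Find the value by its label instead of by its position in the page.
    Alcohol = page %>%
      html_node(xpath = "//dt[normalize-space()='Alcohol']/following-sibling::dd[1]") %>%
      html_text(),
    stringsAsFactors = FALSE
  )
}

result <- do.call(rbind, lapply(list(page_full, page_sparse), scrape_one))
print(result)
```

With this shape, a restaurant that lacks a field gets `NA` in that column and the loop keeps going, so the missing entries can be reviewed afterwards.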
