Error reading a website's HTML code in R


r web-scraping


I am trying to read the HTML code of a website to extract some data, but I am getting a strange error.

Here is an example link: www.boxofficemojo.com/movies/?id=avatar.htm

Here is the code:

library(RCurl)
library(XML)
library(rvest)

url <- paste0("www.boxofficemojo.com",movies_table[1,1])

webpage <- read_html(url)

gross_data_html <- html_nodes(webpage,".mp_box_content b")

Result:

library(RCurl)
library(XML)
library(rvest)

url <- paste0("www.boxofficemojo.com",movies_table[1,1])

webpage <- read_html(url)
> Error: 'www.boxofficemojo.com/movies/?id=avatar.htm' does not exist in current working directory ('C:/Users/Will/Documents').

gross_data_html <- html_nodes(webpage,".mp_box_content b")
> Error in html_nodes(webpage, ".mp_box_content b") : object 'webpage' not found

Why is this happening? Does it have something to do with the file type being .htm rather than .html?

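If you pass a URL to read_html, you need to prefix it with http://. Otherwise the function assumes the input is a local file path, and that file does not exist. The .htm extension has nothing to do with it.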
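To make the rule concrete, here is a simplified R sketch of how a string passed to `read_html()` ends up being interpreted. It is an approximation, assuming the decision is roughly "contains `<` or `>`: literal HTML; starts with `http(s)://` or `ftp(s)://`: URL; anything else: file path". `classify_input()` is a hypothetical name, not a function from xml2 or rvest.

```r
# Hypothetical helper: approximates how read_html() decides what a string means.
# Assumption: these regexes mirror the idea of xml2's checks, not its exact source.
classify_input <- function(x) {
  if (grepl("<|>", x)) {
    "literal HTML"            # parsed directly as markup
  } else if (grepl("^(http|ftp)s?://", x)) {
    "URL"                     # downloaded over the network
  } else {
    "local file path"         # must exist relative to getwd()
  }
}

classify_input("www.boxofficemojo.com/movies/?id=avatar.htm")
# → "local file path"
classify_input("http://www.boxofficemojo.com/movies/?id=avatar.htm")
# → "URL"
```

The first string falls through to the file-path case, which matches the "does not exist in current working directory" error in the question.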

Wrong:

read_html('www.boxofficemojo.com/movies/?id=avatar.htm')

Right:

read_html('http://www.boxofficemojo.com/movies/?id=avatar.htm')
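Since the paths in movies_table are relative (as in the question), one option is to build the full URL with an explicit scheme before calling `read_html()`. This is a minimal sketch: `ensure_scheme()` and `scrape_gross()` are hypothetical helper names, and it assumes the site is still reachable over `http://` with the same `.mp_box_content b` markup the question targets.

```r
# Hypothetical helper: prepend a scheme when the string has none, so
# read_html() treats it as a URL rather than a local file path.
ensure_scheme <- function(url, scheme = "http://") {
  has_scheme <- grepl("^[A-Za-z][A-Za-z0-9+.-]*://", url)
  ifelse(has_scheme, url, paste0(scheme, url))
}

# Hypothetical wrapper around the question's code. It needs the rvest
# package and network access when called; here it is only defined.
scrape_gross <- function(path, host = "www.boxofficemojo.com") {
  webpage <- rvest::read_html(ensure_scheme(paste0(host, path)))
  rvest::html_nodes(webpage, ".mp_box_content b")
}

ensure_scheme("www.boxofficemojo.com/movies/?id=avatar.htm")
# → "http://www.boxofficemojo.com/movies/?id=avatar.htm"
```

With these helpers, the question's last two lines would become gross_data_html <- scrape_gross(movies_table[1,1]). Strings that already carry a scheme (for example https://) are passed through unchanged.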

Also, scraping that website is illegal.