Web scraping MovieMeter with R

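I am trying to scrape movie titles, ratings, and years from MovieMeter so that I can compare them with IMDb. I managed to get the IMDb top 250 movies into a data frame with title, rating, rank, and year, but I can't get the MovieMeter scrape to work.

Here is my code: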

url <- rvest::html("https://www.moviemeter.nl/list/")
scrapemoviemeter <- rvest::html_nodes(x = url, css = ".film_row")
head(scrapemoviemeter)
moviemeter <- rvest::html_text(scrapemoviemeter, trim = TRUE)

How can I get the data into a data frame with rating, title, and year as separate columns?

I think this is easier with XPath. Try this:

library(rvest)
library(stringi)

# read_html() replaces html(), which was deprecated in rvest 0.3.0
url <- rvest::read_html("https://www.moviemeter.nl/list/")

# Scores and titles each sit in their own element inside a film row
scores <- rvest::html_nodes(x = url, xpath = "/html/body/div[1]/div[4]/div/div[3]/*//span[@class='score']")
scores <- rvest::html_text(scores, trim = TRUE)
names <- rvest::html_nodes(x = url, xpath = "/html/body/div[1]/div[4]/div/div[3]/*//a[@class='tooltip']")
names <- rvest::html_text(names, trim = TRUE)

# The year is a bare text node of the row, so extract the 4-digit number from it
years <- rvest::html_nodes(x = url, xpath = "/html/body/div[1]/div[4]/div/div[3]//div[@class='film_row']/text()")
years <- rvest::html_text(years, trim = TRUE)
years <- stri_extract(years, regex = "\\b\\d{4}\\b")
years <- years[!is.na(years)]  # drop text nodes that contain no year

# html_text() already returns character vectors, so unlist() is not needed
df <- data.frame(names, scores, years, stringsAsFactors = FALSE)
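
One thing to watch in the approach above is that the three vectors are collected independently, and the years are filtered with `!is.na()`, so a single row without a year could shift the columns out of alignment. A possible alternative is to select the rows first and then pick one element per row. The sketch below runs on a small inline HTML snippet rather than the live site; the markup is hypothetical and only the class names (`film_row`, `score`, `tooltip`) are taken from the code above.

```r
library(rvest)
library(stringi)

# Hypothetical markup for illustration; the real page layout may differ
page <- read_html('
<div class="film_row"><span class="score">4.25</span>
  <a class="tooltip" href="/film/1">First Film</a> (1994)</div>
<div class="film_row"><span class="score">4.10</span>
  <a class="tooltip" href="/film/2">Second Film</a> (1972)</div>
')

rows <- html_nodes(page, css = ".film_row")

# html_node() (singular) yields exactly one result per row, NA when absent,
# so every vector has the same length as `rows`
titles <- html_text(html_node(rows, css = "a.tooltip"), trim = TRUE)
scores <- html_text(html_node(rows, css = "span.score"), trim = TRUE)

# First standalone 4-digit number in the row text; NA if there is none
row_text <- html_text(rows, trim = TRUE)
years <- stri_extract_first_regex(row_text, "\\b\\d{4}\\b")

df <- data.frame(
  title = titles,
  score = as.numeric(scores),
  year  = as.integer(years),
  stringsAsFactors = FALSE
)
print(df)
```

Because missing values stay in place as `NA`, each row of `df` still describes a single film. Note that a title containing its own 4-digit number would be matched before the year, so the regular expression may need tightening for real data.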

If you have the IMDb IDs, use the MovieMeter API instead of scraping:

library(moviemeter) # devtools::install_github("hrbrmstr/moviemeter")
library(purrr)

imdb_ids <- c("tt1107846", "tt0282552", "tt0048199")

map_df(imdb_ids, function(x) {
  mm <- mm_get_movie_info(x)
  mm <- map(mm, ~. %||% NA)  # the JSON response has nulls, so replace them with NA
  mm[c(1:11)]                # remove posters, countries, genres, actors and directors
}) -> df

dplyr::glimpse(df)
## Observations: 3
## Variables: 11
## $ id                <int> 57161, 6465, 33351
## $ url               <chr> "https://www.moviemeter.nl/film/57161", "https://www.moviemeter.nl/film/6465", "https://www.moviemeter.nl/film/33351"
## $ year              <int> 2007, 2002, 1955
## $ imdb              <chr> "tt1107846", "tt0282552", "tt0048199"
## $ title             <chr> "Theft", "Riders", "Illegal"
## $ display_title     <chr> "Theft", "Riders", "Illegal"
## $ alternative_title <chr> NA, "Steal", NA
## $ plot              <chr> "Een naïeve dorpsjongen wordt verliefd op een crimineel. Guy was altijd een nette beschaafde jongen, wie had er ooi...
## $ duration          <int> 90, 83, 88
## $ votes_count       <int> 1, 293, 20
## $ average           <dbl> 2.00, 2.55, 3.42
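
Since the goal in the question is a comparison with IMDb, the `imdb` column gives a natural join key. Here is a minimal sketch using base R's `merge()`. The MovieMeter values are copied from the `glimpse()` output above; the `imdb_df` data frame, its `imdb_rating` column, and its values are made-up placeholders standing in for whatever IMDb data frame you already have.

```r
# MovieMeter columns copied from the glimpse() output above
mm_df <- data.frame(
  imdb    = c("tt1107846", "tt0282552", "tt0048199"),
  title   = c("Theft", "Riders", "Illegal"),
  average = c(2.00, 2.55, 3.42),
  stringsAsFactors = FALSE
)

# Hypothetical IMDb data frame; the ratings are placeholders, not real data
imdb_df <- data.frame(
  imdb        = c("tt0048199", "tt1107846", "tt9999999"),
  imdb_rating = c(6.0, 5.0, 7.0),
  stringsAsFactors = FALSE
)

# Inner join on the shared IMDb ID; films present in only one source are dropped
both <- merge(mm_df, imdb_df, by = "imdb")
print(both)
```

Passing `all = TRUE` to `merge()` would keep unmatched films from both sides, with `NA` in the missing columns.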

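Scraping IMDb violates its terms of service. So if you obtained the IMDb IDs by scraping IMDb, you are violating their terms of service. MovieMeter has an API, and there is an R package that works with it. They also request citation/attribution when you publish any derivative work based on their data.

@indian friends - rather than suggesting edits to the answer below that remove all of its text, how about just deleting the question?

What happened to this question? It went from IMDB + MovieMeter & R to energy drinks and Python. The question should be deleted. The OP knows that edit histories are fully available, right?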