Web scraping: how to feed URLs to a function

My end goal is to be able to take all 310 articles from this page and the pages that follow it, and run them through this function:

library(tidyverse)
library(rvest)
library(stringr)
library(purrr)
library(lubridate)
library(dplyr)

scrape_docs

You can get all the URLs by doing the following:

library(rvest)

source_col <- "https://www.presidency.ucsb.edu/advanced-search?field-keywords=%22space%20exploration%22&field-keywords2=&field-keywords3=&from%5Bdate%5D=&to%5Bdate%5D=&person2=&items_per_page=100&page=0"

all_urls <- source_col %>%
  read_html() %>%
  html_nodes("td a") %>%
  html_attr("href") %>%
  .[c(FALSE, TRUE)] %>%
  paste0("https://www.presidency.ucsb.edu", .)
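The search URL above requests `page=0` with `items_per_page=100`, so 310 articles would presumably be spread over several result pages. A minimal sketch of collecting links across them, assuming the `page` parameter is zero-indexed and that every results page has the same `td a` layout as the first one. The helpers `page_urls()`, `get_doc_urls()` and `collect_all_urls()` are hypothetical names introduced here, not part of the original answer:

```r
library(rvest)

base_url <- "https://www.presidency.ucsb.edu"
# Same search URL as above, with the page number left off the end
search_url <- paste0(
  base_url,
  "/advanced-search?field-keywords=%22space%20exploration%22",
  "&field-keywords2=&field-keywords3=&from%5Bdate%5D=&to%5Bdate%5D=",
  "&person2=&items_per_page=100&page="
)

# Hypothetical helper: one search URL per results page.
# Assumes pages are numbered 0, 1, 2, ...
page_urls <- function(n_results, per_page = 100) {
  n_pages <- ceiling(n_results / per_page)
  paste0(search_url, seq_len(n_pages) - 1)
}

# Hypothetical helper: the same pipeline as above, applied to one results page
get_doc_urls <- function(page_url) {
  page_url %>%
    read_html() %>%
    html_nodes("td a") %>%
    html_attr("href") %>%
    .[c(FALSE, TRUE)] %>%   # keep every second link, as in the code above
    paste0(base_url, .)
}

# Hypothetical helper: visits every results page and flattens the links
collect_all_urls <- function(n_results) {
  unlist(lapply(page_urls(n_results), get_doc_urls))
}

pages <- page_urls(310)   # four search URLs, page=0 through page=3
```

Calling `all_urls <- collect_all_urls(310)` would then request each results page in turn. It is worth checking that the length of the result matches the number of articles the site reports.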
Testing the function on one URL:
scrape_docs

scrape_docs(all_urls[1])

#$speaker
#[1] "Dwight D. Eisenhower"

#$date
#[1] "1958-04-02"

#$title
#[1] "Special Message to the Congress Relative to Space Science and Exploration."

#$text
#[1] "\n    To the Congress of the United States:\nRecent developments in long-range 
#    rockets for military purposes have for the first time provided man with new mac......
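Once the function works on one URL, the same call can be mapped over the whole vector. A rough sketch, assuming `scrape_docs()` returns a list with the four fields shown in the output above. Since the function body is not shown here, `fake_scrape_docs()` is a hypothetical stand-in that needs no network access, and `scrape_all()` is likewise a name introduced for illustration:

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for scrape_docs(): returns the same four fields
# as the output above, and fails on purpose for URLs containing "bad"
fake_scrape_docs <- function(url) {
  if (grepl("bad", url)) stop("could not read page")
  list(speaker = "Speaker", date = "1958-04-02", title = url, text = "...")
}

# Run a scraper over many URLs, skipping the ones that error
scrape_all <- function(urls, scraper, pause = 0) {
  # possibly() turns an error into NULL instead of aborting the whole run
  safe_scraper <- possibly(scraper, otherwise = NULL)
  results <- map(urls, function(u) {
    Sys.sleep(pause)   # optional delay between requests
    safe_scraper(u)
  })
  # Drop the failures, turn each result into a one-row data frame, and stack
  rows <- map(compact(results), as.data.frame, stringsAsFactors = FALSE)
  bind_rows(rows)
}

demo <- scrape_all(c("doc-1", "bad-doc", "doc-2"), fake_scrape_docs)
```

With the real function this would be something like `scrape_all(all_urls, scrape_docs, pause = 1)`, giving one row per article with `speaker`, `date`, `title` and `text` columns. URLs that failed are silently dropped, so comparing `nrow()` against `length(all_urls)` shows whether any were skipped.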
