Web scraping: how to feed URLs to a function
My end goal is to be able to take all 310 articles from this page and the pages that follow it, and run them through this function:
library(tidyverse)
library(rvest)
library(stringr)
library(purrr)
library(lubridate)
library(dplyr)
scrape_docs
You can get all the URLs by doing the following:
library(rvest)
source_col <- "https://www.presidency.ucsb.edu/advanced-search?field-keywords=%22space%20exploration%22&field-keywords2=&field-keywords3=&from%5Bdate%5D=&to%5Bdate%5D=&person2=&items_per_page=100&page=0"
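all_urls <- source_col %>%
  read_html() %>%                  # download and parse the search results page
  html_nodes("td a") %>%           # every link inside a table cell
  html_attr("href") %>%            # relative paths such as /documents/...
  .[c(FALSE, TRUE)] %>%            # keep every second link (even positions)
  paste0("https://www.presidency.ucsb.edu", .)   # turn paths into absolute URLs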
Testing the function scrape_docs on one URL:
scrape_docs(all_urls[1])
#$speaker
#[1] "Dwight D. Eisenhower"
#$date
#[1] "1958-04-02"
#$title
#[1] "Special Message to the Congress Relative to Space Science and Exploration."
#$text
#[1] "\n To the Congress of the United States:\nRecent developments in long-range
# rockets for military purposes have for the first time provided man with new mac......
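To go from one URL to all 310 articles, something along these lines might work. It is only a sketch under a few assumptions: that the search results are paged with a zero-based `page=` parameter (as the `page=0` in the URL above suggests), that 100 results per page means four pages, and that `scrape_docs` takes a single URL. The names `base_url`, `page_urls`, `get_doc_urls` and `scrape_each` are made up for this example and are not part of the original code.

```r
# Search URL from above, with the trailing page index removed
base_url <- "https://www.presidency.ucsb.edu/advanced-search?field-keywords=%22space%20exploration%22&field-keywords2=&field-keywords3=&from%5Bdate%5D=&to%5Bdate%5D=&person2=&items_per_page=100&page="

# 310 results at 100 per page -> pages 0, 1, 2, 3 (assumes zero-based paging)
n_pages <- ceiling(310 / 100)
page_urls <- paste0(base_url, seq_len(n_pages) - 1)

# Collect the document links from one results page (same selector as above)
get_doc_urls <- function(page_url) {
  page <- rvest::read_html(page_url)
  hrefs <- rvest::html_attr(rvest::html_nodes(page, "td a"), "href")
  paste0("https://www.presidency.ucsb.edu", hrefs[c(FALSE, TRUE)])
}

# Apply a scraper to each URL in turn; a failed page yields NULL
# instead of aborting the whole run
scrape_each <- function(urls, scraper, pause = 1) {
  lapply(urls, function(u) {
    Sys.sleep(pause)  # short pause between requests to go easy on the server
    tryCatch(scraper(u), error = function(e) NULL)
  })
}

# Not run here, since it hits the network:
# all_urls <- unlist(lapply(page_urls, get_doc_urls))
# results  <- scrape_each(all_urls, scrape_docs)
```

Since `scrape_docs` returns a list with speaker, date, title and text, `results` would be a list of such lists, which `dplyr::bind_rows(results)` should be able to stack into one data frame. `purrr::map` could be used in place of `lapply` if you prefer to stay in the tidyverse.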