Download several Excel files from a URL using R
Question
I want to use R to download several Excel files from a website. I don't want to name each link individually, because there are a lot of them. I have looked at earlier posts, but the website links in them no longer work, so I can't see how the answers relate to the structure of the site.
The website I am testing is the one assigned to url in the code below.
Attempt
I tried to get the names of the full links to the attachments:
library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%
url <- "https://digital.nhs.uk/data-and-information/publications/statistical/adult-psychiatric-morbidity-survey/adult-psychiatric-morbidity-survey-survey-of-mental-health-and-wellbeing-england-2014"
simple <- read_html(url)
files <- simple %>%
  html_nodes(".attachment") %>%
  html_text()
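A note on the attempt above: html_text() returns the visible text of each node, not the address it points to; the address is stored in the href attribute of the a element. The following is only a minimal sketch of that difference. The HTML fragment and the files.example.org addresses are made up for illustration, and the assumption that each link sits inside an element of class "attachment" has not been checked against the real NHS page.

```r
library(rvest)

# Hypothetical markup standing in for the real page (an assumption).
page <- read_html('
  <div class="attachment"><a href="https://files.example.org/a/ch-02-tabs.xls">Chapter 2 tables</a></div>
  <div class="attachment"><a href="https://files.example.org/b/ch-03-tabs.xls">Chapter 3 tables</a></div>
  <div class="attachment"><a href="https://files.example.org/c/report.pdf">Report</a></div>
')

# Select the <a> elements inside .attachment and read their href attribute
# instead of their text.
hrefs <- page %>%
  html_nodes(".attachment a") %>%
  html_attr("href")

# Keep only links ending in .xls or .xlsx.
xls_links <- hrefs[grepl("\\.xlsx?$", hrefs)]
print(xls_links)
```

If the real page nests its links differently, only the CSS selector should need to change.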
The XML package has some good tools for working with web pages, in particular for extracting links.
library(XML)
url <- "https://digital.nhs.uk/data-and-information/publications/statistical/adult-psychiatric-morbidity-survey/adult-psychiatric-morbidity-survey-survey-of-mental-health-and-wellbeing-england-2014"
pageContent <- readLines(url)
Links <- getHTMLLinks(pageContent)
xlsFiles <- grep("\\.xls", Links)
Links[xlsFiles]
[1] "https://files.digital.nhs.uk/excel/9/s/apms-2014-ch-02-tabs.xls"
[2] "https://files.digital.nhs.uk/excel/9/b/apms-2014-ch-03-tabs.xls"
[3] "https://files.digital.nhs.uk/excel/a/i/apms-2014-ch-04-tabs.xls"
[4] "https://files.digital.nhs.uk/excel/a/t/apms-2014-ch-05-tabs.xls"
[5] "https://files.digital.nhs.uk/excel/b/m/apms-2014-ch-06-tabs.xls"
[6] "https://files.digital.nhs.uk/excel/b/p/apms-2014-ch-07-tabs.xls"
[7] "https://files.digital.nhs.uk/excel/b/l/apms-2014-ch-08-tabs.xls"
[8] "https://files.digital.nhs.uk/excel/c/1/apms-2014-ch-09-tabs.xls"
[9] "https://files.digital.nhs.uk/excel/s/0/apms-2014-ch-10-tabs.xls"
[10] "https://files.digital.nhs.uk/excel/c/p/apms-2014-ch-11-tabs.xls"
[11] "https://files.digital.nhs.uk/6F/FB2F1B/apms-2014-ch-12-tabs.xls"
[12] "https://files.digital.nhs.uk/excel/d/g/apms-2014-ch-13-tabs.xls"
[13] "https://files.digital.nhs.uk/excel/d/r/apms-2014-ch-14-tabs.xls"
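The output above only lists the links; the files still have to be fetched. One possible way to do that with base R is sketched below. The helper name download_all, the folder name "apms-2014" and the one-second pause are all assumptions made for illustration; none of them comes from the original answer.

```r
# Two of the links from the output above, used here for illustration.
xls_links <- c(
  "https://files.digital.nhs.uk/excel/9/s/apms-2014-ch-02-tabs.xls",
  "https://files.digital.nhs.uk/excel/9/b/apms-2014-ch-03-tabs.xls"
)

# Hypothetical helper: save every link into dest_dir under its own file name.
download_all <- function(links, dest_dir = "apms-2014", pause = 1) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(dest_dir, basename(links))
  for (i in seq_along(links)) {
    # mode = "wb" keeps binary .xls files intact, which matters on Windows.
    download.file(links[i], destfile = dest[i], mode = "wb")
    Sys.sleep(pause)  # short pause between requests to go easy on the server
  }
  invisible(dest)
}

# Not run here, because it needs network access:
# download_all(xls_links)
```

With the objects from the answer, the call would be download_all(Links[xlsFiles]). The function returns the local paths invisibly, so they can be passed on to whatever reads the spreadsheets.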
I have not used it myself, but I would suggest the polite package: it makes it easy to be as respectful as possible of the site you are scraping. I would not want to cause trouble for the NHS!