
Downloading several Excel files from a URL with R


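Question

I want to download several Excel files from a website using R. I do not want to name each link individually, because there are a lot of them. I looked at earlier posts, but the website links in them no longer work, so I cannot see how the answers relate to the structure of the site.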

The website I am testing with is the NHS Digital page whose address is assigned to url in the code below.

Attempt

I tried to get the names of the full links to the attachments:

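library(rvest)  # provides read_html(), html_nodes(), html_text() and the %>% pipe

url <- "https://digital.nhs.uk/data-and-information/publications/statistical/adult-psychiatric-morbidity-survey/adult-psychiatric-morbidity-survey-survey-of-mental-health-and-wellbeing-england-2014"

simple <- read_html(url)

# Returns the visible text of each .attachment element, not the link address
files <- simple %>%
  html_nodes(".attachment") %>%
  html_text()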
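
The attempt above collects the visible text of each attachment, while the download needs the address stored in the href attribute. A minimal sketch of that difference, assuming rvest is installed; the helper name extract_xls_links and the inline demo page (with example.org addresses) are made up for illustration, and selecting every a element instead of .attachment is an assumption about the page:

```r
library(rvest)  # also provides the %>% pipe and re-exports read_html()

# Return the href of every link in `page` that points at an .xls or .xlsx file.
extract_xls_links <- function(page) {
  hrefs <- page %>%
    html_nodes("a") %>%   # every anchor element on the page
    html_attr("href")     # the link target, not the visible text
  hrefs <- hrefs[!is.na(hrefs)]  # drop anchors that have no href
  hrefs[grepl("\\.xlsx?$", hrefs, ignore.case = TRUE)]
}

# Hypothetical stand-in for the real page, so the sketch runs offline.
demo_page <- read_html('
  <div class="attachment"><a href="https://example.org/files/ch-02-tabs.xls">Chapter 2 tables</a></div>
  <div class="attachment"><a href="https://example.org/files/report.pdf">Report</a></div>
  <a>no target</a>')

demo_links <- extract_xls_links(demo_page)
print(demo_links)
```

On the real site the same helper would be called as extract_xls_links(read_html(url)).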


The XML package has some nice tools for working with web pages, in particular for extracting links.

library(XML)

url <- "https://digital.nhs.uk/data-and-information/publications/statistical/adult-psychiatric-morbidity-survey/adult-psychiatric-morbidity-survey-survey-of-mental-health-and-wellbeing-england-2014"
pageContent <- readLines(url)
Links <- getHTMLLinks(pageContent)
xlsFiles <- grep("\\.xls", Links)
Links[xlsFiles]

 [1] "https://files.digital.nhs.uk/excel/9/s/apms-2014-ch-02-tabs.xls"
 [2] "https://files.digital.nhs.uk/excel/9/b/apms-2014-ch-03-tabs.xls"
 [3] "https://files.digital.nhs.uk/excel/a/i/apms-2014-ch-04-tabs.xls"
 [4] "https://files.digital.nhs.uk/excel/a/t/apms-2014-ch-05-tabs.xls"
 [5] "https://files.digital.nhs.uk/excel/b/m/apms-2014-ch-06-tabs.xls"
 [6] "https://files.digital.nhs.uk/excel/b/p/apms-2014-ch-07-tabs.xls"
 [7] "https://files.digital.nhs.uk/excel/b/l/apms-2014-ch-08-tabs.xls"
 [8] "https://files.digital.nhs.uk/excel/c/1/apms-2014-ch-09-tabs.xls"
 [9] "https://files.digital.nhs.uk/excel/s/0/apms-2014-ch-10-tabs.xls"
[10] "https://files.digital.nhs.uk/excel/c/p/apms-2014-ch-11-tabs.xls"
[11] "https://files.digital.nhs.uk/6F/FB2F1B/apms-2014-ch-12-tabs.xls"
[12] "https://files.digital.nhs.uk/excel/d/g/apms-2014-ch-13-tabs.xls"
[13] "https://files.digital.nhs.uk/excel/d/r/apms-2014-ch-14-tabs.xls"
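
The links still have to be fetched one by one. A minimal sketch using base R's download.file(); the helper names local_paths and download_xls, the folder name apms_xls and the one-second pause are assumptions for illustration, not part of the original answer:

```r
# Build a local file path for each link from the file name at the end of the URL.
local_paths <- function(links, dest_dir) {
  file.path(dest_dir, basename(links))
}

# Download each link into dest_dir and return the local paths invisibly.
download_xls <- function(links, dest_dir = "apms_xls", pause = 1) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  paths <- local_paths(links, dest_dir)
  for (i in seq_along(links)) {
    # mode = "wb" writes the file as binary, which keeps .xls files intact on Windows
    download.file(links[i], destfile = paths[i], mode = "wb")
    Sys.sleep(pause)  # short pause between requests
  }
  invisible(paths)
}
```

Calling download_xls(Links[xlsFiles]) would then save the thirteen files listed above under their original file names.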

I have not used it myself, but I would recommend the polite package: it makes it easy to be as respectful as possible to the website you are scraping. I would not want to cause trouble for the NHS!
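
For reference, a rough sketch of what that might look like, assuming polite's bow() and scrape() functions together with rvest for the link extraction. The helper name polite_xls_links is made up for illustration, and the code has not been run against the NHS site:

```r
# Fetch a page through polite and return the .xls links found on it.
# Requires the polite and rvest packages; calls are namespaced, so no library() is needed.
polite_xls_links <- function(page_url) {
  session <- polite::bow(page_url)  # introduces the client and checks robots.txt
  page <- polite::scrape(session)   # fetches the page if scraping is permitted
  if (is.null(page)) {
    return(character(0))            # nothing was returned, so there are no links
  }
  hrefs <- rvest::html_attr(rvest::html_nodes(page, "a"), "href")
  hrefs[!is.na(hrefs) & grepl("\\.xls", hrefs)]
}
```

The result of polite_xls_links(url) could be used in place of Links[xlsFiles] from the XML approach.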