R XML: scraping hrefs from the SEC EDGAR website


I've checked previous similar questions, with no luck... I can't seem to read the EDGAR web page with readHTMLTable. I'm trying to read this URL:

…and turn all of the href links behind the "Documents" buttons into a character vector.

The "Documents" links sit inside a table. In the Firefox inspector, the first "Documents" href link looks like this:



For a simple job like this, the rvest package is much easier:

library(rvest)

url <- 'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=AAPL&type=10-Q&dateb=&owner=exclude&count=100'

# pull HTML from page
url %>% read_html() %>%
    # get tags with a certain CSS selector
    html_nodes('#documentsbutton') %>%
    # get the href attribute from each node
    html_attr('href')

# [1] "/Archives/edgar/data/320193/000119312516559625/0001193125-16-559625-index.htm"
# [2] "/Archives/edgar/data/320193/000119312516439878/0001193125-16-439878-index.htm"
# [3] "/Archives/edgar/data/320193/000119312515259935/0001193125-15-259935-index.htm"
# [4] "/Archives/edgar/data/320193/000119312515153166/0001193125-15-153166-index.htm"
# [5] "/Archives/edgar/data/320193/000119312515023697/0001193125-15-023697-index.htm"
# [6] "/Archives/edgar/data/320193/000119312514277160/0001193125-14-277160-index.htm"
# [7] "/Archives/edgar/data/320193/000119312514157311/0001193125-14-157311-index.htm"
# [8] "/Archives/edgar/data/320193/000119312514024487/0001193125-14-024487-index.htm"
# [9] "/Archives/edgar/data/320193/000119312513300670/0001193125-13-300670-index.htm"
# [10] "/Archives/edgar/data/320193/000119312513168288/0001193125-13-168288-index.htm"
# ...
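The pipeline above needs network access and returns site-relative paths. A minimal offline sketch, assuming a hypothetical two-row HTML fragment that mimics the EDGAR results table (the id and href values are copied from the output above), checks the same selector and then resolves the paths into full URLs with xml2's url_absolute(). The https://www.sec.gov/ base is an assumption taken from the request URL.

```r
library(rvest)  # read_html(), html_nodes(), html_attr(), %>%
library(xml2)   # url_absolute()

# Hypothetical stand-in for the EDGAR results table, so the selector can be
# checked without a network request. EDGAR reuses id="documentsbutton" on
# every row, which is why the CSS selector matches more than one node.
html <- '<table class="tableFile2">
  <tr><td>10-Q</td>
      <td><a id="documentsbutton" href="/Archives/edgar/data/320193/000119312516559625/0001193125-16-559625-index.htm">Documents</a></td></tr>
  <tr><td>10-Q</td>
      <td><a id="documentsbutton" href="/Archives/edgar/data/320193/000119312516439878/0001193125-16-439878-index.htm">Documents</a></td></tr>
</table>'

# A string containing "<" is parsed as literal HTML, not fetched as a URL
page <- read_html(html)

# Same steps as the answer: select the "Documents" anchors, read their href
rel_links <- page %>%
  html_nodes("#documentsbutton") %>%
  html_attr("href")

# The hrefs are site-relative; resolve them against the (assumed) EDGAR host
abs_links <- url_absolute(rel_links, "https://www.sec.gov/")
print(abs_links)
```

In rvest 1.0 and later, html_nodes() is superseded by html_elements(), which takes the same CSS selector; html_nodes() still works.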
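If you would rather stay with the XML package named in the title: readHTMLTable() by default keeps only the text of each cell (its elFun argument defaults to xmlValue), so the href attributes are dropped even when parsing succeeds. A sketch of the XML-only route, again on a hypothetical inline fragment rather than the live page, queries the anchors with XPath and reads the attribute with xmlGetAttr(). For the live page, XML's built-in downloader is commonly reported not to handle https:// URLs, so fetching the HTML first (for example with httr or download.file()) and passing the text to htmlParse(..., asText = TRUE) is an assumed workaround, not something verified here.

```r
library(XML)

# Hypothetical fragment mimicking two rows of the EDGAR results table
html <- '<table class="tableFile2">
  <tr><td><a id="documentsbutton" href="/Archives/edgar/data/320193/000119312516559625/0001193125-16-559625-index.htm">Documents</a></td></tr>
  <tr><td><a id="documentsbutton" href="/Archives/edgar/data/320193/000119312516439878/0001193125-16-439878-index.htm">Documents</a></td></tr>
</table>'

# Parse the string itself rather than treating it as a file name or URL
doc <- htmlParse(html, asText = TRUE)

# Every <a> whose id is "documentsbutton"; xmlGetAttr() returns its href
links <- xpathSApply(doc, "//a[@id = 'documentsbutton']", xmlGetAttr, "href")
print(links)
```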
