html_form from rvest doesn't recognize the form
I'm trying to scrape the paper listing with rvest (not the linked papers/abstracts themselves, only the number, title, authors, etc.).
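By default the page only shows the 2016 papers, and scraping the 2016 data is "no problem". I expected the URL to change after switching "2016" to "all years", but it stays the same, so I turned to html_form. Looking at the page source, I found that the relevant input name is filteryear.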
R code:
library(rvest)
rdc <- html_session("https://sfb649.wiwi.hu-berlin.de/fedc/discussionPapers_formular_content.php")
form <- html_form(rdc)
form <- set_values(form, filteryear = "all years")
#Error: Unknown field names: filteryear
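One thing worth ruling out, sketched on a hypothetical minimal page rather than the real SFB 649 markup: html_form() returns a list with one element per <form>, while set_values() works on a single form. With the legacy rvest 0.3.x set_values() used above, handing it the whole list can make every field look unknown, so it helps to check which element (if any) actually contains filteryear and pass only that one. The page HTML and the value "all" (taken from the form data in the answer below) are assumptions for illustration.

```r
library(rvest)

# Hypothetical minimal page: one <form> holding a <select name="filteryear">.
# This is NOT the real page markup, only an illustration.
page <- read_html('
<form method="post">
  <select name="filteryear">
    <option value="2016" selected>2016</option>
    <option value="all">all years</option>
  </select>
  <input type="submit" name="B1" value="Search">
</form>')

# html_form() returns a list with one element per <form> on the page
forms <- html_form(page)

# Which forms actually expose a field called "filteryear"?
has_year <- vapply(forms, function(f) "filteryear" %in% names(f$fields), logical(1))

# set_values() (legacy API; deprecated in rvest >= 1.0 in favour of
# html_form_set()) expects ONE form, so pass the matching element, not the list
year_form <- set_values(forms[[which(has_year)[1]]], filteryear = "all")
year_form$fields$filteryear$value
```

On the real page, if has_year is FALSE for every element, rvest did not parse filteryear as part of any form, and submitting the request directly is the practical workaround.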
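Why doesn't html_form fully recognize this form? And, more importantly, is there a way around this?

Rather than going through html_form, you can just httr::POST the form manually, like this: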
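library(rvest)
library(httr)

# Submit the search form's fields directly as a POST request
res <- POST("https://sfb649.wiwi.hu-berlin.de/fedc/discussionPapers_formular_content.php",
            body = list(filterTypeName = "filterTypeName:AUTHORS",
                        filteryear = "all",   # all years instead of the default 2016
                        B1 = "Search"),
            encode = "form")                  # application/x-www-form-urlencoded body

# Parse every <table> in the response into a list of data frames
out <- read_html(res) %>% html_table(fill=TRUE)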
> dim(out[[7]])
[1] 805 10
> head(out[[7]])
X1 X2
1 2016-049 Q3-D3-LSA
2 2016-048 Unraveling of Cooperation in Dynamic Collaboration
3 2016-047 Time Varying Quantile Lasso
4 2016-046 Credit Rating Score Analysis
5 2016-045 Information Acquisition and Liquidity Dry-Ups
6 2016-044 Dynamic Contracting with Long-Term Consequences: Optimal CEO Compensation and Turnover
X3 X4 X5 X6
1 Lukas Borke and Wolfgang K. Härdle B1 15.11.2016 C87, C88, G17
2 Suvi Vasama A8 07.11.2016 C73, D83, O31
3 Lenka Zbonakova, Wolfgang Karl H\177ardle and Weining Wang B1 07.11.2016 C21, G01, G20, G32
4 Wolfgang Karl H\177ärdle, Phoon Kok Fai and David Lee Kuo Chuen B1 02.11.2016 C01, G00, G17, G24
5 Philipp Koenig and David Pothier C10 26.10.2016 D82, G01, G12
6 Suvi Vasama A8 26.10.2016 C73, D82, D86
X7 X8 X9 X10
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
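The answer picks the results with out[[7]], a position that depends on how the page happened to be laid out. A minimal sketch of a less position-dependent alternative, assuming the paper list is the largest table in the response (not verified against the live page): take the table with the most rows, then drop columns that are entirely NA, like X7 to X10 above. The helpers largest_table() and drop_empty_cols() are hypothetical names introduced here, and the toy input only stands in for real html_table() output.

```r
# Hypothetical helper: return the data frame with the most rows from a list
# such as the one returned by html_table()
largest_table <- function(tables) {
  tables[[which.max(vapply(tables, nrow, integer(1)))]]
}

# Hypothetical helper: keep only columns with at least one non-NA value
drop_empty_cols <- function(df) {
  df[, colSums(!is.na(df)) > 0, drop = FALSE]
}

# Toy stand-in for html_table() output (paper numbers copied from the output above)
tables <- list(
  data.frame(X1 = "placeholder"),
  data.frame(X1 = c("2016-049", "2016-048", "2016-047"), X7 = NA)
)
papers <- drop_empty_cols(largest_table(tables))
print(papers)
```

Applied to the answer's result, this would read papers <- drop_empty_cols(largest_table(out)). If the page ever contains a larger layout table, selecting by a known column value would be safer than selecting by size.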