Scraping with a drop-down menu in R

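I'm trying to scrape the daily NBA ROS projections from a website (the URL is in the code below).

The problem is that the number of players shown defaults to 200, and I would like 400, or ALL would be fine too.

This code retrieves the first 200 without any problem: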

> url <- 'https://hashtagbasketball.com/fantasy-basketball-projections'
> 
> page <- read_html(url)
> 
> projs <- html_table(page)[[3]] %>% ### anything after this just cleans the df
+     rename_all(~gsub('3pm','threes',gsub('\\%','pct',tolower(.)))) %>% 
+     mutate_at(vars(matches('pct$')),~stringr::str_sub(.,1,4)) %>% 
+     mutate(player = stringr::word(player,1, 2, sep = ' ')) %>% 
+     mutate(pos = stringr::word(pos,1,1,sep = ',')) %>% 
+     mutate(pos2 = gsub('P','',pos)) %>% 
+     drop_na(player) %>% 
+     mutate_at(vars(-c(player,matches('pos'),team)),~as.numeric(.)) %>% 
+     select(player, matches('pos'),everything(),-`r#`) %>% 
+     head(2)
> projs
         player pos pos2 team gp  mpg fgpct ftpct threes  pts treb ast stl blk  to total
1  James Harden  PG    G  HOU 64 36.3  0.44  0.86    4.7 34.4  6.6 9.3 1.7 0.8 4.6 17.68
2 Anthony Davis  PF    F  LAL 65 34.8  0.50  0.84    1.3 26.6  9.4 3.2 1.5 2.3 2.5 14.56
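This creates a table with all of the desired categories. However, when I use the code below, it does not pull all of the stat categories, only gp and mpg: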

> pgsession <- html_session(url)
> pgform <-html_form(pgsession)[[1]]
> filled_form <-set_values(pgform,
+                          "ctl00$ContentPlaceHolder1$DDSHOW" = "400")
> 
> d <- submit_form(session=pgsession, form=filled_form)
Submitting with '<unnamed>'
> 
> y <- d %>%
+     html_nodes("table") %>%
+     .[[3]] %>%
+     html_table(header=TRUE) %>% 
+     mutate(PLAYER = stringr::word(PLAYER,1, 2, sep = ' ')) %>% 
+     head(2)
> y
  R#        PLAYER   POS TEAM GP  MPG TOTAL
1  1  James Harden PG,SG  HOU 64 36.3  0.00
2  2 Anthony Davis  PF,C  LAL 65 34.8  0.00
Any idea what I'm doing wrong?
Thanks

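The problem seems to be that the checkboxes for the other variables are not checked when the form is submitted. You have to set them manually. This shows how to get ftm and ftpct; I'll leave the rest to you: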

library(tidyverse)
library(rvest)

url <- 'https://hashtagbasketball.com/fantasy-basketball-projections'
pgsession <- html_session(url)
pgform <- html_form(pgsession)

# Tick the checkboxes for the extra stat columns
# ([[5]] is presumably the form's list of fields)
pgform[[1]][[5]][["ctl00$ContentPlaceHolder1$CBFTM"]]$value <- "checked"
pgform[[1]][[5]][["ctl00$ContentPlaceHolder1$CBFTP"]]$value <- "checked"

# Ask for 400 players instead of the default 200, then submit the form
filled_form <- set_values(pgform[[1]], "ctl00$ContentPlaceHolder1$DDSHOW" = "400")
d <- submit_form(session = pgsession, form = filled_form)

d %>%
       html_nodes("table") %>%
       .[[3]] %>% 
       html_table() %>%
       rename_all(~gsub('3pm','threes',gsub('\\%','pct',tolower(.)))) %>% 
       mutate_at(vars(matches('pct$')),~stringr::str_sub(.,1,4)) %>% 
       mutate(player = stringr::word(player,1, 2, sep = ' ')) %>% 
       mutate(pos = stringr::word(pos,1,1,sep = ',')) %>% 
       mutate(pos2 = gsub('P','',pos)) %>% 
       drop_na(player) %>% 
       mutate_at(vars(-c(player,matches('pos'),team)),~as.numeric(.)) %>% 
       select(player, matches('pos'),everything(),-`r#`) %>% 
       head(2)
#        player pos pos2 team gp  mpg  ftm ftpct total
#1 James Harden  PG    G  HOU 64 36.3 10.4  0.86 10.95
#2 Devin Booker  SG   SG  PHX 70 35.6  6.7  0.91  7.99
In case you didn't know, you can get the checkbox names by right-clicking the checkbox in Chrome and choosing "Inspect".
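
As an alternative to inspecting the page by hand, the checkbox names can probably also be listed from R. The sketch below is only an illustration: `mock_html` is a made-up stand-in for the real page, so that the example runs without a network connection, and the real form may be laid out differently.

```r
library(rvest)

# Hypothetical, simplified stand-in for the page's form (not the real markup)
mock_html <- '<html><body><form>
  <select name="ctl00$ContentPlaceHolder1$DDSHOW">
    <option value="200" selected>200</option>
    <option value="400">400</option>
  </select>
  <input type="checkbox" name="ctl00$ContentPlaceHolder1$CBFTM">
  <input type="checkbox" name="ctl00$ContentPlaceHolder1$CBFTP">
</form></body></html>'

page <- read_html(mock_html)

# Select every checkbox input and pull out its name attribute
checkbox_names <- page %>%
  html_nodes("input[type='checkbox']") %>%
  html_attr("name")

print(checkbox_names)
```

On the live site, `read_html(url)` would take the place of `read_html(mock_html)`; the resulting names are the ones to use when setting each checkbox's value.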

Keep in mind that lapply returns its result rather than modifying a copy in the global environment. You may have better luck with a for loop. set_values might also work, but I found some GitHub issues suggesting that it may not work for checkboxes.
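
To illustrate the point about lapply versus a for loop, here is a minimal base R sketch. The `fields` list is a hypothetical stand-in for the form's field list, not the actual rvest object, and the short names `CBFTM` and `CBFTP` are used only for brevity.

```r
# Hypothetical stand-in for the form's fields (not the real rvest structure)
fields <- list(
  CBFTM = list(type = "checkbox", value = ""),
  CBFTP = list(type = "checkbox", value = "")
)

# Inside the function, the assignment creates and modifies a local copy,
# so the outer `fields` is untouched unless the result is assigned back.
ignored <- lapply(names(fields), function(nm) {
  fields[[nm]]$value <- "checked"
})
after_lapply <- fields$CBFTM$value

# A for loop runs in the calling environment, so the change persists.
for (nm in names(fields)) {
  fields[[nm]]$value <- "checked"
}
after_loop <- fields$CBFTM$value

print(c(after_lapply = after_lapply, after_loop = after_loop))
```

The same loop pattern could be applied to the real form's field list to tick every stat checkbox in one pass, assuming the fields are reachable by name as in the answer above.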