Handling errors/exceptions with bind_rows() and lapply()

I have a function that extracts a table from each URL in a list:

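getscore <- function(www0) {

    library(rvest)
    library(dplyr)

    # read_html() replaces html(), which is deprecated in newer rvest releases
    www <- read_html(www0)

    # Parse every table on the page and keep the first one
    boxscore <- www %>% html_table(fill = TRUE) %>% .[[1]]

    # Name the goal columns and the game-type column
    names(boxscore)[3] <- "VG"
    names(boxscore)[5] <- "HG"
    names(boxscore)[6] <- "Type"

    return(boxscore)
}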

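The table you get back when there are no games has a completely different structure. You can check whether colnames(boxscore) matches what you expect. As an example, here is a modified version of the function that checks whether the Visitor column is present: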

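getscore <- function(www0) {

  library(rvest)
  library(dplyr)

  # read_html() replaces html(), which is deprecated in newer rvest releases
  www <- read_html(www0)

  boxscore <- www %>% html_table(fill = TRUE) %>% .[[1]]

  # Only rename and return the table when it has the expected layout;
  # otherwise the function returns NULL
  if ("Visitor" %in% colnames(boxscore)) {
    names(boxscore)[3] <- "VG"
    names(boxscore)[5] <- "HG"
    names(boxscore)[6] <- "Type"

    return(boxscore)
  }
}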
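
The column check above covers pages that load but have the wrong layout. For pages that raise an actual error, a tryCatch() wrapper is one common option. The following is only a minimal sketch: fake_getscore() is a hypothetical stand-in for getscore() so that the example runs without network access, the shortened URLs are made up, and base R's do.call(rbind, ...) is used for the final bind so no extra packages are needed.

```r
# Hypothetical stand-in for getscore(): fails for one input, works for the rest.
fake_getscore <- function(url) {
  if (grepl("24", url, fixed = TRUE)) stop("no table found at ", url)
  data.frame(Date = "2014-12-20", Visitor = "Team A", VG = 1L,
             stringsAsFactors = FALSE)
}

# Wrap the scraper so that an error yields NULL instead of aborting lapply().
safe_getscore <- function(url) {
  tryCatch(
    fake_getscore(url),
    error = function(e) {
      message("skipping ", url, ": ", conditionMessage(e))
      NULL
    }
  )
}

urls <- c("day/20/", "day/21/", "day/24/")    # made-up URLs for illustration
results <- lapply(urls, safe_getscore)
results <- Filter(Negate(is.null), results)   # drop the pages that failed
combined <- do.call(rbind, results)           # one row per successful page
```

Since the filtered list contains only data frames, it could be passed to bind_rows() in place of do.call(rbind, ...).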

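A good approach here is to use rbindlist from the data.table package, which accepts fill=TRUE. That lets you bind all the rows, including those from pages where bind_rows does not work. You can then keep only the rows with a non-NA Date (which in effect drops the pages where bind_rows fails) and restrict the result to the first 6 columns, which I assume are the ones you want from the valid data.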

library(data.table) # development version 1.9.5
www_list <- c("http://www.hockey-reference.com/boxscores/2014/12/20/",
              "http://www.hockey-reference.com/boxscores/2014/12/21/",
              "http://www.hockey-reference.com/boxscores/2014/12/22/",
              "http://www.hockey-reference.com/boxscores/2014/12/24/") # not working
resdt <- rbindlist(
    lapply(
        www_list, function(www0) {
            message("web is ", www0) # comment this out if you don't want the message to appear
            getscore(www0)
        }),
    fill = TRUE)
resdt[!is.na(Date), 1:6, with = FALSE] # the first 6 columns hold the valid data

         Date             Visitor VG                  Home HG Type
 1: 2014-12-20  Colorado Avalanche  5        Buffalo Sabres  1     
 2: 2014-12-20    New York Rangers  3   Carolina Hurricanes  2   SO
 3: 2014-12-20  Chicago Blackhawks  2 Columbus Blue Jackets  3   SO
 4: 2014-12-20     Arizona Coyotes  2     Los Angeles Kings  4     
 5: 2014-12-20 Nashville Predators  6        Minnesota Wild  5   OT
 6: 2014-12-20     Ottawa Senators  1    Montreal Canadiens  4     
 7: 2014-12-20 Washington Capitals  4     New Jersey Devils  0     
 8: 2014-12-20 Tampa Bay Lightning  1    New York Islanders  3     
 9: 2014-12-20    Florida Panthers  1   Pittsburgh Penguins  3     
10: 2014-12-20     St. Louis Blues  2       San Jose Sharks  3   OT
11: 2014-12-20 Philadelphia Flyers  7   Toronto Maple Leafs  4     
12: 2014-12-20      Calgary Flames  2     Vancouver Canucks  3   OT
13: 2014-12-21      Buffalo Sabres  3         Boston Bruins  4   OT
14: 2014-12-21 Toronto Maple Leafs  0    Chicago Blackhawks  4     
15: 2014-12-21  Colorado Avalanche  2     Detroit Red Wings  1   SO
16: 2014-12-21        Dallas Stars  6       Edmonton Oilers  5   SO
17: 2014-12-21 Carolina Hurricanes  0      New York Rangers  1     
18: 2014-12-21 Philadelphia Flyers  4         Winnipeg Jets  3   OT
19: 2014-12-22     San Jose Sharks  2         Anaheim Ducks  3   OT
20: 2014-12-22 Nashville Predators  5 Columbus Blue Jackets  1     
21: 2014-12-22 Pittsburgh Penguins  3      Florida Panthers  4   SO
22: 2014-12-22      Calgary Flames  4     Los Angeles Kings  3   OT
23: 2014-12-22     Arizona Coyotes  1     Vancouver Canucks  7     
24: 2014-12-22     Ottawa Senators  1   Washington Capitals  2     
          Date             Visitor VG                  Home HG Type
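
To see what fill=TRUE does without scraping anything, here is a small sketch on toy data. The two data frames, their contents, and the column name X1 are invented for illustration; they only imitate a page with games and a page with a different layout.

```r
library(data.table)

# Toy stand-in for a page that lists games
games <- data.frame(Date = c("2014-12-20", "2014-12-21"),
                    Visitor = c("Team A", "Team B"),
                    VG = c(5L, 3L),
                    stringsAsFactors = FALSE)

# Toy stand-in for a page with no games: different columns entirely
no_games <- data.frame(X1 = "No games played", stringsAsFactors = FALSE)

# fill = TRUE pads columns missing from either table with NA
combined <- rbindlist(list(games, no_games), fill = TRUE)

# Keep only the rows that came from a valid page, and only the wanted columns
valid <- combined[!is.na(Date), .(Date, Visitor, VG)]
```

The row from no_games ends up with NA in Date, Visitor and VG, which is why filtering on !is.na(Date) removes it.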

# The same filter on a plain data.frame
resdf <- as.data.frame(resdt)
resdf[!is.na(resdf$Date), 1:6]

     Date             Visitor VG                  Home HG Type
1  2014-12-20  Colorado Avalanche  5        Buffalo Sabres  1     
2  2014-12-20    New York Rangers  3   Carolina Hurricanes  2   SO
3  2014-12-20  Chicago Blackhawks  2 Columbus Blue Jackets  3   SO
4  2014-12-20     Arizona Coyotes  2     Los Angeles Kings  4     
5  2014-12-20 Nashville Predators  6        Minnesota Wild  5   OT
6  2014-12-20     Ottawa Senators  1    Montreal Canadiens  4     
7  2014-12-20 Washington Capitals  4     New Jersey Devils  0     
8  2014-12-20 Tampa Bay Lightning  1    New York Islanders  3     
9  2014-12-20    Florida Panthers  1   Pittsburgh Penguins  3     
10 2014-12-20     St. Louis Blues  2       San Jose Sharks  3   OT
11 2014-12-20 Philadelphia Flyers  7   Toronto Maple Leafs  4     
12 2014-12-20      Calgary Flames  2     Vancouver Canucks  3   OT
13 2014-12-21      Buffalo Sabres  3         Boston Bruins  4   OT
14 2014-12-21 Toronto Maple Leafs  0    Chicago Blackhawks  4     
15 2014-12-21  Colorado Avalanche  2     Detroit Red Wings  1   SO
16 2014-12-21        Dallas Stars  6       Edmonton Oilers  5   SO
17 2014-12-21 Carolina Hurricanes  0      New York Rangers  1     
18 2014-12-21 Philadelphia Flyers  4         Winnipeg Jets  3   OT
19 2014-12-22     San Jose Sharks  2         Anaheim Ducks  3   OT
20 2014-12-22 Nashville Predators  5 Columbus Blue Jackets  1     
21 2014-12-22 Pittsburgh Penguins  3      Florida Panthers  4   SO
22 2014-12-22      Calgary Flames  4     Los Angeles Kings  3   OT
23 2014-12-22     Arizona Coyotes  1     Vancouver Canucks  7     
24 2014-12-22     Ottawa Senators  1   Washington Capitals  2