使用bind_rows()和lappy()函数处理错误/异常
我有一个从URL列表中提取表格的函数:使用bind_rows()和lappy()函数处理错误/异常,r,lapply,rvest,R,Lapply,Rvest,我有一个从URL列表中提取表格的函数: getscore <- function(www0) { require(rvest) require(dplyr) www <- html(www0) boxscore <- www %>% html_table(fill = TRUE) %>% .[[1]] names(boxscore)[3] <- "VG" names(boxscore)[5] <- "
getscore <- function(www0) {
require(rvest)
require(dplyr)
www <- html(www0)
boxscore <- www %>% html_table(fill = TRUE) %>% .[[1]]
names(boxscore)[3] <- "VG"
names(boxscore)[5] <- "HG"
names(boxscore)[6] <- "Type"
return(boxscore)
}
getscore没有游戏时获得的表具有完全不同的结构。您可以检查colnames(boxscore)是否符合预期。作为一个例子,我包含了一个对函数的修改,用于检查列访问者是否可用
getscore <- function(www0) {
require(rvest)
require(dplyr)
www <- html(www0)
boxscore <- www %>% html_table(fill = TRUE) %>% .[[1]]
if ("Visitor" %in% colnames(boxscore)){
names(boxscore)[3] <- "VG"
names(boxscore)[5] <- "HG"
names(boxscore)[6] <- "Type"
return(boxscore)
}
}
getscore没有游戏时获得的表具有完全不同的结构。您可以检查colnames(boxscore)是否符合预期。作为一个例子,我包含了一个对函数的修改,用于检查列访问者是否可用
getscore <- function(www0) {
require(rvest)
require(dplyr)
www <- html(www0)
boxscore <- www %>% html_table(fill = TRUE) %>% .[[1]]
if ("Visitor" %in% colnames(boxscore)){
names(boxscore)[3] <- "VG"
names(boxscore)[5] <- "HG"
names(boxscore)[6] <- "Type"
return(boxscore)
}
}
getscore这里一个很好的方法是使用data.table
包中的rbindlist
(它允许您使用fill=TRUE
),这样您就可以绑定所有行,即使是bind\u rows
不起作用的行,但是您可以过滤非NA日期(本质上是bind_rows
不起作用的网页)然后限制为6列,我猜您在有效数据中查找这些列
library(data.table) # development vs. 1.9.5
www_list <- c("http://www.hockey-reference.com/boxscores/2014/12/20/",
"http://www.hockey-reference.com/boxscores/2014/12/21/",
"http://www.hockey-reference.com/boxscores/2014/12/22/",
"http://www.hockey-reference.com/boxscores/2014/12/24/") # not working
resdt<-rbindlist(
lapply(
www_list, function(www0){
message ("web is ", www0) # comment out this if you don't want message to appear
getscore(www0)}),fill=TRUE)
resdt[!is.na(Date),1:6,with=FALSE] # 6 column is valid data
Date Visitor VG Home HG Type
1: 2014-12-20 Colorado Avalanche 5 Buffalo Sabres 1
2: 2014-12-20 New York Rangers 3 Carolina Hurricanes 2 SO
3: 2014-12-20 Chicago Blackhawks 2 Columbus Blue Jackets 3 SO
4: 2014-12-20 Arizona Coyotes 2 Los Angeles Kings 4
5: 2014-12-20 Nashville Predators 6 Minnesota Wild 5 OT
6: 2014-12-20 Ottawa Senators 1 Montreal Canadiens 4
7: 2014-12-20 Washington Capitals 4 New Jersey Devils 0
8: 2014-12-20 Tampa Bay Lightning 1 New York Islanders 3
9: 2014-12-20 Florida Panthers 1 Pittsburgh Penguins 3
10: 2014-12-20 St. Louis Blues 2 San Jose Sharks 3 OT
11: 2014-12-20 Philadelphia Flyers 7 Toronto Maple Leafs 4
12: 2014-12-20 Calgary Flames 2 Vancouver Canucks 3 OT
13: 2014-12-21 Buffalo Sabres 3 Boston Bruins 4 OT
14: 2014-12-21 Toronto Maple Leafs 0 Chicago Blackhawks 4
15: 2014-12-21 Colorado Avalanche 2 Detroit Red Wings 1 SO
16: 2014-12-21 Dallas Stars 6 Edmonton Oilers 5 SO
17: 2014-12-21 Carolina Hurricanes 0 New York Rangers 1
18: 2014-12-21 Philadelphia Flyers 4 Winnipeg Jets 3 OT
19: 2014-12-22 San Jose Sharks 2 Anaheim Ducks 3 OT
20: 2014-12-22 Nashville Predators 5 Columbus Blue Jackets 1
21: 2014-12-22 Pittsburgh Penguins 3 Florida Panthers 4 SO
22: 2014-12-22 Calgary Flames 4 Los Angeles Kings 3 OT
23: 2014-12-22 Arizona Coyotes 1 Vancouver Canucks 7
24: 2014-12-22 Ottawa Senators 1 Washington Capitals 2
Date Visitor VG Home HG Type
这里一个很好的方法是使用data.table
包中的rbindlist
(它允许您使用fill=TRUE
),这样您就可以绑定所有bind\u rows
不起作用的行,但是您可以过滤非NA日期(本质上是bind\u rows
不起作用的网页)然后限制为6列,我猜您正在有效数据中查找这些列
library(data.table) # development vs. 1.9.5
www_list <- c("http://www.hockey-reference.com/boxscores/2014/12/20/",
"http://www.hockey-reference.com/boxscores/2014/12/21/",
"http://www.hockey-reference.com/boxscores/2014/12/22/",
"http://www.hockey-reference.com/boxscores/2014/12/24/") # not working
resdt<-rbindlist(
lapply(
www_list, function(www0){
message ("web is ", www0) # comment out this if you don't want message to appear
getscore(www0)}),fill=TRUE)
resdt[!is.na(Date),1:6,with=FALSE] # 6 column is valid data
Date Visitor VG Home HG Type
1: 2014-12-20 Colorado Avalanche 5 Buffalo Sabres 1
2: 2014-12-20 New York Rangers 3 Carolina Hurricanes 2 SO
3: 2014-12-20 Chicago Blackhawks 2 Columbus Blue Jackets 3 SO
4: 2014-12-20 Arizona Coyotes 2 Los Angeles Kings 4
5: 2014-12-20 Nashville Predators 6 Minnesota Wild 5 OT
6: 2014-12-20 Ottawa Senators 1 Montreal Canadiens 4
7: 2014-12-20 Washington Capitals 4 New Jersey Devils 0
8: 2014-12-20 Tampa Bay Lightning 1 New York Islanders 3
9: 2014-12-20 Florida Panthers 1 Pittsburgh Penguins 3
10: 2014-12-20 St. Louis Blues 2 San Jose Sharks 3 OT
11: 2014-12-20 Philadelphia Flyers 7 Toronto Maple Leafs 4
12: 2014-12-20 Calgary Flames 2 Vancouver Canucks 3 OT
13: 2014-12-21 Buffalo Sabres 3 Boston Bruins 4 OT
14: 2014-12-21 Toronto Maple Leafs 0 Chicago Blackhawks 4
15: 2014-12-21 Colorado Avalanche 2 Detroit Red Wings 1 SO
16: 2014-12-21 Dallas Stars 6 Edmonton Oilers 5 SO
17: 2014-12-21 Carolina Hurricanes 0 New York Rangers 1
18: 2014-12-21 Philadelphia Flyers 4 Winnipeg Jets 3 OT
19: 2014-12-22 San Jose Sharks 2 Anaheim Ducks 3 OT
20: 2014-12-22 Nashville Predators 5 Columbus Blue Jackets 1
21: 2014-12-22 Pittsburgh Penguins 3 Florida Panthers 4 SO
22: 2014-12-22 Calgary Flames 4 Los Angeles Kings 3 OT
23: 2014-12-22 Arizona Coyotes 1 Vancouver Canucks 7
24: 2014-12-22 Ottawa Senators 1 Washington Capitals 2
Date Visitor VG Home HG Type
library(data.table) # development vs. 1.9.5
www_list <- c("http://www.hockey-reference.com/boxscores/2014/12/20/",
"http://www.hockey-reference.com/boxscores/2014/12/21/",
"http://www.hockey-reference.com/boxscores/2014/12/22/",
"http://www.hockey-reference.com/boxscores/2014/12/24/") # not working
resdt<-rbindlist(
lapply(
www_list, function(www0){
message ("web is ", www0) # comment out this if you don't want message to appear
getscore(www0)}),fill=TRUE)
resdt[!is.na(Date),1:6,with=FALSE] # 6 column is valid data
Date Visitor VG Home HG Type
1: 2014-12-20 Colorado Avalanche 5 Buffalo Sabres 1
2: 2014-12-20 New York Rangers 3 Carolina Hurricanes 2 SO
3: 2014-12-20 Chicago Blackhawks 2 Columbus Blue Jackets 3 SO
4: 2014-12-20 Arizona Coyotes 2 Los Angeles Kings 4
5: 2014-12-20 Nashville Predators 6 Minnesota Wild 5 OT
6: 2014-12-20 Ottawa Senators 1 Montreal Canadiens 4
7: 2014-12-20 Washington Capitals 4 New Jersey Devils 0
8: 2014-12-20 Tampa Bay Lightning 1 New York Islanders 3
9: 2014-12-20 Florida Panthers 1 Pittsburgh Penguins 3
10: 2014-12-20 St. Louis Blues 2 San Jose Sharks 3 OT
11: 2014-12-20 Philadelphia Flyers 7 Toronto Maple Leafs 4
12: 2014-12-20 Calgary Flames 2 Vancouver Canucks 3 OT
13: 2014-12-21 Buffalo Sabres 3 Boston Bruins 4 OT
14: 2014-12-21 Toronto Maple Leafs 0 Chicago Blackhawks 4
15: 2014-12-21 Colorado Avalanche 2 Detroit Red Wings 1 SO
16: 2014-12-21 Dallas Stars 6 Edmonton Oilers 5 SO
17: 2014-12-21 Carolina Hurricanes 0 New York Rangers 1
18: 2014-12-21 Philadelphia Flyers 4 Winnipeg Jets 3 OT
19: 2014-12-22 San Jose Sharks 2 Anaheim Ducks 3 OT
20: 2014-12-22 Nashville Predators 5 Columbus Blue Jackets 1
21: 2014-12-22 Pittsburgh Penguins 3 Florida Panthers 4 SO
22: 2014-12-22 Calgary Flames 4 Los Angeles Kings 3 OT
23: 2014-12-22 Arizona Coyotes 1 Vancouver Canucks 7
24: 2014-12-22 Ottawa Senators 1 Washington Capitals 2
Date Visitor VG Home HG Type
resdf<-as.data.frame(res.dt)
with(resdf,resdf[!is.na(Date),1:6])
Date Visitor VG Home HG Type
1 2014-12-20 Colorado Avalanche 5 Buffalo Sabres 1
2 2014-12-20 New York Rangers 3 Carolina Hurricanes 2 SO
3 2014-12-20 Chicago Blackhawks 2 Columbus Blue Jackets 3 SO
4 2014-12-20 Arizona Coyotes 2 Los Angeles Kings 4
5 2014-12-20 Nashville Predators 6 Minnesota Wild 5 OT
6 2014-12-20 Ottawa Senators 1 Montreal Canadiens 4
7 2014-12-20 Washington Capitals 4 New Jersey Devils 0
8 2014-12-20 Tampa Bay Lightning 1 New York Islanders 3
9 2014-12-20 Florida Panthers 1 Pittsburgh Penguins 3
10 2014-12-20 St. Louis Blues 2 San Jose Sharks 3 OT
11 2014-12-20 Philadelphia Flyers 7 Toronto Maple Leafs 4
12 2014-12-20 Calgary Flames 2 Vancouver Canucks 3 OT
13 2014-12-21 Buffalo Sabres 3 Boston Bruins 4 OT
14 2014-12-21 Toronto Maple Leafs 0 Chicago Blackhawks 4
15 2014-12-21 Colorado Avalanche 2 Detroit Red Wings 1 SO
16 2014-12-21 Dallas Stars 6 Edmonton Oilers 5 SO
17 2014-12-21 Carolina Hurricanes 0 New York Rangers 1
18 2014-12-21 Philadelphia Flyers 4 Winnipeg Jets 3 OT
19 2014-12-22 San Jose Sharks 2 Anaheim Ducks 3 OT
20 2014-12-22 Nashville Predators 5 Columbus Blue Jackets 1
21 2014-12-22 Pittsburgh Penguins 3 Florida Panthers 4 SO
22 2014-12-22 Calgary Flames 4 Los Angeles Kings 3 OT
23 2014-12-22 Arizona Coyotes 1 Vancouver Canucks 7
24 2014-12-22 Ottawa Senators 1 Washington Capitals 2