R: How can I resolve a %>% pipe error when running a function with parallel processing?
Tags: r, error-handling, parallel-processing, doparallel
I am running the following function both on 1 core and in parallel:
library(parallel)
cl <- makeCluster(detectCores()-1)
matrix_of_sums <- parLapply(cl, date_series, scrape_insider_forms)
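The simple (single-core) version seems to work fine, but when I load library(parallel) and run it in parallel, it gives me an error. You should be able to run this code directly: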
# install.packages('stringr')
# install.packages('rvest')
library(stringr)
library(rvest)
scrape_insider_forms <- function(date) {
  #### READ WEBPAGE ####
  page <- paste0('https://www.secform4.com/', date, '/selling.htm') %>% read_html()
  #### PARSE FULL TABLE #####
  ## nodes
  insider_sales_node <- html_nodes(page, "table")
  ## table content
  data <- html_table(insider_sales_node[[2]], fill = TRUE)
  #### PARSE JOB & INVESTMENT TYPE ####
  ## nodes
  positions_node <- html_nodes(page, "span")
  ## txt content
  positions_txt <- as.matrix(html_text(positions_node))
  job_title <- as.matrix(positions_txt[!positions_txt[,1] == '(Direct)'
                                       & !positions_txt == '(IndirectDirect)'
                                       & !positions_txt == '(Indirect)'
                                       & !positions_txt == '(DirectIndirect)',])
  ## direct / indirect investment
  dir_indir <- as.matrix(positions_txt[positions_txt[,1] == '(Direct)'
                                       | positions_txt == '(IndirectDirect)'
                                       | positions_txt == '(Indirect)'
                                       | positions_txt == '(DirectIndirect)',])
  ## remove header row
  data <- data[-1,]
  ## Add jobs and inv type
  data$FilerJob <- job_title
  data$DirIndirect <- dir_indir
  ## set matching colnames for output rbind
  if (ncol(data) == 12) colnames(data) <- c('TransactionDate', 'ReportedDateTime', 'Company',
                                            'Symbol', 'InsiderRelationship', 'SharesTraded',
                                            'AveragePrice', 'TotalAmount', 'SharesOwned', 'Filing',
                                            'FilerJob', 'DirIndirect')
  ## store output
  insider_sales_MASTER <<- rbind(insider_sales_MASTER, data)
  cat('\nFinished ----- ', as.character(date), ' --- ')
}
Thanks in advance.
Update:
Thanks for the pointers:
library(doParallel)
## create N-1 cores clusters
cl <- makeCluster(detectCores() - 1, # number of cores to use
type = "PSOCK")
## load the libraries inside the cluster
clusterEvalQ(cl, library(rvest))
clusterExport(cl, 'date_series')
clusterExport(cl, 'insider_sales_MASTER')
matrix_of_sums <- parLapply(cl, date_series, scrape_insider_forms)
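One thing that may be worth checking in the update above: each PSOCK worker has its own global environment, so the `<<-` inside scrape_insider_forms modifies the worker's copy of insider_sales_MASTER, not the object in the master session. A minimal sketch of one alternative is to return a data frame per date and combine the pieces on the master. Here make_rows and the two dates are hypothetical stand-ins, not the original scraper:

```r
library(parallel)

# Hypothetical stand-in for scrape_insider_forms(): it returns its result
# instead of appending to a global object with `<<-`
make_rows <- function(date) {
  data.frame(date = date, n = nchar(date), stringsAsFactors = FALSE)
}

cl <- makeCluster(2, type = "PSOCK")
pieces <- parLapply(cl, c("2018-05-01", "2018-05-02"), make_rows)
stopCluster(cl)

# Combine the per-date data frames in the master session
combined <- do.call(rbind, pieces)
```

parLapply returns one list element per input, in input order, so the combined rows follow the order of the dates.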
Can you edit your question to include the output from sessionInfo()?
Your library calls are never run inside the cluster. Try clusterEvalQ(cl, {library(stringr); library(rvest)})
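The same effect can be reproduced without network access or extra packages. The sketch below is an illustration under assumptions, not the asker's code: file_ext from the tools package stands in for `%>%`, since tools ships with R but is not attached by default, and get_ext is a hypothetical helper.

```r
library(parallel)

# Hypothetical worker function: file_ext() lives in the 'tools' package,
# which is not attached by default (it plays the role of `%>%` here)
get_ext <- function(path) file_ext(path)

cl <- makeCluster(2, type = "PSOCK")

# The workers are fresh R sessions, so the function is not found there
before <- tryCatch(
  parLapply(cl, "selling.htm", get_ext),
  error = function(e) "error on worker"
)

# Attach the package in every worker session, then retry
invisible(clusterEvalQ(cl, library(tools)))
after <- parLapply(cl, "selling.htm", get_ext)

stopCluster(cl)
```

The first call fails on the workers and the second succeeds, which is the same pattern as calling `%>%` before and after clusterEvalQ(cl, library(rvest)).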
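@r2evans's comment is correct; specifically, when you are on Windows (which is why I asked for sessionInfo()), you must use a "PSOCK" cluster, which starts each worker in a brand-new R session.
@duckmayr, oh right, non-PSOCK clusters benefit from the master node's search path already being set up, because non-Windows systems can use the forking model, which inherits the packages and data already loaded in the master node. (Any calls to library and the like made in the master node after calling makeCluster still need clusterEvalQ to push the updates.) Good call. Loosely related: .libPaths(...) is also needed on each node to find library locations outside the default ones.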
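For the .libPaths point, one possible approach is to copy the master's library search path to each worker before loading packages there. This is a sketch under that assumption; master_paths and worker_paths are illustrative names:

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")

# Send the master's library search path to every worker.
# The wrapper function makes sure .libPaths() is evaluated on the worker.
master_paths <- .libPaths()
invisible(clusterCall(cl, function(p) .libPaths(p), master_paths))

# Inspect what each worker now uses as its library search path
worker_paths <- clusterEvalQ(cl, .libPaths())

stopCluster(cl)
```

After this step, clusterEvalQ(cl, library(rvest)) should be able to find packages installed in any of the master's library locations.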
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_Ireland.1252 LC_CTYPE=English_Ireland.1252 LC_MONETARY=English_Ireland.1252 LC_NUMERIC=C
[5] LC_TIME=English_Ireland.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] devtools_1.13.6 snow_0.4-3 future.apply_1.0.1 future_1.9.0 RSelenium_1.7.4 stringr_1.3.0 rvest_0.3.2 xml2_1.2.0
[9] XML_3.98-1.11
loaded via a namespace (and not attached):
[1] Rcpp_0.12.16 magrittr_1.5 rappdirs_0.3.1 R6_2.2.2 httr_1.3.1 globals_0.12.3 caTools_1.17.1 tools_3.5.0 binman_0.1.1
[10] git2r_0.23.0 withr_2.1.2 selectr_0.4-1 semver_0.2.0 subprocess_0.8.3 digest_0.6.15 openssl_1.0.1 yaml_2.1.18 assertthat_0.2.0
[19] codetools_0.2-15 bitops_1.0-6 curl_3.2 memoise_1.1.0 wdman_0.2.4 stringi_1.1.7 compiler_3.5.0 jsonlite_1.5 listenv_0.7.0
>