R: How to fix a pipe (%>%) error when running a function in parallel?

I am running the following function both on a single core and in parallel:

library(parallel)

cl <- makeCluster(detectCores()-1)

matrix_of_sums <- parLapply(cl, date_series, scrape_insider_forms)
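The simple version seems to work fine, but when I load library(parallel) and run it that way, I get an error. You should be able to run this code directly: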

# install.packages('stringr')
# install.packages('rvest')

library(stringr)
library(rvest)


scrape_insider_forms <- function(date)  {


  #### READ WEBPAGE ####

  page <- paste0('https://www.secform4.com/', date ,'/selling.htm') %>% read_html()

  #### PARSE FULL TABLE  ##### 

  ## nodes
  insider_sales_node <- html_nodes(page, "table")

  ## table content
  data <- html_table(insider_sales_node[[2]], fill = TRUE)

  #### PARSE JOB & INVESTMENT TYPE #### 

  ## nodes 
  positions_node <- html_nodes(page, "span")

  ## txt content 
  positions_txt <- as.matrix(html_text(positions_node))

  job_title <- as.matrix(positions_txt[!positions_txt[,1] == '(Direct)' 
                                       & !positions_txt == '(IndirectDirect)' 
                                       & !positions_txt == '(Indirect)'
                                       & !positions_txt == '(DirectIndirect)',])

  ## direct / indirect investment
  dir_indir <- as.matrix(positions_txt[positions_txt[,1] == '(Direct)' 
                                       | positions_txt == '(IndirectDirect)' 
                                       | positions_txt == '(Indirect)'
                                       | positions_txt == '(DirectIndirect)',])

  ## remove header row
  data <- data[-1,]

  ## Add jobs and inv type 
  data$FilerJob <- job_title
  data$DirIndirect <- dir_indir

  ## set matching colnames for output rbind
  if (ncol(data) == 12) colnames(data) <- c('TransactionDate', 'ReportedDateTime', 'Company',
                                            'Symbol', 'InsiderRelationship', 'SharesTraded', 
                                            'AveragePrice', 'TotalAmount', 'SharesOwned', 'Filing', 
                                            'FilerJob', 'DirIndirect')

  ## store output 
  insider_sales_MASTER  <<- rbind(insider_sales_MASTER, data)

  cat('\nFinished ----- ', as.character(date), ' --- ')


}
Thanks in advance.
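The parallel run most likely fails because a PSOCK cluster (the only type available on Windows) starts fresh R sessions: packages attached in the master, such as rvest, which provides %>% in this code, are not attached on the workers. The sketch below reproduces the pattern offline with base R only. The tools package and its file_ext() function are illustrative stand-ins (not part of the question) for rvest and %>%, chosen because tools ships with R but is not attached in a fresh session.

```r
library(parallel)
library(tools)   # attached in the master session only

# Calls file_ext() unqualified, the way the question calls %>% without magrittr::
get_ext <- function(path) file_ext(path)

files <- c("a.htm", "b.csv", "c.txt")

cl <- makeCluster(2, type = "PSOCK")

# Fresh worker sessions do not have tools attached, so this errors
before <- tryCatch(parLapply(cl, files, get_ext),
                   error = function(e) conditionMessage(e))
print(before)

# Attach the package on every worker. For the question this would be
# clusterEvalQ(cl, { library(stringr); library(rvest) })
invisible(clusterEvalQ(cl, library(tools)))
after <- unlist(parLapply(cl, files, get_ext))
print(after)

stopCluster(cl)
```

The same reasoning applies to any helper function or global object the worker function relies on: it has to be attached (clusterEvalQ) or copied (clusterExport) to the workers before parLapply() runs.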

Update:

Thanks for the pointers:

library(doParallel)

## create N-1 cores clusters
cl <- makeCluster(detectCores() - 1, # number of cores to use
                  type = "PSOCK")

## load the libraries inside the cluster 
clusterEvalQ(cl,  library(rvest))

clusterExport(cl, 'date_series')
clusterExport(cl, 'insider_sales_MASTER')



matrix_of_sums <- parLapply(cl, date_series, scrape_insider_forms)
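A separate issue with this setup: clusterExport(cl, 'insider_sales_MASTER') gives each worker its own copy, and the <<- inside scrape_insider_forms() then updates only that worker's copy, so the master's insider_sales_MASTER is not changed. Exporting date_series is also unnecessary, because parLapply() sends each element to the workers itself. Likewise, makeCluster() discards worker output by default (see its outfile argument), so the cat() progress messages will not show up. Below is a minimal sketch of the usual alternative, returning each data frame and combining them in the master; fake_scrape() is a hypothetical offline stand-in for scrape_insider_forms().

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")

# 1) <<- on a worker only changes that worker's copy of the global
counter <- 0
clusterExport(cl, "counter")
invisible(parLapply(cl, 1:4, function(i) counter <<- counter + 1))
print(counter)   # the master's counter is untouched

# 2) Return values instead. fake_scrape() is a hypothetical offline
#    stand-in for scrape_insider_forms(); it returns a data frame
fake_scrape <- function(date) {
  data.frame(TransactionDate = date,
             Filing = paste0("form4-", date),
             stringsAsFactors = FALSE)
}

dates <- c("2018-06-01", "2018-06-04", "2018-06-05")
pieces <- parLapply(cl, dates, fake_scrape)
stopCluster(cl)

# Combine the per-date data frames in the master session
insider_sales_MASTER <- do.call(rbind, pieces)
print(insider_sales_MASTER)
```

In the real function this means replacing the <<- line with a plain return of data, then running do.call(rbind, ...) on the list that parLapply() returns.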
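Can you edit your question to include the output from sessionInfo()?

Your library() calls are never run inside the cluster. Try clusterEvalQ(cl, {library(stringr); library(rvest);})

@r2evans' comment is correct; specifically, since you are on Windows (which is why I asked for sessionInfo()), you have to use a "PSOCK" cluster, which starts a completely new R session. See (e.g.) for details.

@duckmayr, oh right: non-PSOCK clusters benefit from the master node's search path already being set up. B/c non-Windows systems can use the forking model, which should benefit from packages/data already loaded on the master node. (All calls to library() and the like made on the master node after calling makeCluster would need clusterEvalQ to push the updates.) Good call. Loosely related: , although each node needs .libPaths(…) to find non-standard library locations.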
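The cluster-type discussion in these comments can be sketched as follows: FORK is not available on Windows, forked workers inherit everything that existed in the master when makeCluster() was called, and PSOCK workers inherit nothing, so objects must be exported to them. The clusterCall(cl, .libPaths, .libPaths()) line is one way (an assumption about your setup, not something from the thread) to give the workers the master's library paths when packages live in a non-standard location. base_url is a hypothetical example object.

```r
library(parallel)

# FORK is unavailable on Windows, so fall back to PSOCK there
cl_type <- if (.Platform$OS.type == "windows") "PSOCK" else "FORK"

base_url <- "https://www.secform4.com/"   # defined BEFORE makeCluster()

cl <- makeCluster(2, type = cl_type)

# Give the workers the master's library paths (harmless for FORK workers)
invisible(clusterCall(cl, .libPaths, .libPaths()))

# PSOCK workers start empty, so objects must be exported; forked workers
# already have everything that existed when makeCluster() was called
if (cl_type == "PSOCK") clusterExport(cl, "base_url")

urls <- unlist(parLapply(cl, c("2018-06-01", "2018-06-04"),
                         function(d) paste0(base_url, d, "/selling.htm")))
stopCluster(cl)
print(urls)
```

Anything created in the master after makeCluster(), for either cluster type, still has to be pushed with clusterExport() or clusterEvalQ().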
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_Ireland.1252  LC_CTYPE=English_Ireland.1252    LC_MONETARY=English_Ireland.1252 LC_NUMERIC=C                    
[5] LC_TIME=English_Ireland.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] devtools_1.13.6    snow_0.4-3         future.apply_1.0.1 future_1.9.0       RSelenium_1.7.4    stringr_1.3.0      rvest_0.3.2        xml2_1.2.0        
[9] XML_3.98-1.11     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16     magrittr_1.5     rappdirs_0.3.1   R6_2.2.2         httr_1.3.1       globals_0.12.3   caTools_1.17.1   tools_3.5.0      binman_0.1.1    
[10] git2r_0.23.0     withr_2.1.2      selectr_0.4-1    semver_0.2.0     subprocess_0.8.3 digest_0.6.15    openssl_1.0.1    yaml_2.1.18      assertthat_0.2.0
[19] codetools_0.2-15 bitops_1.0-6     curl_3.2         memoise_1.1.0    wdman_0.2.4      stringi_1.1.7    compiler_3.5.0   jsonlite_1.5     listenv_0.7.0   
> 