How to understand the master process and worker processes in the R package "parallel"?

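While trying to understand the documentation for the R package parallel, I ran into this question when reading some lines of code on page 8. I have copied the code below. Note that mc is exactly 2.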

library(parallel)
library(boot) # provides cd4 and corr, and the c() method needed on the master
mc <- 2 # number of worker processes
cl <- makeCluster(mc)
cd4.rg <- function(data, mle) MASS::mvrnorm(nrow(data), mle$m, mle$v)
cd4.mle <- list(m = colMeans(cd4), v = var(cd4))
clusterExport(cl, c("cd4.rg", "cd4.mle")) # copy both objects to every worker
junk <- clusterEvalQ(cl, library(boot)) # discard result
clusterSetRNGStream(cl, 123)
res <- clusterEvalQ(cl, boot(cd4, corr, R = 500,
                    sim = "parametric", ran.gen = cd4.rg, mle = cd4.mle))
cd4.boot <- do.call(c, res) # combine the per-worker bootstrap results
boot.ci(cd4.boot, type = c("norm", "basic", "perc"),
        conf = 0.9, h = atanh, hinv = tanh)
stopCluster(cl)
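
On my system, makeCluster(2) can be seen to create 2 additional R processes (I am on Windows and can see them in Resource Monitor). So the "workers" appear to be distinct from the "master" process, and they exist in addition to it.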
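
The same thing can be checked from within R by comparing process IDs. This is a minimal sketch, assuming a local cluster of 2 workers; Sys.getpid() is base R, and the names master_pid and worker_pids are only illustrative.

```r
library(parallel)

cl <- makeCluster(2)  # start 2 worker processes

master_pid  <- Sys.getpid()                            # PID of the master R session
worker_pids <- unlist(clusterEvalQ(cl, Sys.getpid()))  # one PID per worker

# Workers are separate processes, so the master's PID should not be among theirs
master_pid %in% worker_pids

# Each worker should also report its own distinct PID
length(unique(worker_pids))

stopCluster(cl)  # shut the workers down
```

If the master were itself one of the workers, its PID would show up in worker_pids.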

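Regarding libraries, a quick way to check is to ask each worker for its loadedNamespaces(). The transcript below shows this, using the foreach package loaded onto the two workers as an example: it prints loadedNamespaces() for the workers and the master before and after loading.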

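Since only the worker processes are used to evaluate the expression passed to clusterEvalQ, on the face of it, increasing the number of workers up to a maximum of 8 seems reasonable. Real performance will depend on other factors, such as the number of logical cores available on the 8-core system, how much work each core does during processing, and what else is happening on the system at the time.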
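
As a rough starting point for picking the worker count, detectCores() from parallel reports how many cores R can see. The sketch below assumes one core is left free for the master and the rest of the system; that is a common rule of thumb rather than something taken from the parallel documentation, and n_workers is an illustrative name.

```r
library(parallel)

n_logical  <- detectCores(logical = TRUE)   # logical cores; can be NA on some platforms
n_physical <- detectCores(logical = FALSE)  # physical cores, where the OS reports them

# Assumption: leave one core free; fall back to a single worker if detection fails
n_workers <- if (is.na(n_logical)) 1L else max(1L, n_logical - 1L)

n_workers
```

The result can then be passed to makeCluster(n_workers). Timing a representative job with a few different worker counts is still the more reliable guide.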

Transcript:

library(parallel)
cl <- makeCluster(2)
loadedNamespaces() # get loaded namespaces on master
# [1] "base"      "datasets"  "graphics"  "grDevices" "methods"   "parallel"  "stats"     "utils"
clusterEvalQ(cl, loadedNamespaces()) # get loaded namespaces on workers
# [[1]]
# [1] "base"      "datasets"  "graphics"  "grDevices" "methods"   "parallel"  "stats"     "utils"    
# 
# [[2]]
# [1] "base"      "datasets"  "graphics"  "grDevices" "methods"   "parallel"  "stats"     "utils"    
# 
invisible( clusterEvalQ(cl, library(foreach)) ) # load foreach on workers
loadedNamespaces() # check master
# [1] "base"      "datasets"  "graphics"  "grDevices" "methods"   "parallel"  "stats"     "utils"
clusterEvalQ(cl, loadedNamespaces()) # check workers
# [[1]]
#  [1] "base"      "codetools" "datasets"  "foreach"   "graphics"  "grDevices" "iterators" "methods"   "parallel"  "stats"     "utils"    
# 
# [[2]]
#  [1] "base"      "codetools" "datasets"  "foreach"   "graphics"  "grDevices" "iterators" "methods"   "parallel"  "stats"     "utils"    
# 
library(foreach) # load foreach on master
# foreach: simple, scalable parallel programming from Revolution Analytics
# Use Revolution R for scalability, fault tolerance and more.
# http://www.revolutionanalytics.com
loadedNamespaces() # check again
# [1] "base"      "codetools" "datasets"  "foreach"   "graphics"  "grDevices" "iterators" "methods"   "parallel"  "stats"     "utils"    
stopCluster(cl) # tidy up
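
The same separation applies to objects, which is presumably why the code in the question calls clusterExport for cd4.rg and cd4.mle before using them on the workers. Here is a small sketch with a made-up variable, my_value; envir = environment() is passed only so that the example also works when it is not run at top level.

```r
library(parallel)

cl <- makeCluster(2)
my_value <- 42  # hypothetical object, defined on the master only

# Is my_value visible on each worker before exporting it?
before <- unlist(clusterEvalQ(cl, exists("my_value")))

# Copy my_value from the master to the global environment of every worker
clusterExport(cl, "my_value", envir = environment())

# Check again after the export
after <- unlist(clusterEvalQ(cl, exists("my_value")))

before
after

stopCluster(cl)
```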

According to [this post][1], the master process and the worker processes are distinct. My remaining question is how many worker processes I can meaningfully create.

Thanks for your answer, Chris :)