How should I understand master and worker processes in the R package "parallel"?
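While trying to understand the documentation of the R package parallel, I ran into this question when reading some lines of code on page 8. I have copied the code below. Note that mc is exactly 2.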
# mc = 2
cl <- makeCluster(mc)
cd4.rg <- function(data, mle) MASS::mvrnorm(nrow(data), mle$m, mle$v)
cd4.mle <- list(m = colMeans(cd4), v = var(cd4))
clusterExport(cl, c("cd4.rg", "cd4.mle"))
junk <- clusterEvalQ(cl, library(boot)) # discard result
clusterSetRNGStream(cl, 123)
res <- clusterEvalQ(cl, boot(cd4, corr, R = 500,
    sim = "parametric", ran.gen = cd4.rg, mle = cd4.mle))
library(boot) # needed for c() method on master
cd4.boot <- do.call(c, res)
boot.ci(cd4.boot, type = c("norm", "basic", "perc"),
    conf = 0.9, h = atanh, hinv = tanh)
stopCluster(cl)
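On my system, I can see that makeCluster(2) creates 2 additional R processes (I am on Windows and can see them in Resource Monitor). So the "workers" appear to be distinct from the "master" process, and they exist in addition to it.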
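The same observation can be checked from inside R rather than from Resource Monitor. The following is a minimal sketch, assuming the default socket cluster created by makeCluster(2); the variable names master_pid and worker_pids are illustrative and not part of the original post.

```r
library(parallel)

cl <- makeCluster(2)                                   # start two worker processes

master_pid  <- Sys.getpid()                            # process ID of the master session
worker_pids <- unlist(clusterEvalQ(cl, Sys.getpid()))  # one process ID per worker

stopCluster(cl)                                        # tidy up

# The master's PID should not appear among the workers' PIDs,
# and the two workers should report two different PIDs.
master_is_worker <- master_pid %in% worker_pids
n_distinct_workers <- length(unique(worker_pids))
print(master_is_worker)
print(n_distinct_workers)
```

If the workers are separate processes, as described above, master_is_worker is FALSE and n_distinct_workers is 2.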
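As for libraries, a quick way to check is to ask each worker for its loadedNamespaces(). The transcript below shows this using the foreach package, loaded on two workers, as an example: it compares loadedNamespaces() on the workers and on the master before and after loading.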
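Since only the worker processes are used to evaluate the clusterEvalQ expression, on the face of it, it seems reasonable to increase the number of workers up to 8. Actual performance will depend on other factors, such as the number of logical cores available on the 8-core system, how much work each core does during processing, and whatever else is happening on the system at the time.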
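One way to pick a starting value for the number of workers is to ask R how many logical cores it can see. This is only a sketch under stated assumptions: the cap of 8 comes from the 8-core system discussed here, the names n_cores, n_workers and n_started are illustrative, and the right number for a real workload still has to be found by timing it.

```r
library(parallel)

n_cores <- detectCores(logical = TRUE)   # logical cores; may be NA on some platforms
if (is.na(n_cores)) n_cores <- 1L        # fall back to a single worker if unknown

# Assumption for illustration: never start more than 8 workers,
# matching the 8-core machine in the discussion above.
n_workers <- max(1L, min(as.integer(n_cores), 8L))

cl <- makeCluster(n_workers)             # start that many worker processes
n_started <- length(cl)                  # a cluster object has one element per worker
stopCluster(cl)                          # tidy up

print(n_workers)
print(n_started)
```

Starting from this value and comparing run times with fewer workers shows whether the extra processes actually help on a given machine.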
Transcript:
library(parallel)
cl <- makeCluster(2)
loadedNamespaces() # get loaded namespaces on master
# [1] "base" "datasets" "graphics" "grDevices" "methods" "parallel" "stats" "utils"
clusterEvalQ(cl, loadedNamespaces()) # get loaded namespaces on workers
# [[1]]
# [1] "base" "datasets" "graphics" "grDevices" "methods" "parallel" "stats" "utils"
#
# [[2]]
# [1] "base" "datasets" "graphics" "grDevices" "methods" "parallel" "stats" "utils"
#
invisible( clusterEvalQ(cl, library(foreach)) ) # load foreach on workers
loadedNamespaces() # check master
# [1] "base" "datasets" "graphics" "grDevices" "methods" "parallel" "stats" "utils"
clusterEvalQ(cl, loadedNamespaces()) # check workers
# [[1]]
# [1] "base" "codetools" "datasets" "foreach" "graphics" "grDevices" "iterators" "methods" "parallel" "stats" "utils"
#
# [[2]]
# [1] "base" "codetools" "datasets" "foreach" "graphics" "grDevices" "iterators" "methods" "parallel" "stats" "utils"
#
library(foreach) # load foreach on master
# foreach: simple, scalable parallel programming from Revolution Analytics
# Use Revolution R for scalability, fault tolerance and more.
# http://www.revolutionanalytics.com
loadedNamespaces() # check again
# [1] "base" "codetools" "datasets" "foreach" "graphics" "grDevices" "iterators" "methods" "parallel" "stats" "utils"
stopCluster(cl) # tidy up
According to [this post][1], the master and worker processes are distinct. My question remains how many worker processes I can meaningfully create.
[1]:
Thanks for your answer, Chris :)