R: How to use do.call efficiently inside a custom function with parallel computing
My function does two things: it simulates two datasets under known parameter sets of two models (null and alternative), and then refits both models to the simulated data.
I would like to speed up the computation by using the parallel package together with the pbapply package.
Here is the function:
When I run it as above, but without parallel computation (do.parallel=F), the computation usually takes less time:
>fitting models to simulated data under the null model (BMM)
>|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 32s
>|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 01m 23s
>fitting models to simulated data under the alternative model (OUM)
>|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 09s
>|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 02m 02s
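After that, I ran just part of the function in the global environment (not inside the function), this time with parallel computation. The code and results are shown below: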
cl <- makeCluster(detectCores()-1)
clusterEvalQ (cl, library(mvMORPH))
clusterExport (cl, varlist=c("tree",
"A.df", "B.df",
"call.fun.A", "call.fun.B",
"argsA", "argsB"), envir=environment())
clusterExport (cl, varlist = "do.call")
>fitting models to simulated data under the null model (BMM)
AA <- pblapply (X = A.df, FUN = function(x)
do.call (call.fun.A, args = c (list (tree = tree, data = x), c (argsA, diagnostic=FALSE, echo=FALSE))), cl = cl)
AB <- pblapply (X = A.df, FUN = function(x)
do.call (call.fun.B, args = c (list (tree = tree, data = x), c (argsB, diagnostic=FALSE, echo=FALSE))), cl = cl)
>|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 26s
>|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 57s
>fitting models to simulated data under the alternative model (OUM)
BB <- pblapply (X = B.df, FUN = function(x)
do.call (call.fun.B, args = c (list (tree = tree, data = x), c (argsB, diagnostic=FALSE, echo=FALSE))), cl = cl)
BA <- pblapply (X = B.df, FUN = function(x)
do.call (call.fun.A, args = c (list (tree = tree, data = x), c (argsA, diagnostic=FALSE, echo=FALSE))), cl = cl)
>|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 17s
>|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 49s
stopCluster(cl)
Note that the parallel computation takes much less time in the global environment than inside my custom function.
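One possible contributor to this gap, which the post itself does not confirm, is that a worker function created inside another function carries that function's environment with it when it is serialized and sent to the cluster. The sketch below only measures that effect with base R; make_worker and big are hypothetical names used for illustration and do not come from the original code.

```r
# Hypothetical helper, not from the original post: it builds a worker
# function inside another function, the way a custom wrapper would.
make_worker <- function(strip_env = FALSE) {
  big <- numeric(1e6)            # stand-in for large simulated data (about 8 MB)
  worker <- function(x) x + 1    # note: does not use `big` at all
  if (strip_env) {
    # The global environment is serialized as a reference, not by value.
    environment(worker) <- globalenv()
  }
  worker
}

# Socket clusters from the parallel package ship objects with serialize(),
# so the byte count approximates what sending the function costs.
size_inside <- length(serialize(make_worker(FALSE), NULL))
size_global <- length(serialize(make_worker(TRUE), NULL))

cat("closure created inside a function:", size_inside, "bytes\n")
cat("same closure, global environment: ", size_global, "bytes\n")
```

If the first number is far larger than the second, every object that lives in the wrapper's environment is being copied along with the function, which costs both transfer time and memory on each worker. Whether this applies to the function in the question is an assumption; it depends on what that function keeps in its environment.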
Finally, I ran the parallel computation in the global environment without the do.call function, which was the most efficient:
cl <- makeCluster(detectCores()-1)
clusterEvalQ (cl, library(mvMORPH))
clusterExport (cl, varlist=c("tree",
"A.df", "B.df"), envir=environment())
>fitting models to simulated data under the null model (BMM)
AA <- pblapply (X = A.df, FUN = function(x)
mvBM (tree = tree, data = x, model = "BMM", method = "sparse", optimization = "L-BFGS-B", diagnostic=FALSE, echo=FALSE), cl = cl)
AB <- pblapply (X = A.df, FUN = function(x)
mvOU (tree = tree, data = x, model = "OUM", method = "pseudoinverse", optimization = "L-BFGS-B", diagnostic=FALSE, echo=FALSE), cl = cl)
>|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 19s
>|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 49s
>fitting models to simulated data under the alternative model (OUM)
BA <- pblapply (X = B.df, FUN = function(x)
mvBM (tree = tree, data = x, model = "BMM", method = "sparse", optimization = "L-BFGS-B", diagnostic=FALSE, echo=FALSE), cl = cl)
BB <- pblapply (X = B.df, FUN = function(x)
mvOU (tree = tree, data = x, model = "OUM", method = "pseudoinverse", optimization = "L-BFGS-B", diagnostic=FALSE, echo=FALSE), cl = cl)
>|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 09s
>|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 41s
stopCluster(cl)
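I would greatly appreciate any suggestions or solutions that could help me use do.call inside my function together with parallel processing and get better performance.

I found nothing wrong with the do.call function. The main problem was that there was not enough RAM to hold the objects created inside my function. I had tried the function on a computer with 4 GB of memory, and the objects generated by the function easily reached that limit. The computer therefore tried to move data held in RAM to the hard drive, which in turn slowed the function down.
One solution is to use the save() function to write individual objects to the hard drive and then remove them from the function environment with the rm() function. Upgrading the RAM is also always reasonable.
I did both, and the function now runs well.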
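The save() and rm() approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the original function: fit_batch, run_step and out_dir are hypothetical names, and random vectors stand in for the fitted mvMORPH models.

```r
# Hypothetical stand-in for one batch of fitted models; the real objects
# in the post come from mvMORPH and are much larger.
fit_batch <- function(n) lapply(seq_len(n), function(i) rnorm(1000))

out_dir <- tempfile("fits_")     # assumed output location
dir.create(out_dir)

run_step <- function(name, n) {
  fits <- fit_batch(n)
  path <- file.path(out_dir, paste0(name, ".RData"))
  save(fits, file = path)        # write the results to disk
  rm(fits)                       # drop them from the function environment
  invisible(gc())                # ask R to release the freed memory
  path                           # return only the file path
}

# One file per model/data combination instead of four lists held in RAM.
paths <- vapply(c("AA", "AB", "BA", "BB"), run_step, character(1), n = 50)

# Load a single result back only when it is needed.
env <- new.env()
load(paths[["AA"]], envir = env)
length(env$fits)
```

Returning the file path rather than the fitted objects keeps the function's memory footprint close to the size of one batch at a time, at the cost of extra disk reads when the results are analysed later.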