R doSNOW: stopCluster(cl) causes mpi to exit


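I have set up an MPI cluster. It runs on CentOS using Open MPI. I am trying to run an R job, and although the job itself runs fine, it ends with an error that makes no sense to me. Do you know why this happens? Am I not shutting down MPI correctly in the R code?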

dosnow.r

#!/usr/bin/env Rscript

hello.world <- function(i) {
   sprintf('Hello from loop iteration %d running on rank %d on node %s',
       i, mpi.comm.rank(), Sys.info()[c("nodename")]);
}

library(foreach)
library(snow)
library(doSNOW)

cl <- makeMPIcluster( 3 )
registerDoSNOW(cl)

output.lines <- foreach(i = (1:10)) %dopar% {
   hello.world(i)
}

cat(unlist(output.lines), sep='\n')

stopCluster(cl)

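I can indeed reproduce a problem, but it comes from the makeMPIcluster function. Do you get the same problem if you use makeCluster(3, type='MPI')?

Found the solution: add Rmpi::mpi.quit() after stopCluster(cl).

Yes, mpi.quit() calls "finalize", which keeps mpirun happy. Also, you should pass the "-n 1" option to mpirun; otherwise mpirun starts multiple R processes, each of which spawns three workers.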
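Putting the two suggestions from the comments together, the script might look like the sketch below. It is the question's dosnow.r with one added line, Rmpi::mpi.quit(), after stopCluster(cl). The launch command in the header comment is an assumption: it is the original command line with "-n 1" added. This has not been run on the cluster from the question, so treat it as a starting point rather than a verified fix.

```r
#!/usr/bin/env Rscript
# Sketch: dosnow.r with an explicit MPI shutdown at the end.
# Assumed launch command (original command line plus "-n 1", so that
# mpirun starts a single master R process):
#   mpirun -n 1 --hostfile wwhosts R CMD BATCH dosnow.r

library(foreach)
library(snow)
library(doSNOW)

# Runs on the workers; reports the loop index, MPI rank and host name.
hello.world <- function(i) {
  sprintf('Hello from loop iteration %d running on rank %d on node %s',
          i, mpi.comm.rank(), Sys.info()[c("nodename")])
}

# Spawn three MPI workers and register them as the foreach backend.
cl <- makeMPIcluster(3)
registerDoSNOW(cl)

output.lines <- foreach(i = (1:10)) %dopar% {
  hello.world(i)
}

cat(unlist(output.lines), sep = '\n')

# Shut down the workers first, then finalize MPI on the master.
# mpi.quit() finalizes MPI and then quits R, so it must be the last call.
stopCluster(cl)
Rmpi::mpi.quit()
```

Because mpi.quit() also terminates the R session, anything placed after it in the script will not run.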
Original command line and the error it produced:

mpirun --hostfile wwhosts R CMD BATCH dosnow.r
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 3200 on
node wwmaster exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.

--------------------------------------------------------------------------

Contents of the .Rout file written by R CMD BATCH:

R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> #!/usr/bin/env Rscript
>
> hello.world <- function(i) {
+    sprintf('Hello from loop iteration %d running on rank %d on node %s',
+        i, mpi.comm.rank(), Sys.info()[c("nodename")]);
+ }
>
> library(foreach)
> library(snow)
> library(doSNOW)
Loading required package: iterators
>
> cl <- makeMPIcluster( 3 )
Loading required namespace: Rmpi
    3 slaves are spawned successfully. 0 failed.
> registerDoSNOW(cl)
>
> output.lines <- foreach(i = (1:10)) %dopar% {
+    hello.world(i)
+ }
>
> cat(unlist(output.lines), sep='\n')
Hello from loop iteration 1 running on rank 1 on node n0001.cluster
Hello from loop iteration 2 running on rank 2 on node wwmaster
Hello from loop iteration 3 running on rank 3 on node n0000.cluster
Hello from loop iteration 4 running on rank 1 on node n0001.cluster
Hello from loop iteration 5 running on rank 2 on node wwmaster
Hello from loop iteration 6 running on rank 3 on node n0000.cluster
Hello from loop iteration 7 running on rank 3 on node n0000.cluster
Hello from loop iteration 8 running on rank 1 on node n0001.cluster
Hello from loop iteration 9 running on rank 2 on node wwmaster
Hello from loop iteration 10 running on rank 1 on node n0001.cluster
>
> stopCluster(cl)
[1] 1
>
> proc.time()
   user  system elapsed
  0.835   0.290   4.177