Fortran MPI_Finalize misbehaving, orphaned processes


I have a fairly straightforward MPI program, essentially "initialize, 2 sends from master to slaves, 2 receives on slaves, do a bunch of system calls for copying/pasting and then running a code, tidy up and MPI finalize."

This seems straightforward, but I can't get mpi_finalize to work correctly. Below is a snapshot of the program, without all of the system copy/paste/call-external-code pieces, which I have condensed into "do codish stuff"-type statements:

program mpi_finalize_break
!<variable declarations>
call MPI_INIT(ierr)
icomm = MPI_COMM_WORLD
call MPI_COMM_SIZE(icomm,nproc,ierr)
call MPI_COMM_RANK(icomm,rank,ierr)

!<do codish stuff for a while>
if (rank == 0) then
    !<set up some stuff then call MPI_SEND in a loop over number of slaves>
    call MPI_SEND(numat,1,MPI_INTEGER,n,0,icomm,ierr)
    call MPI_SEND(n_to_add,1,MPI_INTEGER,n,0,icomm,ierr)
else
    call MPI_Recv(begin_mat,1,MPI_INTEGER,0,0,icomm,status,ierr)
    call MPI_Recv(nrepeat,1,MPI_INTEGER,0,0,icomm,status,ierr)
    !<do codish stuff for a while>
endif

print*, "got here4", rank
call MPI_BARRIER(icomm,ierr)
print*, "got here5", rank, ierr
call MPI_FINALIZE(ierr)

print*, "got here6"
end program mpi_finalize_break
But otherwise the code runs fine (all the correct output files and content).

(This is more of a comment than an answer, but I need the space for the error message...)

Your problem may also come from the "copy/paste/call external code" part, if there is a call to system somewhere. With OpenMPI, forking processes is prohibited; you will get the following warning:

--------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
--------------------------------------
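As a small illustration of what "a call to system somewhere" might look like, and how to at least make its failures visible, here is a sketch using the Fortran 2008 intrinsic EXECUTE_COMMAND_LINE in place of a bare CALL SYSTEM. The command string "./external_code" is a placeholder, not anything from the original program, and this does not remove the underlying fork() that OpenMPI warns about; it only makes the exit status and any launch error checkable:

```
! Sketch (assumed, not the original code): run an external command and
! check both whether it could be launched and whether it succeeded.
! NOTE: this still fork()s underneath, so the OpenMPI warning applies.
integer :: estat, cstat
character(len=256) :: emsg

call execute_command_line("./external_code", wait=.true., &
                          exitstat=estat, cmdstat=cstat, cmdmsg=emsg)
if (cstat /= 0) then
    print *, "command could not be run: ", trim(emsg)
else if (estat /= 0) then
    print *, "command failed with exit status ", estat
end if
```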

Did you report this to the OpenMPI users list? I haven't, because at first I assumed the problem was mine, but the more I investigate the less sense it makes, so I'll cross-post there as well. Technically, MPI makes no guarantee about how many processes exist after MPI_Finalize, but in practice every cluster implementation keeps the same number afterwards as before. I'm not sure what the I/O flushing semantics are in Fortran (although I use Fortran a lot, the code has its own flushing routines wrapping C), so I wouldn't rely on "got here6" as a diagnostic. However, if you find that not all processes survive past MPI_Finalize, that is clearly a problem. What happens if you replace MPI_Finalize() with MPI_Abort(MPI_COMM_WORLD, 0)?
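Concretely, the suggested diagnostic is a one-line swap at the end of the program. This is a test to see whether the hang is in finalization itself, not a fix: MPI_Abort tears down the entire job immediately rather than shutting it down cleanly.

```
! Diagnostic sketch: replace the clean shutdown with an abort.
print*, "got here5", rank, ierr
call MPI_ABORT(MPI_COMM_WORLD, 0, ierr)   ! instead of MPI_FINALIZE(ierr)
```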