openMPI/mpich2 doesn't run on multiple nodes


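I am trying to install both openMPI and mpich2 on a multi-node cluster, and in both cases I have trouble running on multiple machines. With mpich2, I can run on a specific host from the head node, but if I try to run something from a compute node to another node, I get: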

HYDU_sock_connect (utils/sock/sock.c:172): unable to connect from "destination_node" to "parent_node" (No route to host)
[proxy:0:0@destination_node] main (pm/pmiserv/pmip.c:189): unable to connect to server parent_node at port 56411 (check for firewalls!)
If I try to set up the job through SGE, I get a similar error.

With openMPI, on the other hand, I cannot run on any remote machine at all, not even from the head node. I get:

ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).

The machines are connected to each other: I can ping and passwordless-ssh from any machine to any other, and MPI_LIB and PATH are set correctly on all of them.
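A note on the connectivity check above: ping uses ICMP and ssh uses TCP port 22, so both can succeed while a firewall still rejects the other TCP ports that the MPI launchers pick at run time (56411 in the MPICH error above). The following is a minimal sketch for testing one specific port from a compute node. It assumes bash and the coreutils timeout command are installed; can_connect is a hypothetical helper name, and parent_node and 56411 are simply the values from the error message.

```shell
#!/bin/sh
# can_connect HOST PORT: succeed if a TCP connection opens within 3 seconds.
# Assumes bash (for its /dev/tcp redirection) and coreutils' timeout exist.
can_connect() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# parent_node and 56411 come from the error message; the port changes per run.
if can_connect parent_node 56411; then
    echo "TCP connection succeeded"
else
    echo "TCP connection failed (blocked, unreachable, or nothing listening)"
fi
```

A refused connection also counts as a failure here, so the result only means something while a process is actually listening on that port on the target node.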

Usually this happens because you have not set up a host file or passed a list of hosts on the command line.

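For MPICH, you do this by passing the -host flag on the command line, followed by the list of hosts (host1,host2,host3).

Alternatively, put the hosts in a file, one per line:

host1
host2
host3

Then pass that file on the command line like so: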

mpiexec -f <hostfile> -n 3 <executable>
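As a rough illustration of the host-file route, here is a small sketch that writes such a file and prints the matching MPICH command rather than running it. The host names and the temporary file location are placeholders, not values from the question.

```shell
#!/bin/sh
# Sketch only: host1..host3 are placeholder names, not real machines.
hostfile="${TMPDIR:-/tmp}/mpi_hosts.$$"

# One host name per line, which is the layout MPICH's -f option reads.
printf '%s\n' host1 host2 host3 > "$hostfile"

# Derive the process count from the file so -n and the host list agree.
nhosts=$(wc -l < "$hostfile" | tr -d ' ')

# Print the command instead of running it; use it once the names are real.
echo "mpiexec -f $hostfile -n $nhosts <executable>"
```

Counting the lines keeps the -n value in step with the file if hosts are added or removed later.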
Similarly, for Open MPI, you would use:

mpiexec --host host1,host2,host3 -n 3 <executable>

or, with a host file:

mpiexec --hostfile hostfile -n 3 <executable>
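The two Open MPI forms take the same information in different shapes. As a sketch, assuming a host file with one placeholder name per line, the comma-separated list for --host can be derived from the file with paste, so only one list has to be maintained:

```shell
#!/bin/sh
# Sketch only: the host names and the temporary file are placeholders.
hostfile="${TMPDIR:-/tmp}/ompi_hosts.$$"
printf '%s\n' host1 host2 host3 > "$hostfile"

# paste -s joins all lines into one; -d, uses a comma as the separator.
hostlist=$(paste -s -d, "$hostfile")

# Print both equivalent invocations instead of running them.
echo "mpiexec --host $hostlist -n 3 <executable>"
echo "mpiexec --hostfile $hostfile -n 3 <executable>"
```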
You can find more information in the documentation for each implementation:

  • MPICH
  • Open MPI

What if I have done all of these things and still get "ORTE was unable to reliably start one or more daemons"?