openMPI/mpich2 doesn't run on multiple nodes
I'm trying to install openMPI and mpich2 on a multi-node cluster, but in both cases I run into problems when running on more than one machine. With mpich2 I can run on a specific host from the head node, but if I try to run something from a compute node to other nodes, I get:
HYDU_sock_connect (utils/sock/sock.c:172): unable to connect from "destination_node" to "parent_node" (No route to host)
[proxy:0:0@destination_node] main (pm/pmiserv/pmip.c:189): unable to connect to server parent_node at port 56411 (check for firewalls!)
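The message itself says to check for firewalls. Because ping and ssh reportedly work, a quick next step is to test whether a plain TCP connection can be opened from the compute node back to the parent node. One common cause of exactly this "No route to host" error, even when ping and ssh succeed, is a firewall rule that rejects other connections with an ICMP host-prohibited reply. The sketch below assumes bash (for its built-in `/dev/tcp` redirection) and coreutils `timeout`; `check_tcp` is a hypothetical helper name. The port in the message (56411 here) is chosen by mpiexec for each run, so the check is only meaningful while that mpiexec is still waiting, or against a port you know is listening.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): can we open a TCP connection to host:port?
# Relies on bash's /dev/tcp redirection and coreutils `timeout`.
check_tcp() {
    local host=$1 port=$2
    # Open fd 3 to host:port in a child bash; the child exiting closes it.
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "${host}:${port} reachable"
    else
        echo "${host}:${port} NOT reachable (firewall or routing?)"
        return 1
    fi
}
```

Run it on the compute node that printed the error, for example `check_tcp parent_node 56411` while the job is hanging. A "NOT reachable" result for a port that is known to be listening points at the firewall on the parent node rather than at MPI.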
I get a similar error if I try to submit the job through SGE.
With openMPI, on the other hand, I can't run on any remote machine at all, not even from the head node. I get:
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
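The first cause in the ORTE message is missing binaries or libraries on the remote nodes. A detail that is easy to miss: a command started as `ssh host cmd` runs in a non-interactive shell, which may not read the same startup files as an interactive login, so its PATH and LD_LIBRARY_PATH can differ from what you see after logging in. The sketch below checks what such a shell finds. `check_remote_cmd` and the `REMOTE_SHELL` override are hypothetical names added only so the function can be tried without real nodes; by default it uses `ssh -o BatchMode=yes`.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): is CMD on the PATH that a
# non-interactive remote shell sees on each HOST?
# Usage: check_remote_cmd CMD HOST...
check_remote_cmd() {
    local cmd=$1; shift
    local host
    for host in "$@"; do
        # REMOTE_SHELL is left unquoted on purpose so the default
        # "ssh -o BatchMode=yes" splits into separate words.
        # </dev/null stops ssh from consuming the loop's stdin.
        if ${REMOTE_SHELL:-ssh -o BatchMode=yes} "$host" "command -v $cmd" </dev/null >/dev/null 2>&1; then
            echo "$host: $cmd found"
        else
            echo "$host: $cmd MISSING"
        fi
    done
}
```

For example, `check_remote_cmd orted node1 node2` checks whether each node can find Open MPI's `orted` daemon, the process the "ORTE was unable to reliably start one or more daemons" message refers to.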
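The machines are connected to each other: from any machine I can ping and ssh (without a password) to any other, and MPI_LIB and the paths are set correctly on all of them.
Usually this is because you haven't set up a hostfile or passed a list of hosts on the command line. For MPICH, you can do this by passing the flag -host on the command line, followed by a list of hosts (host1,host2,host3).
You can also list the hosts in a file, one per line, and then pass that file on the command line like so: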
mpiexec -f <hostfile> -n 3 <executable>
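To connect the two forms above, the sketch below turns the comma-separated list given to -host into a hostfile with one host per line. `write_hostfile`, the file name `hosts`, and `./my_program` are placeholder names, not part of MPICH.

```shell
#!/usr/bin/env bash
# Minimal sketch: write a comma-separated host list (the -host form)
# to an MPICH hostfile, one host name per line.
write_hostfile() {
    local list=$1 out=$2
    # The here-string adds a trailing newline; tr turns commas into newlines.
    tr ',' '\n' <<< "$list" > "$out"
}

write_hostfile host1,host2,host3 hosts
cat hosts
# Then launch, for example:
# mpiexec -f hosts -n 3 ./my_program
```

The `HYDU_` and `pmip` names in the error suggest MPICH's Hydra process manager; its hostfiles also accept a `host:N` line to allow N processes on that host.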
Similarly, for Open MPI you would use:
mpiexec --host host1,host2,host3 -n 3 <executable>
and
mpiexec --hostfile hostfile -n 3 <executable>
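An Open MPI hostfile uses a different line format from MPICH's: each line names a host and can add `slots=N` to say how many processes may be placed there. The sketch below writes such a file; `write_ompi_hostfile`, the file name `hostfile`, and `./my_program` are placeholder names.

```shell
#!/usr/bin/env bash
# Minimal sketch: write an Open MPI hostfile with "<host> slots=<n>" lines.
# Usage: write_ompi_hostfile OUTFILE SLOTS HOST...
write_ompi_hostfile() {
    local out=$1 slots=$2; shift 2
    local host
    : > "$out"    # create or truncate the file
    for host in "$@"; do
        echo "$host slots=$slots" >> "$out"
    done
}

write_ompi_hostfile hostfile 1 host1 host2 host3
cat hostfile
# Then launch, for example:
# mpiexec --hostfile hostfile -n 3 ./my_program
```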
You can find more information at the following links:
- MPICH-
- Open MPI-