Using rankfiles with OpenMPI

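I am trying to use MPI in a cluster and want to be able to control which ranks are scheduled on which nodes.

Note: I am using OpenMPI 2.1.0.

To do this, I am using a rankfile. If I use the following rankfile: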

ubuntu@ip-172-31-8-16:~/dist_log_reg$ cat rankfile
rank 0=localhost slots=1
rank 1=54.153.103.12 slots=1

I get:

ubuntu@ip-172-31-8-16:~/dist_log_reg$ mpirun -v -np 1 -rankfile rankfile hostname
--------------------------------------------------------------------------
The rankfile that was used claimed that a host was either not
allocated or oversubscribed its slots.  Please review your rank-slot
assignments and your host allocation to ensure a proper match.  Also,
some systems may require using full hostnames, such as
"host1.example.com" (instead of just plain "host1").

  Host: ip-172-31-8-16

If I use only one entry in the rankfile:

ubuntu@ip-172-31-8-16:~/dist_log_reg$ cat rankfile
rank 0=localhost slots=1

I get:

ubuntu@ip-172-31-8-16:~/dist_log_reg$ mpirun -v -np 1 -rankfile rankfile hostname
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.

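I have tried everything I could think of (for example, installing other MPI distributions and trying different options in the rankfile), without success.

Any ideas?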

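I reproduced your error by passing localhost as the hostname. When I used the actual system name instead, it ran successfully:

rank X=myPC slot=Y

I believe OpenMPI probes the hostname and uses that name when it makes the call.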
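
To avoid typing localhost by accident, the rankfile can be generated from the name the machine reports for itself. The following is only a minimal sketch of that idea, not part of OpenMPI: the function name build_rankfile and its parameters local_name and slot are hypothetical, and the slot value is written through unchanged in the "rank X=host slot=Y" form shown above.

```python
import socket


def build_rankfile(hosts, slot="0", local_name=None):
    """Return rankfile text with one 'rank N=host slot=S' line per host.

    Hypothetical helper for illustration. Any entry equal to 'localhost'
    is replaced by local_name, which defaults to the name reported by
    socket.gethostname() (e.g. ip-172-31-8-16 in the question).
    """
    if local_name is None:
        local_name = socket.gethostname()

    lines = []
    for rank, host in enumerate(hosts):
        # Use the real system name rather than the 'localhost' alias.
        if host == "localhost":
            host = local_name
        lines.append(f"rank {rank}={host} slot={slot}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write a rankfile for the two hosts from the question.
    with open("rankfile", "w") as fh:
        fh.write(build_rankfile(["localhost", "54.153.103.12"], slot="1"))
```

The resulting file can then be passed to mpirun with -rankfile rankfile, as in the question. Whether a remote host should be listed by IP address or by name depends on how the hosts are known to your allocation, so treat the second entry as a placeholder.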

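I believe recent versions of MPI support both "slot" and "slots". Both give the error above.

Hmm, then I guess the FAQ is out of date. How about a rankfile like this:
rank 0=ip-172-31-8-16 slot=1
Can you try it?

Yes, that works. Thanks! If you include this in your answer, I will accept it.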