MPI (OpenMPI): MPI_Publish_name cannot contact the global ompi-server and raises an error

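I am trying to write an MPI application made up of programs that work in a server-client pattern. I have been trying to get the server to publish its name to a globally scoped ompi-server.

Here is the server code: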

#include <mpi.h>

int main(int argc, char** argv) {
    int myrank, nprocs, errmpi;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Info info;
    MPI_Info_create(&info);
    // Ask Open MPI to publish the name in the global scope (the ompi-server)
    MPI_Info_set(info, "ompi_global_scope", "yes");
    MPI_Open_port(info, port_name);

    // Fails here
    MPI_Publish_name("ServerName", info, port_name);

    // Rest of code...
I get the following error at run time:

$ ./mpi/bin/mpirun -np 1 --mca btl self ServerName
--------------------------------------------------------------------------
Process rank 0 attempted to publish to a global ompi_server that
could not be contacted. This is typically caused by either not
specifying the contact info for the server, or by the server not
currently executing. If you did specify the contact info for a
server, please check to see that the server is running and start
it again (or have your sys admin start it) if it isn't.

--------------------------------------------------------------------------
[xxx:18205] *** An error occurred in MPI_Publish_name
[xxx:18205] *** reported by process [1424949249,139676631433216]
[xxx:18205] *** on communicator MPI_COMM_WORLD
[xxx:18205] *** MPI_ERR_INTERN: internal error
[xxx:18205] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[xxx:18205] ***    and potentially your MPI job)
I do have the ompi-server process running in debug mode in a console:

$ ./ompi-server --no-daemonize -d -r +
[xxx:14140] [[9416,0],0] orte-server: up and running!
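Eventually I will distribute the processes across several nodes, but for now I would really like to get the framework working on a single node. Can anyone help? Many thanks.

Edit 1: Thank you very much for the quick replies. I made the following change: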

$mpi/bin/ompi-server --no-daemonize -d -r mpiuri
If I now run the program like this, it hangs at the point where it previously failed:

$./mpi/bin/mpirun --ompi-server file:mpiuri -mca btn tcp,self,sm -np 1 -v Server
Whereas if I run the program with the following command:

$ ./mpi/bin/mpirun --ompi-server file:mpiuri -mca btn tcp,self,sm -np 1 -v --wait-for-server --server-wait-time 10 Server
I get the following error:

--------------------------------------------------------------------------
mpirun was instructed to wait for the requested ompi-server, but was unable to
establish contact with the server during the specified wait time:

Server uri:  799801344.0;tcp://192.168.1.113:44487
Timeout time: 10

Error received: Not supported

Please check to ensure that the requested server matches the actual server
information, and that the server is in operation.
--------------------------------------------------------------------------
I must be close... but I can't quite figure it out.


I am fairly sure it is not the firewall, because I added the rule ALLOW 192.168.1.0/24 in ufw.

1) Make sure the ompi-server is up and running and is writing its URI to a file, using the following command:

$mpi/bin/ompi-server --no-daemonize -d -r mpiuri
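2) Start all MPI processes with this URI file, making sure that you

  • prefix the URI file name with "file:" when passing it to the --ompi-server argument
  • pass the host name of the node on which MPI is running... like this:

    $./mpi/bin/mpirun --ompi-server file:mpiuri -host myHostName -np 1 -v Server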
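Before step 2, it can help to confirm that the file written in step 1 really contains a usable address. The sketch below is only a diagnostic aid, not part of Open MPI: `struct server_uri`, `parse_server_uri` and `check_uri_file` are hypothetical names, and the layout `<number>.<number>;tcp://<host>:<port>` is assumed purely from the "Server uri" line quoted in the question. Real URI files may hold other transports or several addresses, which this check would reject.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: the pieces of a URI such as
 * "799801344.0;tcp://192.168.1.113:44487". The layout is an assumption
 * taken from the error output in the question, not from Open MPI docs. */
struct server_uri {
    unsigned long job;   /* number before the '.' */
    unsigned long vpid;  /* number between '.' and ';' */
    char host[64];       /* text between "tcp://" and the next ':' */
    unsigned int port;   /* number after that ':' */
};

/* Returns 0 and fills *out when text matches the assumed layout, -1 otherwise. */
int parse_server_uri(const char *text, struct server_uri *out)
{
    assert(text != NULL && out != NULL);
    int fields = sscanf(text, "%lu.%lu;tcp://%63[^:]:%u",
                        &out->job, &out->vpid, out->host, &out->port);
    if (fields != 4 || out->port == 0 || out->port > 65535)
        return -1;
    return 0;
}

/* Reads the first line of the URI file (e.g. "mpiuri") and parses it.
 * Returns -1 if the file is missing, empty, or not in the assumed layout. */
int check_uri_file(const char *path, struct server_uri *out)
{
    assert(path != NULL && out != NULL);
    char line[512];
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    char *got = fgets(line, sizeof line, fp);
    fclose(fp);
    if (got == NULL)
        return -1;
    return parse_server_uri(line, out);
}
```

If `check_uri_file("mpiuri", &u)` returns -1, the ompi-server has probably not written the file yet, or mpirun is being started from a different directory than the one holding the file. If it returns 0, `u.host` and `u.port` show which address mpirun will try to reach.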


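  • You should provide the URI of the name server to mpiexec. Your question is basically a duplicate.
  • Thank you very much, Hristo. Yes, I saw your answer before asking my question, but I missed the part about adding the file URI. However, as you can see in my edit, even after adding the URI I still cannot contact the ompi-server.
  • You should remove the solution from the question and post it as your own answer instead, then accept it once the timeout has elapsed. That way, others can refer to this question if a similar problem comes up in the future.
  • Thanks Hristo, done as suggested. Cheers.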