Running Octave with MPI fails
I am new to Octave. I am trying to run a helloworld Octave example on Ubuntu 14.04, but it always fails. The failure message is as follows:
octave:1> system (" mpirun -x LD_PRELOAD=libmpi.so --hostfile ./hostfile -np 2 octave -q --eval 'pkg load mpi; helloworld ()'");
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "(null)" (-43) instead of "Success" (0)
--------------------------------------------------------------------------
An error occurred in MPI_Init
on a NULL communicator
MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
and potentially your MPI job)
[computationnode:17991] Local abort before MPI_INIT completed completed
successfully, but am not able to aggregate error messages, and not able to
guarantee that all other processes were killed!
An error occurred in MPI_Init
on a NULL communicator
MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
and potentially your MPI job)
[computationnode:17992] Local abort before MPI_INIT completed completed
successfully, but am not able to aggregate error messages, and not able to
guarantee that all other processes were killed!
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "(null)" (-43) instead of "Success" (0)
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
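Can anyone help me? I would really like to solve this problem.

Are you sure about your hostfile? Have you tried without the --hostfile option? Have you tried exporting LD_PRELOAD before running the mpirun command and passing only -x LD_PRELOAD as an option? (The mpirun man page says the -x parser is not very sophisticated, and that it is better to define environment variables outside the command and use -x only to "export" them to MPI, not to "define" them.) Also, are you sure the helloworld() function exists and that Octave can find it on its default path? (I'm not an MPI expert; this is just general advice.)

Also, you might have better luck with the parallel package, which seems to be better maintained and documented, especially if you are only looking for local parallelization.

Yes, the problem is solved. It was mainly caused by the environment variables. Thanks, man. By the way, I'm new to MPI. As for the hostfile, what format should it have and what information should it contain? Could you give me a few examples? For example, I have two nodes: one has 4 cores and its IP is A.A.A.A, and the other also has 4 cores and its IP is B.B.B.B.

I don't know. Like I said, I'm not an MPI expert. I was just using common sense to suggest likely culprits for your problem :)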
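
The hostfile question above was left unanswered in the thread. As a rough sketch, assuming Open MPI (the error messages mention it): its hostfile format lists one node per line, with an optional slots=N giving the number of processes to place on that node. The script below writes such a file for the two 4-core nodes described, then launches with LD_PRELOAD defined in the environment and only forwarded by -x, as the comment suggested. A.A.A.A and B.B.B.B are the placeholders from the question, and the RUN switch is a hypothetical guard added here so nothing is launched until the placeholders have been replaced.

```shell
#!/bin/sh
# Sketch only: A.A.A.A and B.B.B.B are placeholders taken from the question.
# Replace them with real hostnames or IP addresses before launching.

# Open MPI hostfile: one node per line; slots = processes allowed on that node.
cat > hostfile <<'EOF'
A.A.A.A slots=4
B.B.B.B slots=4
EOF

# RUN is a hypothetical switch for this sketch: dry run unless RUN=1 is set.
if [ "${RUN:-0}" = "1" ]; then
    # Define LD_PRELOAD in mpirun's environment and let -x merely forward it,
    # instead of defining it inside the -x option itself.
    # Assumption: libmpi.so can be found by the dynamic loader on every node.
    LD_PRELOAD=libmpi.so mpirun -x LD_PRELOAD --hostfile ./hostfile -np 2 \
        octave -q --eval 'pkg load mpi; helloworld ()'
else
    echo "dry run: hostfile written, mpirun not started"
fi
```

With both nodes reachable and Octave plus the mpi package installed on each, raising -np up to 8 would use all of the listed slots. Whether this clears the MPI_Init failure depends on the local setup; the thread only confirms that the cause was environment variables.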