Python multiprocessing and MPI multi-node


Here is my problem. I am working on an existing python multiprocessing code (which runs on a single node), and the goal is to run it across multiple nodes using MPI for Python (mpi4py).

The MPI communication between the two nodes is done only by the main thread of each MPI process (the thread that called MPI.Init(); you can tell which thread that is by calling the MPI.Is_thread_main() function), but unfortunately it does not work.
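
For reference, here is a minimal, self-contained sketch (my own illustration, assuming a standard mpi4py installation launched with mpiexec) of how the granted thread level and the main-thread flag can be inspected:

from mpi4py import MPI  # importing initializes MPI with the default settings

# Query_thread() reports the thread support level actually granted by the
# MPI library; with "funneled", only the thread that called MPI_Init() is
# allowed to make MPI calls.
granted = MPI.Query_thread()
print("Granted thread level: %d (funneled = %d)" % (granted, MPI.THREAD_FUNNELED))

# Is_thread_main() is True only in the thread that initialized MPI.
print("Is main MPI thread: %s" % MPI.Is_thread_main())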

In fact, as soon as the python process has been started, the MPI communication no longer works.

To illustrate the problem, I wrote a short piece of code that reproduces exactly the same issue:

import os
import psutil
import multiprocessing
import numpy as np
import queue
import time

from mpi4py import rc
rc.initialize = False  # if True, MPI.Init() runs automatically when "from mpi4py import MPI" is executed
rc.thread_level = 'funneled'

from mpi4py import MPI


def infiniteloop(arg):
    while True:
        print(arg)
        time.sleep(1)
        # Check whether the worker thinks it is the thread that called MPI.Init()
        print("Worker is Main Thread : %s" % (MPI.Is_thread_main()))
        print("Rank %d on %s, Process PID for worker = %d" % (MPI.COMM_WORLD.Get_rank(), MPI.Get_processor_name(), os.getpid()))



if __name__ == '__main__':

    MPI.Init()
    # In the code I'm working on, MPI.Init() has to be done before the multiprocessing initialization

    proc = multiprocessing.Process(target=infiniteloop, args=('RunningWorker',))
    proc.start()
    print("MultiProcess Stared")


    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size_mpi = comm.Get_size()

    while True:
        print("Running Main Thread")
        print("Main Thread is Main Thread %s" %( MPI.Is_thread_main()))
        print("Rank %d on %s, Process PID for main = %s" %(MPI.COMM_WORLD.Get_rank(),MPI.Get_processor_name(),os.getpid()))
        print("Rank %d on %s, rc.thread_level = %s" %((MPI.COMM_WORLD.Get_rank(),MPI.Get_processor_name(), rc.thread_level)))
        time.sleep(1)

        # Start MPI communication: just an example of 2D-array communication that I know works
        print("Start MPI Transfer")
        #*************** Multiple SEND AND RECEIVE for 2D Array fill randomly
        SumPsfmean = None
        TransPsfmean = None
        TABSIZE = 100
        # Create a 100x100 array filled with random np.float64 values
        # (because it is close to the case I am interested in)
        # Row-by-row communication
        if rank == 0:
            psfmean = np.random.rand(TABSIZE,TABSIZE)
            print(psfmean.dtype)
        else:
            psfmean = np.random.rand(TABSIZE,TABSIZE)

        psfmean_shape = psfmean.shape

        if rank == 0:
            SumPsfmean = np.array(range(psfmean.size*(size_mpi-1)), dtype = np.float64)
            SumPsfmean.shape = (size_mpi-1, psfmean_shape[0], psfmean_shape[1])
            TransPsfmean = np.array(range(psfmean[0].size), dtype = np.float64)

        for i in range(psfmean_shape[0]):
            print("Rank %d : Send&Receive nb %d" % (rank, i))
            if rank == 0:
                comm.Recv(TransPsfmean, source=1, tag=i)
            elif rank == 1:
                comm.Send(psfmean[i], dest=0, tag=i)
            print("End Send&Receive %d" % i)
            if rank == 0:
                for k in range(size_mpi):
                    if k != 0:
                        SumPsfmean[k-1][i] = TransPsfmean


    proc.join()
In this example, only 2 MPI processes are created, on 2 different nodes. After initializing MPI, the main function therefore creates and starts a python process, and then the communication between the 2 MPI processes begins. MPI_rank_1 sends the 2D array (100 rows) row by row, and MPI_rank_0 waits to receive each row.
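
(Aside: since psfmean is a contiguous NumPy array, the whole 100x100 block could also be exchanged in a single message instead of row by row; a minimal sketch of that variant, under the same rank-0/rank-1 setup as above:)

if rank == 1:
    comm.Send(psfmean, dest=0, tag=0)    # send the full 100x100 array at once
elif rank == 0:
    whole = np.empty_like(psfmean)       # hypothetical preallocated receive buffer
    comm.Recv(whole, source=1, tag=0)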

The result is:

[0] MultiProcess Started
[0] Running Main Thread
[0] Main Thread is Main Thread : True
[0] Rank 0 on genji271, Process PID for main = 227040
[0] Rank 0 on genji271, rc.thread_level = funneled
[1] MultiProcess Started
[1] Running Main Thread
[1] Main Thread is Main Thread : True
[1] Rank 1 on genji272, Process PID for main = 211028
[1] Rank 1 on genji272, rc.thread_level = funneled
[0] RunningWorker[0]
[1] RunningWorker
[0] Start MPI Transfer
[0] float64
[1] Start MPI Transfer
[1] Rank 1 : Send&Receive nb 0
[1] End Send&Receive 0
[1] Rank 1 : Send&Receive nb 1
[1] End Send&Receive 1
[1] Rank 1 : Send&Receive nb 2
[1] End Send&Receive 2
[1] Rank 1 : Send&Receive nb 3
[1] End Send&Receive 3
[1] Rank 1 : Send&Receive nb 4
[1] End Send&Receive 4
[1] Rank 1 : Send&Receive nb 5
[1] End Send&Receive 5[1]
[1] Rank 1 : Send&Receive nb 6[1]
[1] End Send&Receive 6
[1] Rank 1 : Send&Receive nb 7
[1] End Send&Receive 7[1]
[1] Rank 1 : Send&Receive nb 8
[1] End Send&Receive 8
[1] Rank 1 : Send&Receive nb 9
[1] End Send&Receive 9
[1] Rank 1 : Send&Receive nb 10
[1] End Send&Receive 10
[1] Rank 1 : Send&Receive nb 11
[1] End Send&Receive 11
[1] Rank 1 : Send&Receive nb 12
[1] End Send&Receive 12
[1] Rank 1 : Send&Receive nb 13
[1] End Send&Receive 13
[1] Rank 1 : Send&Receive nb 14[1]
[1] End Send&Receive 14
[1] Rank 1 : Send&Receive nb 15
[0] Rank 0 : Send&Receive nb 0
[0] Worker is Main Thread : True
[0] Rank 0 on genji271, Process PID for worker = 227046
[0] RunningWorker
[1] Worker is Main Thread : True
[1] Rank 1 on genji272, Process PID for worker = 211033
[1] RunningWorker
[0] Worker is Main Thread : True
[0] Rank 0 on genji271, Process PID for worker = 227046
[0] RunningWorker
[1] Worker is Main Thread : True
[1] Rank 1 on genji272, Process PID for worker = 211033
[1] RunningWorker
[0] Worker is Main Thread : True
[0] Rank 0 on genji271, Process PID for worker = 227046
...
As you can see, both the worker and the main thread believe they are the thread that called MPI.Init(). Moreover, the MPI communication between the 2 MPI processes stalls (it works perfectly without the python process, or when MPI.Init is done after the process creation!!). In fact, MPI_rank_0, which is supposed to receive the rows, gets stuck on the first iteration and never receives the first row.

I (think I) understand that a python process is a kind of clone of the main thread (or at least it shares/copies the main thread's memory at the moment the process is created). So could it be that MPI cannot tell the difference between the main thread and its clone (even though they have different PIDs!!)? Or maybe I am doing something wrong.
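
(For what it's worth, a minimal check, not from the run above: on Linux, multiprocessing defaults to the "fork" start method, so the child begins as a copy of the parent's address space, including the state mpi4py set up in MPI.Init():)

import multiprocessing

# On Linux this typically prints "fork": children inherit a copy of the
# parent's memory, including the MPI library's initialized state.
print(multiprocessing.get_start_method())

# "spawn" would instead start children as fresh interpreters that inherit
# nothing from MPI.Init(); it must be set once, before any Process starts:
# multiprocessing.set_start_method('spawn')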


Can anyone help me? I would be very grateful, and I can share more information about my problem.

From the documentation, it seems that multiprocessing does start a new process (as opposed to a new thread), which probably confuses Is_thread_main(). That said, if you fork a new process after MPI_Init(), bad things can happen, so you probably don't want to do that in the first place.
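
A minimal sketch of one possible workaround (a suggestion only, consistent with the observation above that everything works when MPI.Init is done after the process creation; untested): fork the worker before initializing MPI, so that the child never inherits initialized MPI state:

import multiprocessing
import time

from mpi4py import rc
rc.initialize = False          # defer MPI_Init(); do not initialize on import
from mpi4py import MPI

def worker():
    # Plain CPU work only: the child must make no MPI calls.
    while True:
        time.sleep(1)

if __name__ == '__main__':
    proc = multiprocessing.Process(target=worker)
    proc.start()               # fork happens here, BEFORE MPI.Init()

    MPI.Init()                 # initialize MPI in the parent only
    comm = MPI.COMM_WORLD
    print("Rank %d up, worker PID = %d" % (comm.Get_rank(), proc.pid))

    # ... MPI communication from the main thread goes here ...

    MPI.Finalize()
    proc.terminate()
    proc.join()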