Python: Using PyTorch's multiprocessing with distributed (OpenMPI)

Tags: python, pytorch, openmpi

I am trying to spawn a couple of processes using PyTorch's multiprocessing module within the OpenMPI distributed backend. What I have is the following code:

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def run(rank_local, rank, world_size, maingp):
    print("I WAS SPAWNED ", rank_local, " OF ", rank)

    tensor = torch.zeros(1)
    tensor += 1

    if rank == 0:
        tensor += 100
        dist.send(tensor, dst=1)
    else:
        print("I am spawn: ", rank, "and my tensor value before receive: ", tensor[0])
        dist.recv(tensor, src=0)
        print("I am spawn: ", rank, "and my tensor value after  receive: ", tensor[0])


if __name__ == '__main__':

    # Initialize Process Group
    dist.init_process_group(backend="mpi", group_name="main")
    maingp = None #torch.distributed.new_group([0,1])
    mp.set_start_method('spawn')    

    # get current process information
    world_size = dist.get_world_size()
    rank = dist.get_rank()

    # Establish Local Rank and set device on this node
    mp.spawn(run, args=(rank, world_size, maingp), nprocs=1)
I run this code with OpenMPI as follows:

mpirun -n 2 python code.py
My understanding is that mpirun creates two processes with ranks [0, 1], and each of these spawns a new child process with local rank 0. Now when I try to communicate between these two child processes of the main processes, I get a traceback ending in the following error:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/usama/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/usama/code/test/code.py", line 19, in run
    dist.send(tensor, dst=1)
  File "/home/usama/anaconda3/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 666, in send
    _check_default_pg()
  File "/home/usama/anaconda3/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 191, in _check_default_pg
    "Default process group is not initialized"
AssertionError: Default process group is not initialized
My question is how to make these child processes able to communicate, i.e. have the [0, 0] process send something to the [1, 0] process. Any ideas?
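
For reference, here is a hedged sketch of one possible direction (not from the original post): mp.spawn starts fresh Python interpreters, and the default process group created by dist.init_process_group in the MPI parent is per-process state, so it does not exist in the children; hence the AssertionError. Each child could instead initialize its own default group, for example over TCP with the gloo backend. The rendezvous address 127.0.0.1:29500 and the single-node, two-member world below are assumptions for illustration only.

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def child(rank_local, rank, world_size):
    # Each spawned child builds its own default process group;
    # the parent's MPI-backed group is not inherited across spawn().
    dist.init_process_group(
        backend="gloo",                       # TCP-capable backend for the children
        init_method="tcp://127.0.0.1:29500",  # assumed rendezvous address (single node)
        rank=rank,
        world_size=world_size,
    )

    tensor = torch.zeros(1)
    if rank == 0:
        tensor += 100
        dist.send(tensor, dst=1)
    else:
        dist.recv(tensor, src=0)
        print("child", rank, "received", tensor[0])

    dist.destroy_process_group()


if __name__ == '__main__':
    # The parent MPI group is used only to learn this process's rank
    # and the world size before handing off to the spawned child.
    dist.init_process_group(backend="mpi", group_name="main")
    world_size = dist.get_world_size()
    rank = dist.get_rank()
    mp.spawn(child, args=(rank, world_size), nprocs=1)

On a multi-node run the init_method would need to point at a host reachable by all ranks rather than 127.0.0.1. An alternative worth considering is to drop the nested spawn entirely and let the two mpirun ranks communicate directly, since they already share the MPI process group.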