Parallel processing mpi4py Gatherv facing KeyError: 'O'

I am new to this. I wrote this code to process a large numpy array across multiple processors. Since I cannot provide the input file, I describe its shape instead: the data has shape [3000000, 15] and contains string-type data.

from mpi4py import MPI
import numpy as np
import datetime as dt
import math as math


comm = MPI.COMM_WORLD
numprocs = comm.size
rank = comm.Get_rank()
fname = "6.binetflow"
data = np.loadtxt(open(fname,"rb"), dtype=object, delimiter=",", skiprows=1)
X = data[:,[0,1,3,14,6,6,6,6,6,6,6,6]]
num_rows = math.ceil(len(X)/float(numprocs))
X = X.flatten()
sendCounts = list()
displacements = list()
for p in range(numprocs):
    if p == (numprocs-1): #for last processor
        sendCounts.append(int(len(X) - (p*num_rows*12)))
        displacements.append(int(p*num_rows*12))
        break
    sendCounts.append(int(num_rows*12))
    displacements.append(int(p*sendCounts[p]))
sendbuf = np.array(X[displacements[rank]: (displacements[rank]+sendCounts[rank])])

## Each processor will do some task on sendbuf

if rank == 0:
    recvbuf = np.empty(sum(sendCounts), dtype=object)
else:
    recvbuf = None

print("sendbuf: ",sendbuf)
comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
if rank == 0:
    print("Gathered array: {}".format(recvbuf))
But I am facing the following error:

Traceback (most recent call last):
  File "hello.py", line 36, in <module>
    comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
  File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:97993)
  File "MPI/msgbuffer.pxi", line 525, in mpi4py.MPI._p_msg_cco.for_gather (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34678)
  File "MPI/msgbuffer.pxi", line 446, in mpi4py.MPI._p_msg_cco.for_cco_send (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:33938)
  File "MPI/msgbuffer.pxi", line 148, in mpi4py.MPI.message_simple (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:30349)
  File "MPI/msgbuffer.pxi", line 93, in mpi4py.MPI.message_basic (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:29448)
KeyError: 'O'
Traceback (most recent call last):
  File "hello.py", line 36, in <module>
    comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
  File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:97993)
  File "MPI/msgbuffer.pxi", line 525, in mpi4py.MPI._p_msg_cco.for_gather (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34678)
  File "MPI/msgbuffer.pxi", line 446, in mpi4py.MPI._p_msg_cco.for_cco_send (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:33938)
  File "MPI/msgbuffer.pxi", line 148, in mpi4py.MPI.message_simple (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:30349)
  File "MPI/msgbuffer.pxi", line 93, in mpi4py.MPI.message_basic (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:29448)
KeyError: 'O'
Traceback (most recent call last):
  File "hello.py", line 36, in <module>
    comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
  File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:97993)
  File "MPI/msgbuffer.pxi", line 525, in mpi4py.MPI._p_msg_cco.for_gather (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34678)
  File "MPI/msgbuffer.pxi", line 446, in mpi4py.MPI._p_msg_cco.for_cco_send (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:33938)
  File "MPI/msgbuffer.pxi", line 148, in mpi4py.MPI.message_simple (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:30349)
  File "MPI/msgbuffer.pxi", line 93, in mpi4py.MPI.message_basic (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:29448)
KeyError: 'O'
Traceback (most recent call last):
  File "hello.py", line 36, in <module>
    comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
  File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:97993)
  File "MPI/msgbuffer.pxi", line 516, in mpi4py.MPI._p_msg_cco.for_gather (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34587)
  File "MPI/msgbuffer.pxi", line 466, in mpi4py.MPI._p_msg_cco.for_cco_recv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34097)
  File "MPI/msgbuffer.pxi", line 261, in mpi4py.MPI.message_vector (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:31977)
  File "MPI/msgbuffer.pxi", line 93, in mpi4py.MPI.message_basic (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:29448)
KeyError: 'O'
Any help would be appreciated. I have been stuck on this problem for a long time.

Thanks

The problem is dtype=object. For an object array, numpy reports the type character 'O', which mpi4py cannot map to an MPI datatype, hence KeyError: 'O'.

Mpi4py provides two kinds of communication functions: those whose names begin with an upper-case letter, e.g. Scatter, and those whose names begin with a lower-case letter, e.g. scatter. From the Mpi4py documentation:

In MPI for Python, the Bcast(), Scatter(), Gather(), Allgather() and Alltoall() methods of Comm instances provide support for collective communications of memory buffers. The variants bcast(), scatter(), gather(), allgather() and alltoall() can communicate generic Python objects.

What is not clear from this is that, although numpy arrays are supposed to expose memory buffers, the buffers apparently need to hold one of a small set of primitive data types, and certainly do not work with generic objects. Compare the following two pieces of code:

from mpi4py import MPI
import numpy

Comm = MPI.COMM_WORLD
Size = Comm.Get_size()
Rank = Comm.Get_rank()

if Rank == 0:
    Data = numpy.empty(Size, dtype=object)
else:
    Data = None

Data = Comm.scatter(Data, 0) # I work fine!

print("Data on rank %d: " % Rank, Data)

Unfortunately, Mpi4py provides no scatterv. From the same place in the documentation:

The vector variants (which can communicate different amounts of data to each process) Scatterv(), Gatherv(), Allgatherv() and Alltoallv() are also supported; they can only communicate objects exposing memory buffers.

The upper-case vs. lower-case rule makes no exception for data types either:

from mpi4py import MPI
import numpy

Comm = MPI.COMM_WORLD
Size = Comm.Get_size()
Rank = Comm.Get_rank()

if Rank == 0:
    Data = numpy.empty(2*Size+1, dtype=numpy.dtype('float64'))
else:
    Data = None

if Rank == 0:
    Datb = numpy.empty(3, dtype=numpy.dtype('float64'))
else:
    Datb = numpy.empty(2, dtype=numpy.dtype('float64'))

Comm.Scatterv(Data, Datb, 0) # I work fine!

print("Datb on rank %d: " % Rank, Datb)

from mpi4py import MPI
import numpy

Comm = MPI.COMM_WORLD
Size = Comm.Get_size()
Rank = Comm.Get_rank()

if Rank == 0:
    Data = numpy.empty(2*Size+1, dtype=object)
else:
    Data = None

if Rank == 0:
    Datb = numpy.empty(3, dtype=object)
else:
    Datb = numpy.empty(2, dtype=object)

Comm.Scatterv(Data, Datb, 0) # I throw KeyError!

print("Datb on rank %d: " % Rank, Datb)
Unfortunately, you will need to write your code so that it can use Scatter, which requires the same SendCount for every process, or fall back to more primitive point-to-point communication functions, or use some parallel facility other than Mpi4py.
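
Since the lower-case functions shown above do accept generic objects, one pattern that fits the original string data is to split the rows on the root and let scatter/gather pickle the chunks. A minimal sketch, assuming stand-in data and chunking via numpy.array_split (neither is part of the original code):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
numprocs = comm.size
rank = comm.Get_rank()

if rank == 0:
    # Stand-in for the real [3000000, 15] string-typed array
    X = np.arange(30, dtype=object).reshape(10, 3)
    chunks = np.array_split(X, numprocs)  # uneven row counts are fine
else:
    chunks = None

local = comm.scatter(chunks, root=0)  # generic objects are pickled

## Each processor does some task on its local rows

gathered = comm.gather(local, root=0)  # list of per-rank chunks on root
if rank == 0:
    result = np.concatenate(gathered)
    print("Gathered array: {}".format(result))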

Using Mpi4py 2.0.0, the current stable version at the time of writing.
