C++: Allreduce with a user-defined function and MPI_BOTTOM

c++ mpi

Consider the following program, which is supposed to do some silly addition of doubles:

#include <iostream>
#include <vector>

#include <mpi.h>

void add(void* invec, void* inoutvec, int* len, MPI_Datatype*)
{
    double* a = reinterpret_cast <double*> (inoutvec);
    double* b = reinterpret_cast <double*> (invec);

    for (int i = 0; i != *len; ++i)
    {
        a[i] += b[i];
    }
}

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    std::vector<double> buffer = { 2.0, 3.0 };

    MPI_Op operation;
    MPI_Op_create(add, 1, &operation);

    MPI_Datatype types[1];
    MPI_Aint addresses[1];
    int lengths[1];
    int count = 1;

    MPI_Get_address(buffer.data(), &addresses[0]);
    lengths[0] = buffer.size();
    types[0] = MPI_DOUBLE;

    MPI_Datatype type;
    MPI_Type_create_struct(count, lengths, addresses, types, &type);
    MPI_Type_commit(&type);

    MPI_Allreduce(MPI_IN_PLACE, MPI_BOTTOM, 1, type, operation, MPI_COMM_WORLD);

    MPI_Type_free(&type);
    MPI_Op_free(&operation);
    MPI_Finalize();

    std::cout << buffer[0] << " " << buffer[1] << "\n";
}

Line 39 is the MPI_Allreduce call. This is probably a silly mistake, but after staring at it for a few hours I still can't see it. Does anyone spot the error? Thanks!

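Edit: Open MPI has a bug in how it handles types with a non-zero lower bound (such as types created from absolute addresses) when performing an in-place reduce-to-all. It appears to be present in all versions, including the development branch. Its status can be tracked in the corresponding Open MPI issue.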

Your add operator is wrong because you do not take the lower bound of the datatype into account. A suitable fix looks like this:

void add(void* invec, void* inoutvec, int* len, MPI_Datatype* datatype)
{
    MPI_Aint lb, extent;
    MPI_Type_get_true_extent(*datatype, &lb, &extent);

    double* a = reinterpret_cast <double*> (reinterpret_cast <char*>(inoutvec) + lb);
    double* b = reinterpret_cast <double*> (reinterpret_cast <char*>(invec) + lb);

    for (int i = 0; i != *len; ++i)
    {
        a[i] += b[i];
    }
}

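I'm not sure whether the standard allows MPI_BOTTOM to be used this way. That it crashes (also confirmed with Open MPI 1.10.2) is most likely a bug in Open MPI. Its MPI_Allreduce implementation allocates a temporary buffer and then tries to copy the contents of the receive buffer into it, assuming that both buffers use the same datatype. That simply doesn't work for types with absolute addresses, because the address of the temporary buffer is nowhere near zero (in Open MPI, MPI_BOTTOM == NULL). Posted to [...].

I just tried MPICH 3.0.4 and it seems to work. But my add is clearly wrong: inoutvec is NULL, which is probably what it should be (MPI_BOTTOM), and invec has a slightly higher address.

Your add is wrong because you don't handle the datatype correctly. inoutvec is NULL because that is what MPI_BOTTOM means on systems with a contiguous (non-segmented) address space. invec is slightly higher because MPICH correctly adjusts the address of the temporary buffer by the lower bound of the datatype. But you treat the arguments of add as if both vectors were of type MPI_DOUBLE and fail to account for the datatype's lower bound. Note that all predefined MPI datatypes, including MPI_DOUBLE, have a lower bound of zero, which means the buffer pointer points directly at the start of the data. That is not the case for your type, whose lower bound equals the address of buffer[0]. When you add that lower bound to the value of MPI_BOTTOM (NULL), you get the address of buffer[0].

Did you report the Open MPI bug, or should I? This definitely works with MPICH 3.0.4. Could you also add the bug report for Open MPI?

Done. I've edited the answer and added a link to the issue tracker.

Great! Thanks!
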
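Note also that *len counts elements of the derived datatype (1 in the MPI_Allreduce call above), not doubles, so the loop in add has to cover both doubles in each element:

for (int i = 0; i < 2*(*len); i++)
{
    a[i] += b[i];
}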