C: How to send the last array element of each processor in MPI


c performance parallel-processing mpi


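I am having trouble writing code that behaves like the following example. It is similar to the Up Phase part of a prefix scan, and I do not want to use the function MPI_Scan: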

WholeArray[16] = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]

Processor 0 got [0 , 1 , 2 , 3] , Processor 1 got [4 , 5 , 6 , 7] 

Processor 2 got [8 , 9 , 10 , 11] , Processor 3 got [12 , 13 , 14 , 15]

To send each processor's last array element and accumulate the sum in 2 steps, do the following:

(stride 1)

Processor 0 send Array[3] , Processor 1 receive from Processor 0 and add to Array[3]

Processor 2 send Array[3], Processor 3 receive from Processor 2 and add to Array[3] 

(stride 2)

Processor 1 sends Array[3], Processor 3 receive from Processor 1 and add to Array[3]

Finally, I want to use MPI_Gather so that the result is:

WholeArray = [0 , 1 , 2 , 3 , 4 , 5 , 6 ,10 , 8 , 9 , 10 , 11 , 12 , 13 ,14 , 36]

I find it hard to write code that makes the program perform the following for the 4-node example:

(1st stride) - Processor 0 send to Processor 1 and Processor 1 receive from Processor 0
(1st stride) - Processor 2 send to Processor 3 and Processor 3 receive from Processor 2

(2nd stride) - Processor 1 send to Processor 3 and Processor 3 receive from Processor 1
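The schedule above follows one rule per stride, which can be written down without any MPI calls. The following is a minimal sketch; the names role_t and up_phase_role are hypothetical and only serve to illustrate who sends and who receives for a given stride (called key here).

```c
#include <assert.h>

/* Hypothetical helper type: what a rank does in one stride. */
typedef enum { ROLE_IDLE = 0, ROLE_SEND = 1, ROLE_RECV = 2 } role_t;

/* Returns the role of `rank` for stride `key` (1, 2, 4, ...) and stores the
 * partner rank in *partner, or -1 when the rank is idle in this stride. */
role_t up_phase_role(int rank, int key, int *partner)
{
    assert(rank >= 0 && key >= 1 && partner != 0);

    if ((rank + 1) % key != 0) {      /* rank does not take part in this stride */
        *partner = -1;
        return ROLE_IDLE;
    }
    if ((rank / key) % 2 == 0) {      /* left member of the pair sends */
        *partner = rank + key;
        return ROLE_SEND;
    }
    *partner = rank - key;            /* right member of the pair receives */
    return ROLE_RECV;
}
```

For 4 processes, key = 1 pairs 0 with 1 and 2 with 3, and key = 2 pairs 1 with 3 while ranks 0 and 2 stay idle.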

Here is the code I have written so far:

int Send_Receive(int* my_input, int size_per_process, int rank, int size)
{
    int key = 1;
    int temp = my_input[size_per_process-1];

    while(key <= size/2)
    {
        if((rank+1) % key == 0)
        {
            if(rank/key % 2 == 0)
            {
                MPI_Send(&temp, 1, MPI_INT, rank+key,0,MPI_COMM_WORLD);
            }
            else
            {
                MPI_Recv(&temp, 1, MPI_INT, rank-key,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
                my_input[size_per_process]+= temp;
            }
            key = 2 * key;
            MPI_Barrier(MPI_COMM_WORLD);
        }
    }

    return (*my_input);
}

There are several issues in the code. 1) It always sends the same temp variable between processes:
MPI_Send(&temp, 1, MPI_INT, rank+key,0,MPI_COMM_WORLD);

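The temp variable is initialized once, before the loop, so in later strides it never reflects the updated value of the last element: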
 int temp = my_input[size_per_process-1];
 while(key <= size/2)
 { ...}

In addition, 2) the following statement
my_input[size_per_process]+= temp;

does not add temp to the last position of the array my_input; index size_per_process is one element past the end. Instead, it should be:
my_input[size_per_process-1]+= temp;

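Finally, 3) there are deadlock and infinite-loop issues. For starters, calling a collective communication routine (such as MPI_Barrier) inside a conditional that only some ranks enter is usually a big red flag. In the posted code, both MPI_Barrier and the update key = 2 * key sit inside the conditional, so a rank that fails the test never advances key (infinite loop) and never reaches the barrier that the other ranks are waiting in (deadlock). Both should be moved out of the if and executed by every rank in every iteration, instead of: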
while(key <= size/2)
{
   if((rank+1) % key == 0){
       ...
       MPI_Barrier(MPI_COMM_WORLD);
   }
}
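The effect of the three fixes can be checked without an MPI installation by a sequential stand-in. This is only a sketch under stated assumptions: the function name up_phase_simulated is hypothetical, all ranks' chunks are stored back to back in one array, the number of ranks is a power of two, and the direct addition replaces the MPI_Send / MPI_Recv pair of the real program.

```c
#include <assert.h>

/* Sequential stand-in for the corrected per-rank loop (not real MPI code).
 * whole[rank * chunk .. rank * chunk + chunk - 1] plays the role of my_input
 * on that rank. Assumes nprocs is a power of two. */
void up_phase_simulated(int *whole, int nprocs, int chunk)
{
    assert(whole != 0 && chunk > 0);
    assert(nprocs > 0 && (nprocs & (nprocs - 1)) == 0);

    /* Fix 3: key advances for every rank in every iteration, and the point
     * where MPI_Barrier would be called is outside the conditional. */
    for (int key = 1; key <= nprocs / 2; key *= 2) {
        for (int rank = 0; rank < nprocs; rank++) {
            /* Only senders act here; the matching receiver is rank + key. */
            if ((rank + 1) % key == 0 && (rank / key) % 2 == 0) {
                /* Fix 1: send the current last element, not a stale copy. */
                int sent = whole[rank * chunk + chunk - 1];
                /* Fix 2: the receiver adds into index chunk - 1, its last slot. */
                whole[(rank + key) * chunk + chunk - 1] += sent;
            }
        }
        /* In the MPI version, every rank would reach MPI_Barrier here. */
    }
}
```

Called with the 16-element array from the question, nprocs = 4 and chunk = 4, only positions 7 and 15 change, which matches the up-phase result the question asks for.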

With the three issues fixed, the 4-process example gives:

Input:
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]

Output:
[0,1,2,3,4,5,6,10,8,9,10,11,12,13,14,36]