C: How to send the last element of each processor's array in MPI

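I'm having a hard time writing code that behaves like the following example, which is similar to the Up Phase of a prefix scan, and I don't want to use the function MPI_Scan: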

WholeArray[16] = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]

Processor 0 got [0 , 1 , 2 , 3] , Processor 1 got [4 , 5 , 6 , 7] 

Processor 2 got [8 , 9 , 10 , 11] , Processor 3 got [12 , 13 , 14 , 15] 
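With this block distribution, rank r holds global indices r*chunk through (r+1)*chunk - 1, where chunk = n/p, so the value each rank sends is the last entry of its block: local index chunk - 1, which is my_input[size_per_process-1] in the posted code. A minimal sketch of that index arithmetic, assuming n is divisible by the number of processes p; chunk_size and last_global_index are hypothetical helper names, not MPI functions:

```c
#include <assert.h>

/* Elements per rank; assumes n divides evenly by p. */
int chunk_size(int n, int p)
{
    assert(p > 0 && n % p == 0);
    return n / p;
}

/* Global index of the last element owned by `rank`,
 * i.e. the element that rank sends during the exchange. */
int last_global_index(int rank, int n, int p)
{
    assert(rank >= 0 && rank < p);
    return (rank + 1) * chunk_size(n, p) - 1;
}
```

For n = 16 and p = 4 this gives global indices 3, 7, 11 and 15, the positions involved in the exchange.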
I want to send the last element of each array and add the values up in 2 steps, as follows:

(stride 1)

Processor 0 sends Array[3], Processor 1 receives it from Processor 0 and adds it to its Array[3]

Processor 2 sends Array[3], Processor 3 receives it from Processor 2 and adds it to its Array[3]

(stride 2)

Processor 1 sends Array[3], Processor 3 receives it from Processor 1 and adds it to its Array[3]
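The two strides are the up-sweep of a work-efficient (Blelloch-style) prefix scan, restricted to the last element of each block: at stride s, every rank r with (r + 1) divisible by 2s adds the value held by rank r - s. A serial sketch of just the arithmetic, assuming p is a power of two; last[r] stands for rank r's last element, the message is replaced by a direct read, and upsweep_last is a hypothetical name:

```c
#include <assert.h>

/* Serial model of the up-sweep on the per-rank last elements.
 * At stride s, rank r receives from rank r - s when (r + 1) is a
 * multiple of 2*s. Assumes p is a power of two. */
void upsweep_last(int *last, int p)
{
    assert(p > 0 && (p & (p - 1)) == 0);
    for (int stride = 1; stride < p; stride *= 2) {
        /* Receivers at this stride: 2*stride - 1, 4*stride - 1, ... */
        for (int r = 2 * stride - 1; r < p; r += 2 * stride)
            last[r] += last[r - stride];
    }
}
```

With last = {3, 7, 11, 15} it produces {3, 10, 11, 36}, which matches Array[3] on each processor after stride 2. Within one stride no sender is also a receiver, so the update order inside the inner loop does not matter.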
Finally, I want to use MPI_Gather so that the result is:

WholeArray = [0 , 1 , 2 , 3 , 4 , 5 , 6 ,10 , 8 , 9 , 10 , 11 , 12 , 13 ,14 , 36]
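MPI_Gather with equal counts, e.g. MPI_Gather(my_input, size_per_process, MPI_INT, whole, size_per_process, MPI_INT, 0, MPI_COMM_WORLD), stores rank r's block at offset r * size_per_process of the root's receive buffer, so only the last element of some blocks differs from the original array. Note that recvcount is the count received from each process, not the total. The serial stand-in below only mimics that placement with memcpy; gather_blocks and whole are hypothetical names:

```c
#include <assert.h>
#include <string.h>

/* Serial stand-in for MPI_Gather with equal counts on the root:
 * block r of `whole` is copied from rank r's local array local[r]. */
void gather_blocks(int *local[], int chunk, int p, int *whole)
{
    assert(chunk > 0 && p > 0);
    for (int r = 0; r < p; r++)
        memcpy(whole + r * chunk, local[r], (size_t)chunk * sizeof(int));
}
```

Feeding it the four local arrays after stride 2 ({0,1,2,3}, {4,5,6,10}, {8,9,10,11}, {12,13,14,36}) yields the WholeArray shown above.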
I find it hard to write the code that makes the program carry out this 4-node example:

(1st stride) - Processor 0 sends to Processor 1 and Processor 1 receives from Processor 0
(1st stride) - Processor 2 sends to Processor 3 and Processor 3 receives from Processor 2

(2nd stride) - Processor 1 sends to Processor 3 and Processor 3 receives from Processor 1
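Each role in this example can be computed from the rank and the stride alone, using the same two tests as the posted code. A sketch assuming the stride starts at 1, doubles each step and stays at most p/2 with p a power of two (so rank + stride never exceeds p - 1); step_role and the ROLE_* constants are hypothetical names:

```c
#include <assert.h>

enum role { ROLE_IDLE, ROLE_SEND, ROLE_RECV };

/* Decide what `rank` does at the given stride and, if it
 * communicates, store its partner rank in *partner. */
enum role step_role(int rank, int stride, int *partner)
{
    assert(rank >= 0 && stride > 0);
    if ((rank + 1) % stride != 0)
        return ROLE_IDLE;              /* not active at this stride */
    if ((rank / stride) % 2 == 0) {
        *partner = rank + stride;      /* sends its last element */
        return ROLE_SEND;
    }
    *partner = rank - stride;          /* receives and adds */
    return ROLE_RECV;
}
```

For p = 4, stride 1 pairs rank 0 with 1 and rank 2 with 3; stride 2 leaves ranks 0 and 2 idle and pairs rank 1 with 3.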
Here is the code I have written so far:

int Send_Receive(int* my_input, int size_per_process, int rank, int size)
{
    int key = 1;
    int temp = my_input[size_per_process-1];

    while(key <= size/2)
    {
        if((rank+1) % key == 0)
        {
            if(rank/key % 2 == 0)
            {
                MPI_Send(&temp, 1, MPI_INT, rank+key,0,MPI_COMM_WORLD);
            }
            else
            {
                MPI_Recv(&temp, 1, MPI_INT, rank-key,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
                my_input[size_per_process]+= temp;
            }
            key = 2 * key;
            MPI_Barrier(MPI_COMM_WORLD);
        }
    }

    return (*my_input);
}
There are a few problems in your code. 1) It always sends the same temp variable between processes:

MPI_Send(&temp, 1, MPI_INT, rank+key,0,MPI_COMM_WORLD);
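The temp variable is initialized only once, before the loop, and afterwards only MPI_Recv overwrites it. At the next stride a process therefore forwards the value it just received instead of its updated last element; temp should be reloaded from my_input[size_per_process-1] right before each MPI_Send. Here is the initialization: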

 int temp = my_input[size_per_process-1];
 while(key <= size/2)
 { ...}
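To make point 1 concrete, here is a serial model of the posted loop for a power-of-two number of ranks, with each message replaced by an assignment. Each simulated rank has its own temp, set once before the loop and overwritten on every receive as in the question; with refresh set to 1, the sender instead reloads temp from its current last element before each send, which is the fix. The out-of-bounds index (point 2) and the loop structure (point 3) are already corrected in the model so that only the temp issue shows. simulate_temp and MAX_RANKS are hypothetical names:

```c
#include <assert.h>

#define MAX_RANKS 64

/* Serial model of the send/receive loop. last[r] is rank r's last
 * element and temp[r] its temp variable. refresh == 0: temp is set only
 * once before the loop (posted code); refresh == 1: it is reloaded
 * from last[r] right before each send. */
void simulate_temp(int *last, int p, int refresh)
{
    int temp[MAX_RANKS];
    assert(p > 0 && p <= MAX_RANKS && (p & (p - 1)) == 0);
    for (int r = 0; r < p; r++)
        temp[r] = last[r];
    for (int key = 1; key <= p / 2; key *= 2) {
        for (int r = 0; r < p; r++) {
            if ((r + 1) % key != 0 || (r / key) % 2 != 0)
                continue;              /* only senders act here */
            int dst = r + key;
            assert(dst < p);
            if (refresh)
                temp[r] = last[r];     /* send the current value */
            temp[dst] = temp[r];       /* MPI_Recv(&temp, ...) on dst */
            last[dst] += temp[dst];
        }
    }
}
```

With last = {3, 7, 11, 15}, refresh = 0 leaves 29 on rank 3, because at stride 2 rank 1 forwards the 3 it received at stride 1; refresh = 1 gives the expected 36.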
Also, 2) the following statement

my_input[size_per_process]+= temp;
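does not add temp to the last position of the array my_input. The local array has size_per_process elements, so my_input[size_per_process] is one past the end and the statement writes out of bounds (undefined behavior). Instead, it should be: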

my_input[size_per_process-1]+= temp;
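Finally, 3) there are deadlock and infinite-loop problems. For starters, calling a collective communication routine such as MPI_Barrier inside a conditional is usually a big red flag, because every process in the communicator has to call it. In addition, key is only doubled inside the if, so a process that fails the test (rank+1) % key == 0 never updates key and loops forever. Instead of: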

while(key <= size/2)
{
   if((rank+1) % key == 0){
       ...
       MPI_Barrier(MPI_COMM_WORLD);
   }
}
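you should move key = 2 * key; and MPI_Barrier(MPI_COMM_WORLD); after the if block, so that every process executes them on every iteration of the loop.

Input: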

[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
[0,1,2,3,4,5,6,10,8,9,10,11,12,13,14,36]
Output:

[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
[0,1,2,3,4,5,6,10,8,9,10,11,12,13,14,36]
[0,1,2,3,4,5,6,10,8,9,10,11,12,13,14,36]
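Point 3 can also be checked without running MPI by counting, for each rank, how many times it would reach MPI_Barrier. The serial sketch below models both loop structures: fixed = 0 keeps the posted layout (key doubled and barrier called only inside the if), fixed = 1 moves both after the if. An iteration cap stands in for a loop that never ends; barrier_calls and MAX_ITER are hypothetical names:

```c
#include <assert.h>

#define MAX_ITER 100  /* cap standing in for "loops forever" */

/* Returns how many times `rank` reaches MPI_Barrier in the loop,
 * or -1 if it is still looping after MAX_ITER iterations. */
int barrier_calls(int rank, int size, int fixed)
{
    assert(rank >= 0 && rank < size);
    int key = 1, calls = 0;
    for (int iter = 0; key <= size / 2; iter++) {
        if (iter == MAX_ITER)
            return -1;
        if ((rank + 1) % key == 0) {
            /* ... MPI_Send or MPI_Recv would happen here ... */
            if (!fixed) {              /* posted code: inside the if */
                key = 2 * key;
                calls++;
            }
        }
        if (fixed) {                   /* corrected: every rank, every pass */
            key = 2 * key;
            calls++;
        }
    }
    return calls;
}
```

For size = 4 the posted structure returns -1 for ranks 0 and 2 (stuck at key = 2 after one barrier) and 2 for ranks 1 and 3, which would then block forever in their second barrier; the corrected structure returns 2 for every rank.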