C: How to send the last array element of each processor in MPI
I am having trouble writing code that behaves like the following example. It is similar to the up phase of a prefix scan, and I do not want to use the function MPI_Scan:
WholeArray[16] = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
Processor 0 got [0 , 1 , 2 , 3] , Processor 1 got [4 , 5 , 6 , 7]
Processor 2 got [8 , 9 , 10 , 11] , Processor 3 got [12 , 13 , 14 , 15]
The last element of each array should be sent and summed in 2 steps, as follows:
(stride 1)
Processor 0 sends Array[3]; Processor 1 receives from Processor 0 and adds it to its Array[3]
Processor 2 sends Array[3]; Processor 3 receives from Processor 2 and adds it to its Array[3]
(stride 2)
Processor 1 sends Array[3]; Processor 3 receives from Processor 1 and adds it to its Array[3]
Finally, I want to use MPI_Gather so that the result is:
WholeArray = [0 , 1 , 2 , 3 , 4 , 5 , 6 ,10 , 8 , 9 , 10 , 11 , 12 , 13 ,14 , 36]
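As a sanity check on the expected result, the two strides can be simulated serially, without MPI. This is only a sketch: `simulate_up_phase` is a hypothetical helper, not part of the original code, and it assumes the array length is divisible by the number of processors and that the processor count is a power of two, as in the example.

```c
/* Serial model of the up phase (no MPI involved; hypothetical helper).
 * whole  : the full array, viewed as nprocs equal blocks
 * n      : total number of elements (assumed divisible by nprocs)
 * nprocs : number of simulated processors (assumed a power of two)
 * Only the last element of each block is read or updated. */
void simulate_up_phase(int *whole, int n, int nprocs)
{
    int per = n / nprocs;   /* elements per simulated processor */

    for (int key = 1; key < nprocs; key *= 2) {
        /* Receivers in this stride are ranks 2*key-1, 4*key-1, ... */
        for (int rank = 2 * key - 1; rank < nprocs; rank += 2 * key) {
            int recv_idx = (rank + 1) * per - 1;        /* receiver's last element */
            int send_idx = (rank - key + 1) * per - 1;  /* sender's last element   */
            whole[recv_idx] += whole[send_idx];
        }
    }
}
```

Called as `simulate_up_phase(whole, 16, 4)` on the array 0..15, this changes only `whole[7]` (7 + 3 = 10) and `whole[15]` (15 + 11 + 10 = 36), which matches the result listed above.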
I find it hard to write code that makes the program do the following for a 4-node example:
(1st stride) - Processor 0 sends to Processor 1, and Processor 1 receives from Processor 0
(1st stride) - Processor 2 sends to Processor 3, and Processor 3 receives from Processor 2
(2nd stride) - Processor 1 sends to Processor 3, and Processor 3 receives from Processor 1
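One way to reason about this schedule is to compute, for a given rank and stride, whether the rank sends, receives, or sits out. The sketch below does that in plain C. The names `up_role` and `up_phase_role` are hypothetical, and the rule that a sender with no partner stays idle is an assumption added so that process counts that are not a power of two do not address a nonexistent rank.

```c
/* Role of a rank in one stride of the up phase (hypothetical names). */
typedef enum { ROLE_IDLE, ROLE_SEND, ROLE_RECV } up_role;

/* Decide what `rank` does when the stride is `key` (1, 2, 4, ...).
 * On ROLE_SEND or ROLE_RECV, *partner is set to the peer rank.
 * Assumption: a sender whose partner would be >= size stays idle. */
up_role up_phase_role(int rank, int key, int size, int *partner)
{
    /* Only ranks key-1, 2*key-1, 3*key-1, ... take part in this stride. */
    if ((rank + 1) % key != 0)
        return ROLE_IDLE;

    if ((rank / key) % 2 == 0) {
        /* Odd multiples of key (counting rank+1) send to the right. */
        if (rank + key >= size)
            return ROLE_IDLE;
        *partner = rank + key;
        return ROLE_SEND;
    }

    /* Even multiples of key (counting rank+1) receive from the left. */
    *partner = rank - key;
    return ROLE_RECV;
}
```

For 4 ranks this gives send/receive pairs 0→1 and 2→3 at stride 1, and 1→3 at stride 2, with ranks 0 and 2 idle in the second stride.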
Here is the code I have written so far:
int Send_Receive(int* my_input, int size_per_process, int rank, int size)
{
    int key = 1;
    int temp = my_input[size_per_process-1];
    while(key <= size/2)
    {
        if((rank+1) % key == 0)
        {
            if(rank/key % 2 == 0)
            {
                MPI_Send(&temp, 1, MPI_INT, rank+key,0,MPI_COMM_WORLD);
            }
            else
            {
                MPI_Recv(&temp, 1, MPI_INT, rank-key,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
                my_input[size_per_process]+= temp;
            }
            key = 2 * key;
            MPI_Barrier(MPI_COMM_WORLD);
        }
    }
    return (*my_input);
}
There are a few problems with the code. 1) It always sends the same temp variable between processes:
MPI_Send(&temp, 1, MPI_INT, rank+key,0,MPI_COMM_WORLD);
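The temp variable is initialized once, before the loop, and is never reloaded from my_input afterwards, so a process that updated its last element in one stride still sends a stale value in the next: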
int temp = my_input[size_per_process-1];
while(key <= size/2)
{ ...}
Furthermore, 2) the following statement
my_input[size_per_process]+= temp;
does not add temp to the last position of the array my_input; it indexes one element past that position. Instead, it should be:
my_input[size_per_process-1]+= temp;
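Finally, 3) there are deadlock and infinite-loop problems. For starters, calling a collective communication routine (such as MPI_Barrier) inside a single conditional is usually a big red flag: ranks that skip the branch never reach the barrier, and because key is doubled inside the same conditional, they never leave the loop either. The problematic structure is: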
while(key <= size/2)
{
if((rank+1) % key == 0){
...
MPI_Barrier(MPI_COMM_WORLD);
}
}
Input:
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
Output:
[0,1,2,3,4,5,6,10,8,9,10,11,12,13,14,36]