Parallel processing: program gets stuck

parallel-processing mpi

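I am trying to implement a master-slave pattern in which the master holds an array that serves as a job queue and sends data to the slave processes. Each slave computes a result from the data it gets from the master and returns the answer to the master. When the master receives a result, it works out the rank of the slave that sent the message and then sends the next job to that slave.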

This is the skeleton of the code I have implemented:

    if (my_rank != 0)
    {
        MPI_Recv(&seed, 1, MPI_FLOAT, 0, tag, MPI_COMM_WORLD, &status);

        //.. some processing

        MPI_Send(&message, 100, MPI_FLOAT, 0, my_rank, MPI_COMM_WORLD);
    }
    else
    {
        for (i = 1; i < p; i++) {
            MPI_Send(&A[i], 1, MPI_FLOAT, i, tag, MPI_COMM_WORLD);
        }

        for (i = p; i <= S; i++) {
            MPI_Recv(&buf, 100, MPI_FLOAT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            //.. processing to find out free slave rank from which above msg was received (y)
            MPI_Send(&A[i], 1, MPI_FLOAT, y, tag, MPI_COMM_WORLD);
        }

        for (i = 1; i < p; i++) {
            MPI_Recv(&buf, 100, MPI_FLOAT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);

            // .. more processing
        }
    }

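If I use 4 processes, 1 master and 3 slaves, the program sends and receives the messages for the first 3 jobs in the job queue, but after that it hangs. What could be the problem?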

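If this is the entirety of your MPI-based code, then it looks like you are missing a while loop around the client code: each slave receives one job, sends one result, and never waits for another. I have done this before, and I usually split it into a taskMaster and peons.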

In the taskMaster:

for (int i = 0; i < commSize; ++i){
    if (i == commRank){ // commRank doesn't have to be 0
        continue;
    }

    if (taskNum < taskCount){
        // tasks is a vector<Task>, where I have created a Task
        // class and send it as a stream of bytes
        toSend = tasks.at(taskNum);
        jobList.at(i) = taskNum;  // so we know which rank has which task
        taskNum += 1;
        activePeons += 1;
    } else{
        // stopTask is a flag value that tells the receiving peon to stop
        toSend = stopTask;
        allTasksDistributed = true;
    }

    // send the task, with the size of the task as the tag
    taskSize = sizeof(toSend);
    MPI_Send(&toSend, taskSize, MPI_CHAR, i, taskSize, MPI_COMM_WORLD);
}   

MPI_Status status;

while (activePeons > 0){ 
    // get the results from a peon (but figure out who it is coming from and what the size is)
    MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    MPI_Recv(   &toSend,                    // receive the incoming task (with result data)
                status.MPI_TAG,             // Tag holds number of bytes
                MPI_CHAR,                   // type, may need to be more robust later
                status.MPI_SOURCE,          // source of send
                MPI_ANY_TAG,                // tag
                MPI_COMM_WORLD,             // COMM
                &status);                   // status

    // put the result from that task into the results vector
    results[jobList[status.MPI_SOURCE]] = toSend.getResult();

    // if there are more tasks to send, distribute the next one
    if (taskNum < taskCount ){
        toSend = tasks.at(taskNum);
        jobList[status.MPI_SOURCE] = taskNum;
        taskNum += 1;
    } else{ // otherwise send the stop task and decrement activePeons
        toSend = stopTask;
        activePeons -= 1;
    }

    // send the task, with the size of the task as the tag
    taskSize = sizeof(toSend);
    MPI_Send(&toSend, taskSize, MPI_CHAR, status.MPI_SOURCE, taskSize, MPI_COMM_WORLD);
}

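There are a few bool and int values that still have to be declared and assigned, and as I said, I have a Task class, but this gives the basic structure for what I think you want to do.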

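It sounds as if one of the processes dies before sending its response. Find out which process is not sending a response to the master. Some debugging code might help here.

This is very incomplete.

^ This is the only MPI code where I do sends and receives. Everything else looks normal to me.

I may be nitpicking here, but why send the data size as the tag when MPI_GET_COUNT exists? It can be used to obtain the number of elements in the probed message from a given MPI_Status object.

Honestly, I had never heard of that function before. I would probably still prefer to use status.count rather than make a function call with the status I already have, but this works very well for me.

If you send the size as the tag, keep in mind that the MPI standard only requires the maximum tag value to be at least 32767. Implementations are free to provide a larger tag space, and most do, but if you implicitly rely on that always being true and set the tag to a value greater than 32767 (for example, because your message is very long), your MPI program will not be portable.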
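The tag limit mentioned in the comments can be turned into a small guard. The sketch below is plain C with no MPI calls; `GUARANTEED_TAG_UB`, `size_fits_in_tag`, and `size_is_portable_tag` are hypothetical names introduced only for this illustration. In a real program the actual upper bound would be read from the `MPI_TAG_UB` attribute at run time and passed in as `tag_ub`.

```c
#include <stddef.h>

/* Smallest value the MPI standard allows for the MPI_TAG_UB attribute.
 * (Hypothetical macro name, chosen for this sketch.) */
#define GUARANTEED_TAG_UB 32767

/* Returns 1 if a message of nbytes can be described by a tag whose
 * upper bound is tag_ub, 0 otherwise. */
int size_fits_in_tag(size_t nbytes, int tag_ub)
{
    return tag_ub >= 0 && nbytes <= (size_t)tag_ub;
}

/* Returns 1 only if the size is a valid tag on every conforming
 * implementation, i.e. it does not exceed the guaranteed minimum bound. */
int size_is_portable_tag(size_t nbytes)
{
    return size_fits_in_tag(nbytes, GUARANTEED_TAG_UB);
}
```

A sender could call `size_is_portable_tag(sizeof(toSend))` before using the size as a tag. Querying the count from the status after the probe, as the comment suggests, avoids the limit altogether.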

In the peon:

while (running){
    // wait for the next task; its tag holds the data size in bytes
    MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

    incoming = (Task *) malloc(status.MPI_TAG);

    MPI_Recv(   incoming,           // memory location of input
                status.MPI_TAG,     // tag holds data size
                MPI_CHAR,           // type of data
                status.MPI_SOURCE,  // source is from distributor
                MPI_ANY_TAG,        // tag
                MPI_COMM_WORLD,     // comm
                &status);           // status

    task = Task(*incoming);
    free(incoming);                 // release the buffer on every path, including stop

    if (task.getFlags() == STOP_FLAG){
        running = false;
        continue;
    }

    task.run();   // my task class has a "run" method
    MPI_Send(   &task,                  // task (with its result) to send back
                status.MPI_TAG,         // size in = size out
                MPI_CHAR,               // data type
                status.MPI_SOURCE,      // destination
                status.MPI_TAG,         // tag = size, which the taskMaster uses as the receive count
                MPI_COMM_WORLD);        // comm
}
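
As a rough check on the missing-loop diagnosis, the sketch below counts messages for the skeleton in the question. It is plain C with no MPI calls, and all function names are hypothetical, introduced only for this illustration. It assumes the job indices run from 1 to S, as the loops in the question suggest.

```c
/* Receives the master posts in the question's skeleton:
 * (S - p + 1) in the refill loop (i = p .. S) plus (p - 1) in the
 * final drain loop (i = 1 .. p - 1). */
int master_receives(int p, int S)
{
    int refill = (S >= p) ? (S - p + 1) : 0;
    int drain = p - 1;
    return refill + drain;
}

/* Results sent when each of the p - 1 workers handles exactly one job
 * and then leaves the if-branch, as in the question. */
int one_shot_worker_sends(int p)
{
    return p - 1;
}

/* Returns 1 if the master waits for more results than the one-shot
 * workers will ever send, which means one MPI_Recv never completes. */
int master_would_block(int p, int S)
{
    return master_receives(p, S) > one_shot_worker_sends(p);
}
```

With p = 4 and, say, S = 10 (an assumed queue length), the master posts 10 receives while the one-shot workers send only 3, which matches the hang after the first three jobs. A worker that loops until it gets a stop message sends one result per job received, so the counts balance.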