C++ OpenMP: How do I correctly nest MASTER and FOR inside a parallel block?


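I am developing a program that uses both OpenMP and OpenMPI.

For the process running on the initial node, I want one thread to act as a scheduler (interacting with the other nodes) while the other threads perform the computation.

The code is structured like this: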

int computation(...)
{
    #pragma omp parallel for .....
}

int main(...)
{
    ...
    if (mpi_rank == 0) // initial node
    {
        #pragma omp parallel
        {
            #pragma omp master
            {
                // task scheduling for other nodes
            }
            {
                // WRONG: with 4 threads in total, this block is executed
                // 3 times simultaneously, and the nested "for" in the function
                // spawns 4 threads each time as well,
                // so there are ACTUALLY 3*4+1=13 threads here!
                computation(...);
            }
        }
    }
    else // other nodes
    {
        // get a task from node 0 scheduler by MPI
        computation(...);
    }
}

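What I want is that, on the initial node, the scheduler occupies one thread and only one computation function runs at a time, so that at most 4 threads are in use simultaneously.

I also tried: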

int computation(...)
{
    int thread_use = omp_get_max_threads();    // this is 4
    if (mpi_rank == 0)
    {
        --thread_use;   // on the initial node, use 3
    }
    #pragma omp parallel for ..... num_threads(thread_use)
}

int main(...)
{
    ...
    if (mpi_rank == 0) // initial node
    {
        #pragma omp parallel
        {
            #pragma omp master
            {
                // task scheduling for other nodes
            }
            #pragma omp single
            {
                // WRONG: the nested "for" can only use 1 thread
                computation(...);
            }
        }
    }
    else // other nodes
    {
        // get a task from node 0 scheduler by MPI
        computation(...);
    }
}

…or

…but none of them worked.

How should I arrange the OpenMP blocks to achieve my goal? Any help is much appreciated.

First, if you want nested parallelism in OpenMP, you need to set the environment variable OMP_NESTED to true.
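Nesting can also be requested from inside the program rather than through the environment. The following is a minimal sketch, not part of the original answer: `enable_nested_levels` is a hypothetical helper name, and it relies on the standard runtime calls `omp_set_max_active_levels` and `omp_get_max_active_levels` (OMP_NESTED itself has been deprecated since OpenMP 5.0 in favor of the max-active-levels setting). The `_OPENMP` guard is an assumption made so that the code still builds when OpenMP is switched off.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: asks the runtime to allow `levels` nested active
// parallel regions and returns how many levels it reports afterwards.
// When compiled without OpenMP it returns 0 (no parallel levels at all).
int enable_nested_levels(int levels) {
    assert(levels >= 0);  // a negative level count makes no sense
#ifdef _OPENMP
    omp_set_max_active_levels(levels);
    return omp_get_max_active_levels();
#else
    (void)levels;  // unused in a serial build
    return 0;
#endif
}
```

Calling `enable_nested_levels(2)` once at start-up, before the first parallel region, should be enough for a two-level layout such as a scheduler plus one nested compute team.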

Then, a possible implementation could look like this:

// Parallel region. Topmost level
#pragma omp parallel sections num_threads(2)
{
    #pragma omp section
    scheduling_function();

    #pragma omp section
    compute_function();
}

where scheduling_function() is a single-threaded function and compute_function() is structured something like this:

void compute_function() {
    // Nested parallel region. Bottommost level
    #pragma omp parallel
    {
        computation();
    }
}
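Putting the two pieces together, the thread budget from the question (one scheduler thread plus one computation, never more than the total) might be sketched as below. This is only an illustration under stated assumptions: `run_initial_node`, the `workers` parameter, the `scheduler_ran` flag, and the vector sum standing in for the real computation are all hypothetical, and the nested team here uses `parallel for` with a `num_threads` clause directly instead of wrapping a call to `computation()`.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the scheduler: it only records that it ran.
void scheduling_function(bool& ran) { ran = true; }

// Hypothetical stand-in for the computation: sums a vector with a nested
// parallel loop whose team is limited to `workers` threads.
long compute_function(const std::vector<int>& data, int workers) {
    long sum = 0;
    const long n = static_cast<long>(data.size());
    #pragma omp parallel for num_threads(workers) reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}

// Outer level: two threads, one per section. The thread that runs the
// compute section becomes part of the nested team, so the total is
// 1 scheduler thread + `workers` compute threads = total_threads.
long run_initial_node(const std::vector<int>& data, int total_threads,
                      bool& scheduler_ran) {
    assert(total_threads >= 2);  // need a scheduler and at least one worker
    long result = 0;
    const int workers = total_threads - 1;
#ifdef _OPENMP
    omp_set_max_active_levels(2);  // allow the nested team to be active
#endif
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        scheduling_function(scheduler_ran);

        #pragma omp section
        result = compute_function(data, workers);
    }
    return result;
}
```

With `total_threads` set to 4, the outer region uses 2 threads and the nested loop requests 3, one of which is the thread already running the compute section, so the process should stay within 4 threads provided the runtime grants the requested team sizes.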

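Awesome! Thank you very much.

Note that you can also set the number of threads used at each nesting level by giving comma-separated values in the OMP_NUM_THREADS environment variable.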
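To check what a per-level setting actually does, a small probe can report the team size seen at each nesting level. The sketch below is an assumption-laden illustration: `probe_team_sizes` is a hypothetical name, the regions deliberately carry no `num_threads` clause so that the environment decides, and the `_OPENMP` guard only exists to keep a serial build working.

```cpp
#include <cassert>
#include <utility>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical probe: returns {outer team size, inner team size} as seen
// by the master thread of each level. In a serial build both values are 1.
std::pair<int, int> probe_team_sizes() {
    int outer = 1;
    int inner = 1;
#ifdef _OPENMP
    omp_set_max_active_levels(2);  // permit two active nesting levels
    #pragma omp parallel
    {
        #pragma omp master
        {
            outer = omp_get_num_threads();  // size of the level-1 team
            #pragma omp parallel
            {
                #pragma omp master
                inner = omp_get_num_threads();  // size of the level-2 team
            }
        }
    }
#endif
    assert(outer >= 1 && inner >= 1);  // a team always has at least one thread
    return {outer, inner};
}
```

Run with, for example, OMP_NUM_THREADS=2,3 in the environment, one would expect the probe to report an outer team of 2 and an inner team of 3, assuming the runtime grants the requested threads.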
