C++: OpenMP iterations of a loop inside a parallel region


c++ c parallel-processing


Sorry if the title is a bit unclear; I'm not sure how to phrase this.

I'd like to know whether there is a way to do the following:

#pragma omp parallel
{
    for (int i = 0; i < iterations; i++) {
        #pragma omp for
        for (int j = 0; j < N; j++) {
            // Do something
        }
    }
}

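It would also be great if someone could point me to an example of a large application parallelized with OpenMP, so I can better understand the strategies to use with OpenMP. I can't seem to find one.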

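Clarification: I'm looking for solutions that don't change the loop order and don't involve blocking, caching, or general performance considerations. I want to understand how to do this in OpenMP with the loop structure as given.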

The "// Do something" part may or may not have dependencies. Assume that it does, and that you can't move things around.
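A minimal sketch of this structure made concrete. It assumes a hypothetical multi-pass 3-point smoothing stands in for "// Do something". The function name `smooth` and the update rule are illustrative, not from the question. Pass `i` reads only what pass `i-1` wrote, so the outer loop carries a real dependency. The implicit barrier at the end of `#pragma omp for` is what keeps the passes in order. The `#pragma omp single` lets exactly one thread swap the buffers, and its own implicit barrier makes the swap visible before the next pass starts. Built without `-fopenmp`, the pragmas are ignored and it runs serially with the same result.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for "Do something": `passes` sweeps of a 3-point
// average over `data`, endpoints held fixed. Pass i depends on pass i-1.
std::vector<double> smooth(std::vector<double> data, int passes) {
    const int n = static_cast<int>(data.size());
    std::vector<double> next(data);  // second buffer, same fixed endpoints
    #pragma omp parallel
    {
        for (int i = 0; i < passes; i++) {  // every thread runs every pass
            #pragma omp for
            for (int j = 1; j < n - 1; j++) {
                next[j] = (data[j - 1] + data[j] + data[j + 1]) / 3.0;
            }  // implicit barrier: all of pass i is written
            #pragma omp single
            std::swap(data, next);  // one thread swaps; barrier after single
        }
    }
    return data;
}
```

For example, `smooth({0, 0, 3, 0, 0}, 1)` yields `{0, 1, 1, 1, 0}`.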

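I'm not sure I can fully answer your question. I've only been using OpenMP for a few months, but when I try to answer questions like this I write a hello-world printf test like the one below. I think it may help answer your question. Also try "#pragma omp for nowait" and see what happens.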

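Just make sure that "// Do something" and "// Do something else" don't write to the same memory address and create a race condition. Also, if you are doing a lot of reads and writes, you need to think about how to use the cache efficiently.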

#include <stdio.h>
#include <omp.h>
void loop(const int iterations, const int N) {
    #pragma omp parallel
    {
        int start_thread = omp_get_thread_num();
        printf("start thread %d\n", start_thread);
        for (int i = 0; i < iterations; i++) {
            printf("\titeration %d, thread num %d\n", i, omp_get_thread_num());
            #pragma omp for
            for (int j = 0; j < N; j++) {
                printf("\t\t inner loop %d, thread num %d\n", j, omp_get_thread_num());
            }
        }
    }
}

int main() {
    loop(2,30);
}
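Following the suggestion to try `nowait`, here is a sketch that records how often each `(i, j)` pair runs instead of printing. The helper `visit_counts_nowait` is hypothetical. With `nowait`, each pair still runs exactly once. The clause only removes the barrier at the end of the inner loop, so one thread may start pass `i+1` while another is still in pass `i`. That is harmless here because every slot is written by exactly one thread. It would, however, break a real pass-to-pass dependency.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: counts executions of each (i, j) pair when the
// inner work-sharing loop carries `nowait`.
std::vector<int> visit_counts_nowait(int iterations, int N) {
    std::vector<int> visits(iterations * N, 0);
    #pragma omp parallel
    {
        for (int i = 0; i < iterations; i++) {
            #pragma omp for nowait  // no barrier at the end of this loop
            for (int j = 0; j < N; j++) {
                visits[i * N + j] += 1;  // each slot touched by exactly one thread
            }
        }
    }  // the end of the parallel region is still a barrier
    return visits;
}
```

Calling `visit_counts_nowait(2, 30)` should give 60 entries, all equal to 1, however many threads are used.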



Performance-wise, you may want to consider fusing your loops like this:
#pragma omp for
for(int n=0; n<iterations*N; n++) {
    int i = n/N;
    int j = n%N;    
    //do something as function of index i and j
}
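Fusing is only safe when no `(i, j)` depends on a result from an earlier pass, which the question rules out, so this applies to independent iterations only. The sketch below compares the manual fusion above with the `collapse(2)` clause (OpenMP 3.0 and later), which asks the compiler to flatten the two loops into one iteration space. The names `fused_fill` and `collapsed_fill` and the `i * 1000 + j` payload are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Manual fusion: one work-shared loop over iterations*N, recovering (i, j).
std::vector<int> fused_fill(int iterations, int N) {
    std::vector<int> out(iterations * N);
    #pragma omp parallel for
    for (int n = 0; n < iterations * N; n++) {
        int i = n / N;
        int j = n % N;
        out[n] = i * 1000 + j;  // stand-in for "do something with i and j"
    }
    return out;
}

// Same iteration space, flattened by the compiler via collapse(2).
std::vector<int> collapsed_fill(int iterations, int N) {
    std::vector<int> out(iterations * N);
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < iterations; i++) {
        for (int j = 0; j < N; j++) {
            out[i * N + j] = i * 1000 + j;
        }
    }
    return out;
}
```

`collapse(2)` keeps the original loop form, which avoids the division and modulo in your own code.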

It's hard to answer because it really depends on the dependencies inside your code. But a general way to approach this is to invert the loop nesting, like this:
#pragma omp parallel
{
    #pragma omp for
    for (int j = 0; j < N; j++) {
        for (int i = 0; i < iterations; i++) {
            // Do something
        }
    }
}
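The interchange is valid when each `j` depends only on its own earlier passes. The hypothetical example below updates element `j` `iterations` times with `a[j] = 2 * a[j] + 1`, then checks the interchanged parallel version against the original loop order. If the update also read a neighbour such as `a[j - 1]`, the interchange could change the result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original order: pass i over every element (the outer loop is not shared).
std::vector<long> passes_outer(std::vector<long> a, int iterations) {
    for (int i = 0; i < iterations; i++)
        for (std::size_t j = 0; j < a.size(); j++)
            a[j] = 2 * a[j] + 1;
    return a;
}

// Interchanged: each thread owns some j and runs all passes on it.
// Valid here only because the update of a[j] reads nothing but a[j].
std::vector<long> passes_interchanged(std::vector<long> a, int iterations) {
    const long n = static_cast<long>(a.size());
    #pragma omp parallel for
    for (long j = 0; j < n; j++)
        for (int i = 0; i < iterations; i++)
            a[j] = 2 * a[j] + 1;
    return a;
}
```

With input `{0, 1, 5}` and 3 passes, both versions should produce `{7, 15, 47}`.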


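Of course, whether this is possible depends on the code inside the loop.
I think the way you handle the two for loops is correct, because it achieves the behavior you want: the outer loop is not parallelized, while the inner loop is.
To better illustrate what happens, I'll add some comments to the code: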
#pragma omp parallel
{
  // Here you have a certain number of threads, let's say M
  for (int i = 0; i < iterations; i++) {
        // Each thread enters this region and executes all the iterations 
        // from i = 0 to i < iterations. Note that i is a private variable.
        #pragma omp for
        for (int j = 0; j < N; j++) {
            // What happens here is shared among threads so,
            // according to the scheduling you choose, each thread
            // will execute a particular portion of your N iterations
        } // IMPLICIT BARRIER             
  }
}


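The implicit barrier is a synchronization point where threads wait for each other. So, as a general rule of thumb, it is preferable to parallelize the outer loop rather than the inner loop. That creates a single synchronization point for all iterations*N iterations, instead of the "iterations" points created above.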
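The claims in the annotated code can be checked by counting instead of printing: every thread executes all `iterations` passes, and the `j` iterations are divided among the team. Below is a sketch with a hypothetical `Tally` struct, using `#pragma omp atomic` for the shared counters. It also builds without OpenMP, in which case the team is one thread.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical counters for checking the annotated code's claims.
struct Tally {
    int threads = 0;           // team size M
    int outer_entries = 0;     // executions of the outer-loop body, all threads
    int inner_iterations = 0;  // executions of the inner-loop body, all threads
};

Tally tally(int iterations, int N) {
    Tally t;  // shared by the whole team
    #pragma omp parallel
    {
        #pragma omp single
        {
#ifdef _OPENMP
            t.threads = omp_get_num_threads();
#else
            t.threads = 1;  // built without OpenMP: pragmas ignored
#endif
        }  // implicit barrier after single
        for (int i = 0; i < iterations; i++) {  // i is private (declared in the region)
            #pragma omp atomic
            t.outer_entries += 1;               // every thread reaches this each pass
            #pragma omp for
            for (int j = 0; j < N; j++) {
                #pragma omp atomic
                t.inner_iterations += 1;        // N per pass, split across the team
            }                                   // implicit barrier
        }
    }
    return t;
}
```

With a team of `M` threads, `outer_entries` should equal `iterations * M`, while `inner_iterations` stays `iterations * N`.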
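Maybe you could give an example of what you want to do. I mean, fill in the code for "// Do something".

@raxman, that wouldn't help much. This is a request for a general solution to this kind of problem, not a solution for a specific application.

You may want to go back and upvote/accept some answers. It seems people put in some effort and got rather few upvotes in return.

The outer loop is supposed to specify multiple passes of some algorithm, so it can't be parallelized. Sorry if I wasn't clear.

The outer loop is not parallelized because there is no work-sharing directive on it. If you run the "hello world printf" test with the code I suggested, it shows all of this. You can see that if you add the nowait clause, the barrier is removed. In other words, without nowait the outer loop is not parallelized, and with nowait,

@raxman, the outer loop is never parallelized. With the nowait clause you remove the synchronization point, and that's all.

OK, but this may be a matter of terminology, because in either case the outer loop runs on different threads. The only way for it to run on a single thread is to move the omp pragma inside the outer loop.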
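Moving the pragma inside the outer loop looks like the sketch below, assuming a hypothetical 3-point smoothing as the per-pass work (`smooth_fork_join` is an illustrative name). The outer loop runs on one thread. Each pass opens and closes its own parallel region, so the end of the region plays the role of the implicit barrier, and ordinary serial code can run between passes. The trade-off is a fork/join per pass instead of one team for the whole computation.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical per-pass work: a 3-point average, endpoints held fixed.
std::vector<double> smooth_fork_join(std::vector<double> data, int passes) {
    const int n = static_cast<int>(data.size());
    std::vector<double> next(data);
    for (int i = 0; i < passes; i++) {  // outer loop runs on one thread only
        #pragma omp parallel for
        for (int j = 1; j < n - 1; j++) {
            next[j] = (data[j - 1] + data[j] + data[j + 1]) / 3.0;
        }  // end of region: all threads have finished pass i
        std::swap(data, next);  // plain serial code between passes
    }
    return data;
}
```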