
C++: Incrementing a shared loop counter for progress reporting in OpenMP


I'd like to track the total number of pixels and rays processed by a long-running raytracing process. If I update the shared variables on every iteration, the process slows down noticeably because of synchronization. I'd like to track progress while still getting accurate counts at the end. Is there a way to do this with OpenMP for loops?

Here is the loop code in question:

void Raytracer::trace(RenderTarget& renderTarget, const Scene& scene, std::atomic<int>& sharedPixelCount, std::atomic<int>& sharedRayCount)
{
    int width = renderTarget.getWidth();
    int height = renderTarget.getHeight();
    int totalPixelCount = width * height;

    #pragma omp parallel for schedule(dynamic, 4096)
    for (int i = 0; i < totalPixelCount; ++i)
    {
        int x = i % width;
        int y = i / width;

        Ray rayToScene = scene.camera.getRay(x, y);
        shootRay(rayToScene, scene, sharedRayCount); // will increment sharedRayCount
        renderTarget.setPixel(x, y, rayToScene.color.clamped());

        ++sharedPixelCount;
    }
}
Here is an example of how to do it:

void Raytracer::trace(RenderTarget& renderTarget, const Scene& scene, std::atomic<int>& sharedPixelCount, std::atomic<int>& sharedRayCount)
{
    int width = renderTarget.getWidth();
    int height = renderTarget.getHeight();
    int totalPixelCount = width * height;
    int rayCount = 0;
    int previousRayCount = 0;

    #pragma omp parallel for schedule(dynamic, 1000) reduction(+:rayCount) firstprivate(previousRayCount)
    for (int i = 0; i < totalPixelCount; ++i)
    {
        int x = i % width;
        int y = i / width;

        Ray rayToScene = scene.camera.getRay(x, y);
        shootRay(rayToScene, scene, rayCount);
        renderTarget.setPixel(x, y, rayToScene.color.clamped());

        if ((i + 1) % 100 == 0)
        {
            sharedPixelCount += 100;
            sharedRayCount += (rayCount - previousRayCount);
            previousRayCount = rayCount;
        }
    }

    sharedPixelCount = totalPixelCount;
    sharedRayCount = rayCount;
}

The counters won't be 100% accurate while the loop is running, but the error is negligible, and the exact values are reported at the end.
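As a sketch of the consumer side, assuming the caller wants a textual progress readout: the 50 ms polling interval and the stand-in worker below are illustrative, not part of the answer; only the batched-update pattern on the shared counter is taken from it.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

// Runs a stand-in workload on one thread and polls the shared pixel
// counter from the calling thread; returns the final pixel count.
int traceWithProgress(int totalPixelCount)
{
    std::atomic<int> sharedPixelCount{0};
    std::atomic<bool> done{false};

    std::thread worker([&] {
        // Stand-in for Raytracer::trace: batched counter updates,
        // touching the shared atomic only every 100 iterations.
        for (int i = 0; i < totalPixelCount; ++i)
            if ((i + 1) % 100 == 0)
                sharedPixelCount += 100;
        sharedPixelCount = totalPixelCount; // exact value at the end
        done = true;
    });

    while (!done)
    {
        std::printf("progress: %d%%\r",
                    100 * sharedPixelCount.load() / totalPixelCount);
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }
    worker.join();
    return sharedPixelCount.load();
}
```

The polling thread only ever reads the atomic, so the worker's hot loop pays the synchronization cost just once per batch.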

Since the dynamically scheduled parallel for loop already uses a chunk size of 4096, why not use that as the granularity for amortizing the counter updates?

For example, something like the following might work. I haven't tested this code, and you would probably need to add some bookkeeping for totalPixelCount % 4096 != 0.

Unlike the previous answer, this adds no branch to the loop beyond the one implied by the loop itself, for which many processors have optimized instructions. It also requires no extra variables or arithmetic.

void Raytracer::trace(RenderTarget& renderTarget, const Scene& scene, std::atomic<int>& sharedPixelCount, std::atomic<int>& sharedRayCount)
{
    int width = renderTarget.getWidth();
    int height = renderTarget.getHeight();
    int totalPixelCount = width * height;

    #pragma omp parallel for schedule(dynamic, 1)
    for (int j = 0; j < totalPixelCount; j+=4096)
    {
      for (int i = j; i < j + 4096; ++i)
      {
        int x = i % width;
        int y = i / width;

        Ray rayToScene = scene.camera.getRay(x, y);
        shootRay(rayToScene, scene, sharedRayCount);
        renderTarget.setPixel(x, y, rayToScene.color.clamped());
      }
      sharedPixelCount += 4096;
    }
}
void Raytracer::trace(RenderTarget& renderTarget, const Scene& scene, std::atomic<int>& sharedPixelCount, std::atomic<int>& sharedRayCount)
{
    int width = renderTarget.getWidth();
    int height = renderTarget.getHeight();
    int totalPixelCount = width * height;

    int reducePixelCount = 0;
    #pragma omp parallel for schedule(dynamic, 4096) \
                         reduction(+:reducePixelCount)
    for (int i = 0; i < totalPixelCount; ++i)
    {
        int x = i % width;
        int y = i / width;

        Ray rayToScene = scene.camera.getRay(x, y);
        shootRay(rayToScene, scene, sharedRayCount);
        renderTarget.setPixel(x, y, rayToScene.color.clamped());

        ++reducePixelCount; /* thread-local operation, not atomic */
    }

    /* The interoperability of C++11 atomics and OpenMP is not defined yet,
     * so this should just be avoided until OpenMP 5 at the earliest. 
     * It is sufficient to reduce over a non-atomic type and 
     * do the assignment here. */
    sharedPixelCount = reducePixelCount;
}
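A quick self-check of this reduce-then-assign pattern, with a stand-in for shootRay: the doubling kernel below is a made-up placeholder, and only the counting structure is taken from the code above.

```cpp
#include <atomic>

// Placeholder kernel: pretend every pixel fires exactly two rays.
// Only the counter handling mirrors the reduce-then-assign pattern.
static void fakeShootRay(int& localRayCount) { localRayCount += 2; }

int reduceThenAssign(int totalPixelCount, std::atomic<int>& sharedRayCount)
{
    int reduceRayCount = 0;

    // Thread-local increments, combined once when the region ends.
    #pragma omp parallel for schedule(dynamic, 4096) \
                         reduction(+:reduceRayCount)
    for (int i = 0; i < totalPixelCount; ++i)
        fakeShootRay(reduceRayCount);

    // Single store into the atomic after the reduction completes.
    sharedRayCount = reduceRayCount;
    return reduceRayCount;
}
```

Because each thread accumulates into its private copy, the atomic is written exactly once, and the final count is exact regardless of how the chunks were distributed.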