Fortran OpenMP code is much slower than its non-parallel version

I want to solve a random walk problem, so I wrote a serial Fortran code, and now I need to parallelize it.

subroutine random_walk(walkers)

implicit none
include "omp_lib.h"
integer :: i, j, col, row, walkers,m,n,iter
real, dimension(:, :), allocatable :: matrix, res
real :: point, z


col = 12
row = 12


allocate (matrix(row, col), res(row, col))

! Read from file
open(2, file='matrix.txt')
    do i = 1, row
        read(2, *)(matrix(i, j), j=1,col)
    end do

res = matrix


! Solve task

!$omp parallel private(i,j,m,n,point,iter) 

!$omp do collapse(2) 

do i= 2, 11        
    do j=2, 11  

        m = i
        n = j
        iter = 1
        point = 0

        do while (iter <= walkers)
            call random_number(z)
            if (z <= 0.25) m = m - 1
            if (z > 0.25 .and. z <= 0.5) n = n +1
            if (z > 0.5 .and. z <= 0.75) m = m +1
            if (z > 0.75) n = n - 1

            if (m == 1 .or. m == 12 .or. n == 1 .or. n == 12) then 
                point = point + matrix(m, n)
                m = i
                n = j
                iter = iter + 1
            end if

        end do
        point = point / walkers           

        res(i, j) = point    
    end do        
end do

!$omp end do
!$omp end parallel    

! Write to file
open(2, file='out_omp.txt')
    do i = 1, row
        write(2, *)(res(i, j), j=1,col)
    end do    
contains    

end

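Most likely, this behaviour is related to the random number extraction. The Fortran random_number procedure is not even guaranteed to be thread-safe, but thanks to a GNU extension it is at least thread-safe with the GNU compiler. In any case, as you noticed, the performance appears to be very poor.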
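If you switch to a different, thread-safe random number generator, the code scales well. I used the classic ran2.f generator, modified to make it thread-safe. If I am not mistaken, do the following: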

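In the calling unit, declare and define:
integer :: iv(32), iy, idum2, idum

idum2 = 123456789; iv(:) = 0; iy = 0

In the OpenMP directive, add idum as private, and idum2, iv, iy as firstprivate (by the way, you also need to add z as private).
In the parallel region, add (before the do)
idum = -omp_get_thread_num()
so that different threads get different random numbers.
Remove the DATA and SAVE lines from the ran2 function and pass idum2, iv, iy as arguments:
function ran2(idum, iv, iy, idum2)

Call ran2 instead of random_number:
z = ran2(idum, iv, iy, idum2)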
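
The steps above amount to giving each thread its own generator state. A minimal sketch of that idea follows. It is not the answer's modified ran2: it swaps in a small xorshift32 generator (the name next_uniform, the seed constants and the made-up boundary values are all assumptions for illustration), so that the example stays short and self-contained. The seeds are offset per thread so that no two threads start from the same value. If I read the classic ran2 initialisation correctly, seeds 0 and -1 both initialise to 1 there, so an offset such as -(omp_get_thread_num() + 1) may be worth considering with ran2.

```fortran
program walk_private_rng
  use omp_lib
  use iso_fortran_env, only: int32
  implicit none
  integer, parameter :: nsize = 12
  integer :: i, j, m, n, iter, walkers
  integer(int32) :: state            ! per-thread generator state
  real :: matrix(nsize, nsize), res(nsize, nsize), point, z

  walkers = 1000
  matrix = 0.0
  matrix(1, :) = 100.0               ! made-up boundary values for the demo
  res = matrix

  !$omp parallel private(i, j, m, n, iter, point, z, state)
  ! Distinct non-zero seed per thread (xorshift must never start at 0).
  state = 123456789_int32 + 7919_int32 * int(omp_get_thread_num(), int32)
  !$omp do collapse(2)
  do i = 2, nsize - 1
    do j = 2, nsize - 1
      m = i; n = j; iter = 1; point = 0.0
      do while (iter <= walkers)
        z = next_uniform(state)      ! replaces call random_number(z)
        if (z <= 0.25) then
          m = m - 1
        else if (z <= 0.5) then
          n = n + 1
        else if (z <= 0.75) then
          m = m + 1
        else
          n = n - 1
        end if
        ! A walker that reaches the boundary scores and restarts at (i, j).
        if (m == 1 .or. m == nsize .or. n == 1 .or. n == nsize) then
          point = point + matrix(m, n)
          m = i; n = j; iter = iter + 1
        end if
      end do
      res(i, j) = point / walkers
    end do
  end do
  !$omp end do
  !$omp end parallel

  print '(a, f8.3)', 'res(2,2)   = ', res(2, 2)
  print '(a, f8.3)', 'res(11,11) = ', res(11, 11)

contains

  ! One xorshift32 step; returns a uniform real in [0, 1) from the top 24 bits.
  function next_uniform(s) result(r)
    integer(int32), intent(inout) :: s
    real :: r
    s = ieor(s, ishft(s, 13))
    s = ieor(s, ishft(s, -17))
    s = ieor(s, ishft(s, 5))
    r = real(ishft(s, -8)) * 2.0**(-24)
  end function next_uniform

end program walk_private_rng
```

With gfortran this would be built with -fopenmp. Because the state is an ordinary private variable passed by argument, no thread ever touches another thread's generator, which is the property the ran2 modification is after.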


For walkers=100000 (GNU compiler), here are my timings:
1 thread   => 4.7s
2 threads  => 2.4s
4 threads  => 1.5s
8 threads  => 0.78s
16 threads => 0.49s
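
For readers who want to collect comparable numbers, here is a rough sketch of one way to time a parallel region. It uses omp_get_wtime and a placeholder workload (the loop body and its size are assumptions, not the random walk itself), and the thread count is assumed to be set through the OMP_NUM_THREADS environment variable.

```fortran
program time_region
  use omp_lib
  implicit none
  double precision :: t0, t1, total
  integer :: i

  total = 0.0d0
  t0 = omp_get_wtime()               ! wall-clock time before the region

  !$omp parallel do reduction(+:total)
  do i = 1, 50000000
    total = total + sqrt(dble(i))    ! placeholder work
  end do
  !$omp end parallel do

  t1 = omp_get_wtime()               ! wall-clock time after the region
  print *, 'threads:', omp_get_max_threads(), ' seconds:', t1 - t0
  print *, 'checksum:', total        ! printed so the loop is not optimised away
end program time_region
```

Wall-clock time is the relevant measure here; CPU time summed over threads would hide any speed-up.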

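This is not strictly related to the question, but I must say that extracting one real number for every 4 "bits" of information you need (+1 or -1), together with the use of conditionals, could probably be replaced by a more efficient strategy.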
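
One possible reading of that remark, sketched under stated assumptions: a step has four possible outcomes, which is two bits of information, so a single 32-bit random integer can drive 16 steps, and a pair of lookup tables can replace the chain of conditionals. The xorshift32 generator, the table names dm and dn, and the step count are all illustrative choices, not something taken from the answer.

```fortran
program two_bits_per_step
  use iso_fortran_env, only: int32
  implicit none
  ! Direction tables: index 0..3 gives the same four moves as the if-chain.
  integer, parameter :: dm(0:3) = [-1, 0, 1,  0]
  integer, parameter :: dn(0:3) = [ 0, 1, 0, -1]
  integer(int32) :: s, bits
  integer :: m, n, step, d, counts(0:3)

  s = 123456789_int32                ! any non-zero seed
  bits = 0
  m = 0; n = 0; counts = 0

  do step = 0, 15999
    if (mod(step, 16) == 0) then     ! refill: one 32-bit draw feeds 16 steps
      s = ieor(s, ishft(s, 13))
      s = ieor(s, ishft(s, -17))
      s = ieor(s, ishft(s, 5))
      bits = s
    end if
    d = int(iand(bits, 3_int32))     ! lowest two bits select the direction
    bits = ishft(bits, -2)           ! discard the bits just used
    m = m + dm(d)                    ! table lookup instead of conditionals
    n = n + dn(d)
    counts(d) = counts(d) + 1
  end do

  print *, 'moves per direction:', counts
  print *, 'final position:', m, n
end program two_bits_per_step
```

Whether this is actually faster depends on the compiler and the generator, so it is worth timing before adopting it.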
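Without the conditions, it is hard for others to help you. It would also help if you described your goals and design choices in more detail.
One option is to get rid of the !$omp sections. A better option is to use multiple !$omp section directives and get rid of the (ugly, IMHO) if (thrd_num…
I am trying to have each thread compute a separate piece of the matrix: thread 0 handles (2:6; 2:6), thread 1 (7:11; 2:6), thread 2 (2:6; 7:11), and thread 3 (7:11). That is why I need this ugly "if (thrd_num…)".
Unless you want to control the mapping (i.e. which thread does what), I suspect you do not need this. In any case, just use OpenMP sections properly; see the example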
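
A small sketch of what splitting the interior into blocks with OpenMP sections might look like, without any test on the thread number. The array owner is hypothetical and only records which thread ran each block. The fourth block's range is cut off in the comment above, so (7:11, 7:11) is an assumption here.

```fortran
program quadrant_sections
  use omp_lib
  implicit none
  integer :: owner(12, 12)

  owner = -1                          ! -1 marks cells no section touched

  ! Each section is executed once, by whichever thread picks it up.
  !$omp parallel sections
  !$omp section
  owner(2:6, 2:6) = omp_get_thread_num()
  !$omp section
  owner(7:11, 2:6) = omp_get_thread_num()
  !$omp section
  owner(2:6, 7:11) = omp_get_thread_num()
  !$omp section
  owner(7:11, 7:11) = omp_get_thread_num()   ! assumed range, elided in the comment
  !$omp end parallel sections

  ! Four disjoint 5x5 blocks cover the 10x10 interior.
  print *, 'interior cells assigned:', count(owner >= 0)
end program quadrant_sections
```

The runtime decides which thread runs which section, so this does not pin a block to a particular thread. If the mapping does not matter, a plain !$omp do over the interior, as in the question, is usually simpler.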