Thread performance in Julia


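My attempt to parallelize Julia code does not improve performance as the number of threads increases.

Whether I set JULIA_NUM_THREADS to 2 or 32, the code below runs in about the same time.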

using Random
using Base.Threads

rmax = 10
dr = 1
Ngal = 100000000

function bin(id, Njobs, x, y, z, w)
    bin_array = zeros(10)
    for i in (id-1)*Njobs + 1:id*Njobs
        r = sqrt(x[i]^2 + y[i]^2 + z[i]^2)
        i_bin = floor(Int, r/dr) + 1
        if i_bin < 10
            bin_array[i_bin] += w[i]
        end
    end
    bin_array
end

Nthreads = nthreads()

x = rand(Ngal)*5
y = rand(Ngal)*5
z = rand(Ngal)*5
w = ones(Ngal)

V = let
    VV = [zeros(10) for _ in 1:Nthreads]
    jobs_per_thread = fill(div(Ngal, Nthreads),Nthreads)
    for i in 1:Ngal-sum(jobs_per_thread)
        jobs_per_thread[i] += 1
    end
    @threads for i = 1:Nthreads
        tid = threadid()
        VV[tid] = bin(tid, jobs_per_thread[tid], x, y, z, w)
    end
    reduce(+, VV)
end

Am I doing something wrong?

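The time spent in the threaded loop is negligible compared to the other operations. You are also allocating arrays whose size depends on the number of threads, so you spend even (slightly) more time on memory allocation when you use multiple threads.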
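One way to check this split is to time the data setup and the threaded binning separately. The following is a minimal sketch, not the asker's exact program: the names `make_data`, `bin_range`, `threaded_bin` and `compare_timings` are hypothetical, the problem size is reduced to keep the run short, and the chunks are indexed by the loop variable rather than by `threadid()`, which is an assumption made here for simplicity.

```julia
using Base.Threads

# Hypothetical helper: allocate the inputs (the setup work of the original script).
function make_data(n)
    x = rand(n) * 5
    y = rand(n) * 5
    z = rand(n) * 5
    w = ones(n)
    return x, y, z, w
end

# Accumulate the weights of the points in `range` into 10 radial bins of width 1.
function bin_range(range, x, y, z, w)
    bins = zeros(10)
    for i in range
        r = sqrt(x[i]^2 + y[i]^2 + z[i]^2)
        i_bin = floor(Int, r) + 1
        if i_bin <= length(bins)        # ignore points beyond the last bin
            bins[i_bin] += w[i]
        end
    end
    return bins
end

# Threaded binning: one contiguous chunk of indices per loop iteration.
function threaded_bin(x, y, z, w; nchunks = nthreads())
    n = length(x)
    partial = [zeros(10) for _ in 1:nchunks]
    @threads for c in 1:nchunks
        lo = div((c - 1) * n, nchunks) + 1
        hi = div(c * n, nchunks)
        partial[c] = bin_range(lo:hi, x, y, z, w)
    end
    return reduce(+, partial)
end

# Time allocation and binning separately; returns both timings and the histogram.
function compare_timings(n)
    t_alloc = @elapsed data = make_data(n)
    threaded_bin(data...)               # warm-up call so compilation is not timed
    t_bin = @elapsed total = threaded_bin(data...)
    return t_alloc, t_bin, total
end

# 1_000_000 points is an assumption, much smaller than Ngal in the question.
t_alloc, t_bin, total = compare_timings(1_000_000)
println("allocation: ", t_alloc, " s, threaded binning: ", t_bin, " s")
```

Running this with different values of JULIA_NUM_THREADS should show whether only the binning time changes while the allocation time stays roughly constant.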


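Please have a look at Julia's performance tips if you care about performance. In particular, avoid global variables at any cost (they kill performance) and put everything inside functions, which are also easier to test and debug. For example, I rewrote your code as: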

using Random
using Base.Threads

function bin(id, Njobs, x, y, z, w)
    dr = 1

    bin_array = zeros(10)
    for i in (id-1)*Njobs + 1:id*Njobs
        r = sqrt(x[i]^2 + y[i]^2 + z[i]^2)
        i_bin = floor(Int, r/dr) + 1
        if i_bin < 10
            bin_array[i_bin] += w[i]
        end
    end
    bin_array
end

function test()
    Ngal = 100000000
    x = rand(Ngal)*5
    y = rand(Ngal)*5
    z = rand(Ngal)*5
    w = ones(Ngal)

    Nthreads = nthreads()
    VV = [zeros(10) for _ in 1:Nthreads]
    jobs_per_thread = fill(div(Ngal, Nthreads),Nthreads)
    for i in 1:Ngal-sum(jobs_per_thread)
        jobs_per_thread[i] += 1
    end
    @threads for i = 1:Nthreads
        tid = threadid()
        VV[tid] = bin(tid, jobs_per_thread[tid], x, y, z, w)
    end
    reduce(+, VV)
end

test()

Performance with 4 threads:

julia> @time test();
  2.602698 seconds (65 allocations: 5.215 GiB, 9.92% gc time)
julia> @time test();
  2.481054 seconds (27 allocations: 5.215 GiB, 12.08% gc time)

If I comment out the for loop in test(), I get the following timings. One thread:

julia> @time test();
  3.054144 seconds (33 allocations: 5.215 GiB, 11.03% gc time)
julia> @time test();
  2.444296 seconds (21 allocations: 5.215 GiB, 10.54% gc time)

4 threads:

julia> @time test();
  2.602698 seconds (65 allocations: 5.215 GiB, 9.92% gc time)
julia> @time test();
  2.481054 seconds (27 allocations: 5.215 GiB, 12.08% gc time)
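
The advice about global variables can be illustrated with a small comparison. This is only a sketch under stated assumptions: `dr_global`, `sum_bins_global`, `sum_bins_arg` and `compare_globals` are hypothetical names, and the loop is a simplified stand-in for the binning kernel, not the code from the question. A function that reads a non-constant global cannot be specialised on that global's type, so it is typically slower than the same function taking the value as an argument.

```julia
# Non-constant global, as in the question's script. Its type could change at
# any time, so functions that read it cannot be specialised on that type.
dr_global = 1

# Sums the bin indices, reading the bin width from the non-constant global.
function sum_bins_global(x)
    s = 0
    for xi in x
        s += floor(Int, xi / dr_global) + 1
    end
    return s
end

# Same computation, with the bin width passed in as an argument.
function sum_bins_arg(x, dr)
    s = 0
    for xi in x
        s += floor(Int, xi / dr) + 1
    end
    return s
end

# Time both variants on the same data; returns the timings and both results.
function compare_globals(n)
    x = rand(n) * 5
    sum_bins_global(x)                  # warm-up calls so compilation is not timed
    sum_bins_arg(x, 1)
    t_global = @elapsed a = sum_bins_global(x)
    t_arg = @elapsed b = sum_bins_arg(x, 1)
    return t_global, t_arg, a, b
end

t_global, t_arg, a, b = compare_globals(1_000_000)
println("global: ", t_global, " s, argument: ", t_arg, " s")
```

Both variants return the same sum; only the timings are expected to differ, and by how much depends on the machine and Julia version.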