Parallelizing two (or more) functions in Julia


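I am trying to solve some wave-equation problems (related to my PhD) using the finite-difference method. To do so, I translated (line by line) a Fortran code (link below): ()
Within this code, inside the time loop, there are four main loops that are independent of each other. In fact, I could split them into four functions. Since I have to run this code about a hundred times, it would be good to speed the process up. With that in mind, I turned to parallelization. See the following example: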

function main()
    # ...some common code...
    for time = 1:N
        fun1() # I want this function to run in parallel...
        fun2() # ...this function to run in parallel with 1, 3, 4
        fun3() # ...this function to run in parallel with 1, 2, 4
        fun4() # ...this function to run in parallel with 1, 2, 3
    end
    # ...more code here...
    return
end
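So:

1) Is it possible to do what I described above?

2) Would this approach speed up my code?

3) Is there a better way to think about this problem?

A minimal working example would look like this: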

function fun1(t)
for i=1:1000
    for j=1:1000
        t+=(0.5)^t+(0.3)^(t-1);
    end
end
return t
end
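As you can see, none of the three functions above (fun1, fun2 and fun3) depends on any of the others, so they can certainly run in parallel. Can this be achieved? Would it improve my computation speed?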

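Edit:

Hi @BogumiłKamiński, I modified the finite-difference equations so that I can "loop" over the inputs and outputs of my functions (as you suggested). If it is not too much trouble, I would like to hear your opinion on the parallelization design of the code:

Key elements
1) I packed all the inputs into 4 tuples: sig_xy_in and sig_xy_cros_in (for the 2 sigma functions) and vel_vx_in and vel_vy_in (for the 2 velocity functions). Then, for "looping" purposes, I packed the 4 tuples into 2 vectors...
2) For "looping" purposes, I packed the 4 functions into 2 vectors...
3) I run the first parallel loop (sigma), then unpack its output tuples...
4) I run the second parallel loop (velocity), then unpack its output tuples...
5) Finally, I pack the output elements back into the input tuples and continue the time loop until it finishes.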

...code

  l = Threads.SpinLock()
  arg_in_sig  = [sig_xy_in,sig_xy_cros_in]; # Input tuples for the sigma functions
  arg_in_vel  = [vel_vx_in,     vel_vy_in]; # Input tuples for the velocity functions
  func_sig    = [sig_xy   ,   sig_xy_cros]; # Vector with the two sigma functions
  func_vel    = [vel_vx   ,        vel_vy]; # Vector with the two velocity functions

  for it = 1:NSTEP # time steps
    #------------------------------------------------------------
    # Compute sigma functions 
    #------------------------------------------------------------
    Threads.@threads for j in 1:2 # Start the two sigma functions in parallel
        Threads.lock(l);
        Threads.unlock(l);
        arg_in_sig[j] = func_sig[j](arg_in_sig[j]);
    end

    # Unpack tuples for sig_xy and sig_xy_cros
    # Unpack tuples for sig_xy
    sigxx    = arg_in_sig[1][1];  # changed by sig_xy
    sigyy    = arg_in_sig[1][2];  # changed by sig_xy
    m_dvx_dx = arg_in_sig[1][3];  # changed by sig_xy
    m_dvy_dy = arg_in_sig[1][4];  # changed by sig_xy
    vx       = arg_in_sig[1][5];  # unchanged by sig_xy
    vy       = arg_in_sig[1][6];  # unchanged by sig_xy
    delx_1   = arg_in_sig[1][7];  # unchanged by sig_xy
    dely_1   = arg_in_sig[1][8];  # unchanged by sig_xy

    ...more unpacking...

    # Unpack tuples for sig_xy_cros
    sigxy    = arg_in_sig[2][1];  # changed by sig_xy_cros
    m_dvy_dx = arg_in_sig[2][2];  # changed by sig_xy_cros
    m_dvx_dy = arg_in_sig[2][3];  # changed by sig_xy_cros
    vx       = arg_in_sig[2][4];  # unchanged by sig_xy_cros
    vy       = arg_in_sig[2][5];  # unchanged by sig_xy_cros

    ...more unpacking....

    #--------------------------------------------------------
    # velocity
    #--------------------------------------------------------
    Threads.@threads for j in 1:2 # Start the two velocity functions in parallel
       Threads.lock(l)
       Threads.unlock(l)
       arg_in_vel[j] = func_vel[j](arg_in_vel[j])
    end

    # Unpack tuples for vel_vx
    vx          = arg_in_vel[1][1];  # changed by vel_vx
    m_dsigxx_dx = arg_in_vel[1][2];  # changed by vel_vx
    m_dsigxy_dy = arg_in_vel[1][3];  # changed by vel_vx
    sigxx       = arg_in_vel[1][4];  # unchanged by vel_vx
    sigxy       = arg_in_vel[1][5];....

    # Unpack tuples for vel_vy
    vy          = arg_in_vel[2][1];  # changed by vel_vy
    m_dsigxy_dx = arg_in_vel[2][2];  # changed by vel_vy
    m_dsigyy_dy = arg_in_vel[2][3];  # changed by vel_vy
    sigxy       = arg_in_vel[2][4];  # unchanged by vel_vy
    sigyy       = arg_in_vel[2][5];  # unchanged by vel_vy
    .....

    ...more unpacking...

    # assemble the new input variables
      sig_xy_in  = (sigxx,sigyy,
              m_dvx_dx,m_dvy_dy,
              vx,vy,....);

      sig_xy_cros_in = (sigxy,
              m_dvy_dx,m_dvx_dy,
              vx,vy,....);

      vel_vx_in = (vx,....
      vel_vy_in = (vy,.....
end #time loop

Here is a simple way to run the code in multithreaded mode:

function fun1(t)
    for i=1:1000
        for j=1:1000
            t+=(0.5)^t+(0.3)^(t-1);
        end
    end
    return t
end
function fun2(t)
    for i=1:1000
        for j=1:1000
            t+=(0.5)^t;
        end
    end
    return t
end
function fun3(r)
    for i=1:1000
        for j=1:1000
            r = (r + rand())/r;
        end
    end
    return r
end

function main()
    l = Threads.SpinLock()
    a = [2.0, 2.5, 3.0]
    f = [fun1, fun2, fun3]
    Threads.@threads for i in 1:3
        for j in 1:4
            Threads.lock(l)
            println((thread=Threads.threadid(), iteration=j))
            Threads.unlock(l)
            a[i] = f[i](a[i])
        end
    end
    return a
end
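I added the locking only as an example of how it can be done (in Julia 1.3 you do not have to, because IO is thread-safe there). Also note that before Julia 1.3, rand() shares data between threads, so running these functions is not safe if all threads use rand() (again, in Julia 1.3 this is safe).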
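
As an alternative to indexing into vectors inside a Threads.@threads loop, one task per function can be spawned and its result fetched afterwards. The following is only a sketch under the assumption that Julia 1.3 or later is used; run_parallel, double and increment are hypothetical names introduced for illustration and are not part of the answer above.

```julia
# Requires Julia 1.3 or later (Threads.@spawn is not available before that).
# run_parallel is a hypothetical helper, not part of the original answer.
function run_parallel(fs, args)
    # Start one task per (function, argument) pair; tasks may run on different threads.
    tasks = [Threads.@spawn f(x) for (f, x) in zip(fs, args)]
    # Wait for every task and collect the results in the original order.
    return fetch.(tasks)
end

# Two small independent stand-in functions.
double(x) = 2x
increment(x) = x + 1

println(run_parallel([double, increment], [2.0, 2.5]))
```

Because each task only touches its own argument and returns a fresh value, no lock is needed in this pattern.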

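To run this code, first set the maximum number of threads you want to use, e.g. on Windows: set JULIA_NUM_THREADS=4 (on Linux you should export it instead). A sample run of this code is shown in the output listing at the end of this page (I reduced the number of iterations to shorten the output).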
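
A quick way to confirm that the thread setting took effect is to ask Julia how many threads the session has. This is a small sketch; report_threads is a hypothetical helper name, not something from the answer.

```julia
# Hypothetical helper: report how many threads this Julia session was started with.
function report_threads()
    n = Threads.nthreads()
    println("Julia is running with ", n, " thread(s)")
    return n
end

report_threads()
```

If this reports 1, the environment variable was probably not set before Julia was started, and the @threads loop will simply run sequentially.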


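Now one small caveat: although making code multithreaded in Julia is relatively easy (and even simpler in Julia 1.3), you have to be careful when doing it, because you have to think about race conditions.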
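
To make the race-condition caveat concrete, here is a minimal sketch (not taken from the answer) that contrasts an unsynchronized shared counter with an atomic one. The function names racy_count and atomic_count are hypothetical.

```julia
using Base.Threads

# Unsafe: every iteration does a read-modify-write on the same Ref,
# so with several threads some increments can be lost.
function racy_count(n)
    total = Ref(0)
    @threads for i in 1:n
        total[] += 1
    end
    return total[]
end

# Safe: atomic_add! performs the increment as one indivisible operation.
function atomic_count(n)
    total = Atomic{Int}(0)
    @threads for i in 1:n
        atomic_add!(total, 1)
    end
    return total[]
end

println("racy:   ", racy_count(100_000))
println("atomic: ", atomic_count(100_000))
```

With more than one thread the racy version may print less than 100000, while the atomic version always counts every iteration. The functions in the answer avoid the problem differently: each one writes only to its own slot a[i].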

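It is not clear from your description whether fun1, fun2, fun3 and fun4 share anything. If they do, you need multithreading to get fast code. If they do not, multiprocessing is enough (but each function then runs in a separate process, so sharing data would be slow). Both options are relatively easy to implement in Julia. I would recommend posting some short code (without the algorithmic complexity) that can be run on a single core; then it would be easier to give advice.

Do you want to do distributed computing or multithreading? Check out the manual. You may want to spawn threads in Julia 1.3, or use Distributed.@spawnat, depending on whether you want to use one node or multiple nodes.

@BogumiłKamiński So, I edited the post with a simple working example. Now there are three functions that do not share any information. Can I run these three functions in parallel? Would this parallelization improve my computation speed? Thanks in advance.

@ChrisRackauckas Thanks for sharing the link to the manual. I am reading it carefully now.

Thanks for the quick reply!

If you are sure that you only read the data, then it is safe. You do not need lock in your code; it is only there for printing (because before Julia 1.3 IO is not thread-safe). Apart from that, the code looks correct (though perhaps a bit too verbose, but that is a separate issue unrelated to threading).

Thanks! Now, when I try to run the code, Julia throws an error message saying: ERROR: LoadError: syntax: assignment not allowed in tuple. Why is that? I am using Julia 6.0.

You should probably switch to Julia 1.2, but I think the code should run under 0.6. Please change the part (thread=Threads.threadid(), iteration=j) to "thread", Threads.threadid(), ", iteration", j, because in Julia 0.6, if I remember correctly, named tuples were not yet available.

14.460027 seconds (909 allocations: 14.516 KiB), bytes allocated: 14864, whereas the previous run was: julia> @timev main() 22.390655 seconds (906 allocations: 14.438 KiB), bytes allocated: 14784. Much faster. Thank you very much!

There is no version 6.0. The latest stable version is 1.2, but to use the newest multithreading features you need a pre-release of version 1.3. I guess you are probably using version 0.6.

@DNF Indeed. I meant v0.6. My bad!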

Sequential main() from the question's minimal example:

function main()
    a = 2;
    b = 2.5;
    c = 3.0;
    for i=1:100
        a = fun1(a);
        b = fun2(b);
        c = fun3(c);
    end
    return;
end

Sample output of the multithreaded main() (number of iterations reduced):

julia> main()
(thread = 1, iteration = 1)
(thread = 3, iteration = 1)
(thread = 2, iteration = 1)
(thread = 3, iteration = 2)
(thread = 3, iteration = 3)
(thread = 3, iteration = 4)
(thread = 2, iteration = 2)
(thread = 1, iteration = 2)
(thread = 2, iteration = 3)
(thread = 2, iteration = 4)
(thread = 1, iteration = 3)
(thread = 1, iteration = 4)
3-element Array{Float64,1}:
 21.40311930108456
 21.402807510451463
  1.219028489573526