Julia multiplication of two arrays

Is there a way to speed up this array multiplication, or to write it more elegantly? (With numpy arrays, I would write it as A*B.)

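There are plenty of Julia packages that let you write your contraction in one simple line. The comparison below uses three such one-liners, f_omeinsum, f_einsum, and f_tensor, all based on einsum/tensor notation.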
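The one-line definitions themselves are not shown in this text. As a hedged sketch, assuming the three functions are built on OMEinsum.jl, Einsum.jl, and TensorOperations.jl (an inference from the function names, not something stated above), they could look like this:

```julia
# Assumption: these three packages are installed. The package choice is
# inferred from the function names used in the tests and benchmarks.
using OMEinsum, Einsum, TensorOperations

# Contract the third index of A with the first index of B:
# C[i,j,l] = sum over k of A[i,j,k] * B[k,l]
f_omeinsum(A, B) = ein"ijk,kl->ijl"(A, B)
f_einsum(A, B)   = @einsum C[i,j,l] := A[i,j,k] * B[k,l]
f_tensor(A, B)   = @tensor C[i,j,l] := A[i,j,k] * B[k,l]

A = rand(8, 15, 10)
B = rand(10, 5)
size(f_tensor(A, B))  # (8, 15, 5)
```

In each case the repeated index k is summed over, which is exactly the dot product in the original loop.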

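Besides these elegant (and fast, see below) versions, you can also improve your loop code quite a bit. Here is your code, wrapped in a function, together with an improved version with comments: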

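using LinearAlgebra  # provides the dot-product operator ⋅

function f(A,B)
    C = zeros(8,15,5)
    for i in 1:8
        for j in 1:15
            # kept as in the question: k is unused in the body,
            # so every entry of C is recomputed 10 times
            for k in 1:10
                for l in 1:5
                    # the slices A[i,j,:] and B[:,l] each allocate a temporary copy
                    C[i,j,l] = A[i,j,:]⋅B[:,l]
                end
            end
        end
    end
    return C
end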

function f_fast(A,B)
    # check bounds
    n1,n2,n3 = size(A)
    m1, m2 = size(B)
    @assert m1 == n3
    C = zeros(n1,n2,m2)

    # * @inbounds to skip boundchecks inside the loop
    # * different order of the loops to account for Julia's column major order
    # * written out the k-loop (dot product) explicitly to avoid temporary allocations
    @inbounds for l in 1:m2
        for k in 1:m1
            for j in 1:n2
                for i in 1:n1
                    C[i,j,l] += A[i,j,k]*B[k,l]
                end
            end
        end
    end
    return C
end
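The loop reordering above works because Julia stores arrays in column-major order. The same layout means the contraction can also be written as a single matrix product after reshaping, which for Float64 arrays hands the work to BLAS. The following is a minimal sketch that is not part of the original answer and was not benchmarked there; the name f_reshape is made up for illustration:

```julia
# Hypothetical helper (not from the original answer): the same contraction
# expressed as one matrix product via reshape.
function f_reshape(A, B)
    n1, n2, n3 = size(A)
    m1, m2 = size(B)
    @assert m1 == n3
    # Viewing A as an (n1*n2) × n3 matrix does not copy data, because the
    # first two indices are contiguous in column-major storage.
    C = reshape(A, n1 * n2, n3) * B
    return reshape(C, n1, n2, m2)
end

A = rand(8, 15, 10)
B = rand(10, 5)
C = f_reshape(A, B)
size(C)  # (8, 15, 5)
```

This only needs Base, so it can serve as a dependency-free cross-check for the other versions.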

Let's compare all the approaches. First, we check correctness:

using Test
@test f(A,B) ≈ f_omeinsum(A,B) # Test passed
@test f(A,B) ≈ f_einsum(A,B) # Test passed
@test f(A,B) ≈ f_tensor(A,B) # Test passed
@test f(A,B) ≈ f_fast(A,B) # Test passed
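The checks above assume that A and B are already defined. A minimal setup could look like the sketch below, with sizes taken from the dimensions hard-coded in f. It also builds a reference result that needs no extra packages; the name C_ref is hypothetical:

```julia
using Test

# Sizes follow the dimensions hard-coded in f: A is 8×15×10, B is 10×5.
A = rand(8, 15, 10)
B = rand(10, 5)

# Hypothetical reference: the contraction written directly as a comprehension.
C_ref = [sum(A[i,j,k] * B[k,l] for k in axes(B, 1))
         for i in axes(A, 1), j in axes(A, 2), l in axes(B, 2)]

# Explicit-loop version from above, repeated so this snippet runs on its own.
function f_fast(A, B)
    n1, n2, n3 = size(A)
    m1, m2 = size(B)
    @assert m1 == n3
    C = zeros(n1, n2, m2)
    @inbounds for l in 1:m2, k in 1:m1, j in 1:n2, i in 1:n1
        C[i,j,l] += A[i,j,k] * B[k,l]
    end
    return C
end

@test size(C_ref) == (8, 15, 5)
@test f_fast(A, B) ≈ C_ref
```

The ≈ comparison (isapprox) is used rather than == because the versions sum the products in different orders, so results can differ in the last floating-point digits.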

Now let's benchmark using BenchmarkTools. The timings on my machine are given as comments:

using BenchmarkTools
@btime f($A,$B); # 663.500 μs (12001 allocations: 1.84 MiB)
@btime f_omeinsum($A,$B); # 33.799 μs (242 allocations: 20.20 KiB)
@btime f_einsum($A,$B); # 4.200 μs (1 allocation: 4.81 KiB)
@btime f_tensor($A,$B); # 2.367 μs (3 allocations: 4.94 KiB)
@btime f_fast($A,$B); # 7.375 μs (1 allocation: 4.81 KiB)
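The allocation counts explain much of the gap between f and f_fast: every slice such as A[i,j,:] copies its data into a new array. A rough way to see this without BenchmarkTools is Base's @allocated macro. The sketch below isolates the inner operation in two hypothetical helpers, dot_slices and dot_loop; exact byte counts depend on the Julia version, so none are quoted here:

```julia
using LinearAlgebra

# Hypothetical helpers isolating the inner operation of f and f_fast.
dot_slices(A, B, i, j, l) = A[i, j, :] ⋅ B[:, l]   # each slice copies into a temporary

function dot_loop(A, B, i, j, l)
    s = 0.0
    @inbounds for k in axes(B, 1)
        s += A[i, j, k] * B[k, l]
    end
    return s
end

A = rand(8, 15, 10)
B = rand(10, 5)

# Call each function once first so that compilation is not counted.
dot_slices(A, B, 1, 1, 1)
dot_loop(A, B, 1, 1, 1)

bytes_slices = @allocated dot_slices(A, B, 1, 1, 1)
bytes_loop   = @allocated dot_loop(A, B, 1, 1, 1)
println("slices: ", bytes_slices, " bytes, loop: ", bytes_loop, " bytes")
```

Both helpers return the same number, but the slice-based one pays for two temporary vectors on every call.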

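As we can see, all the approaches based on einsum/tensor notation are much faster than your original loop implementation, and they are only one-liners! Our f_fast is in the same ballpark, but still well behind the fastest version, f_tensor.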

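Finally, let's go all the way, because we can. With a bit of macro wizardry, we replace the @inbounds in f_fast with @avx (we call this new version f_avx) and automatically get a speedup of roughly 2x over f_tensor above: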

@test f(A,B) ≈ f_avx(A,B) # Test passed
@btime f_avx($A,$B); # 930.769 ns (1 allocation: 4.81 KiB)
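The definition of f_avx is not shown above. A sketch follows, assuming the macro comes from LoopVectorization.jl, where recent releases call it @turbo and @avx is the older name:

```julia
# Assumption: LoopVectorization.jl is installed. @turbo is the current name
# of the macro that older releases exported as @avx.
using LoopVectorization

function f_avx(A, B)
    n1, n2, n3 = size(A)
    m1, m2 = size(B)
    @assert m1 == n3
    C = zeros(n1, n2, m2)
    # Same loop nest as f_fast, with the macro swapped in for @inbounds.
    @turbo for l in 1:m2
        for k in 1:m1
            for j in 1:n2
                for i in 1:n1
                    C[i,j,l] += A[i,j,k] * B[k,l]
                end
            end
        end
    end
    return C
end

A = rand(8, 15, 10)
B = rand(10, 5)
size(f_avx(A, B))  # (8, 15, 5)
```

The loop body is unchanged from f_fast; only the macro in front of the outer loop differs.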

However, because of its simplicity, I would still prefer f_tensor, unless every microsecond counts in your application.

How would you do this in Python with numpy? When I try it with A and B as numpy arrays, I get ValueError: operands could not be broadcast together with shapes (7,14,9) (9,4).
