Julia: vectorized vs. devectorized code


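From what I understand, Julia is supposed to make for loops fast, as fast as vectorized operations. I wrote three versions of a simple function that finds distances: one using a for loop, one using vectorized operations, and one using vectorized operations on a DataFrame: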

x = rand(500)
y = rand(500)
a = rand()
b = rand()

function devect()
    dist = Array(Float64, 0)
    twins = Array(Float64, 0,2)

    for i in 1:500
        dist = [dist; sqrt((x[i] - a)^2 + (y[i] - b)^2)]
        if dist[end] < 0.05
            twins = [twins; [x y][end,:]]
        end
    end

    return twins
end

function vect()
    d = sqrt((x-a).^2 + (y-b).^2)
    return [x y][d .< 0.05,:]
end

using DataFrames

function df_vect()
    df = DataFrame(x=x, y=y)
    dist = sqrt((df[:x]-a).^2 + (df[:y]-b).^2)

    return df[dist .< 0.05,:]
end

n = 10^3

@time for i in [1:n] devect() end
@time for i in [1:n] vect() end
@time for i in [1:n] df_vect() end

Why does the vectorized version run so much faster?

Following up on my comment about the method used to build the result in devect, here is my code:

julia> x, y, a, b = rand(500), rand(500), rand(), rand()

julia> function devect{T}(x::Vector{T}, y::Vector{T}, a::T, b::T)
       res = Array(T, 0)
       dim1 = 0
       for i = 1:size(x,1)
           if sqrt((x[i]-a)^2+(y[i]-b)^2) < 0.05
               push!(res, x[i])
               push!(res, y[i])
               dim1 += 1
           end
       end
       reshape(res, (2, dim1))'
   end
devect (generic function with 1 method)

julia> function vect{T}(x::Vector{T}, y::Vector{T}, a::T, b::T)
       d = sqrt((x-a).^2+(y-b).^2)
       [x y][d.<0.05, :]
   end
vect (generic function with 1 method)

julia> @time vect(x, y, a, b)
elapsed time: 3.7118e-5 seconds (37216 bytes allocated)
2x2 Array{Float64,2}:
 0.978099  0.0405639
 0.94757   0.0224974

julia> @time vect(x, y, a, b)
elapsed time: 7.1977e-5 seconds (37216 bytes allocated)
2x2 Array{Float64,2}:
 0.978099  0.0405639
 0.94757   0.0224974

julia> @time devect(x, y, a, b)
elapsed time: 1.7146e-5 seconds (376 bytes allocated)
2x2 Array{Float64,2}:
 0.978099  0.0405639
 0.94757   0.0224974

julia> @time devect(x, y, a, b)
elapsed time: 1.3065e-5 seconds (376 bytes allocated)
2x2 Array{Float64,2}:
 0.978099  0.0405639
 0.94757   0.0224974

julia> @time devect(x, y, a, b)
elapsed time: 1.8059e-5 seconds (376 bytes allocated)
2x2 Array{Float64,2}:
 0.978099  0.0405639
 0.94757   0.0224974


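There may be faster ways to write the devectorized solution, but note the difference in bytes allocated. If a devectorized solution allocates more memory than a vectorized one, it is probably written the wrong way (at least in Julia).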
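As a rough illustration of the allocation point above, here is a sketch of both approaches in current Julia 1.x syntax (the session above uses older, 0.3-era syntax). The fixed query point `a = b = 0.5`, the `where {T<:Real}` constraint, and the use of `@allocated` are assumptions of this sketch, not part of the original answer.

```julia
# Devectorized: one pass over the data, push! matching coordinates,
# reshape once at the end.
function devect(x::Vector{T}, y::Vector{T}, a::T, b::T) where {T<:Real}
    res = T[]
    for i in eachindex(x, y)
        if sqrt((x[i] - a)^2 + (y[i] - b)^2) < 0.05
            push!(res, x[i], y[i])
        end
    end
    # res holds x1, y1, x2, y2, ...; view it as 2×k, then flip to k×2
    return permutedims(reshape(res, 2, :))
end

# Vectorized: dot-broadcasting builds the full distance vector d,
# and [x y] builds a full 500×2 matrix before filtering.
function vect(x::Vector{T}, y::Vector{T}, a::T, b::T) where {T<:Real}
    d = sqrt.((x .- a).^2 .+ (y .- b).^2)
    return [x y][d .< 0.05, :]
end

x, y = rand(500), rand(500)
a, b = 0.5, 0.5   # assumed query point, chosen only for illustration

# Warm up so that compilation is not counted in the measurements
devect(x, y, a, b); vect(x, y, a, b)

println("devect: ", @allocated(devect(x, y, a, b)), " bytes")
println("vect:   ", @allocated(vect(x, y, a, b)), " bytes")
```

Both functions return the same k×2 matrix; the devectorized one avoids the temporary arrays, which is where most of the vectorized version's allocation comes from.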

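Your code uses non-constant global variables everywhere, which means you are essentially back in interpreted-language performance territory, because no guarantees can be made about their types at compile time. For a quick speedup, simply prefix every global variable assignment with const. Your devect code is also not very efficient.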
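The effect of non-constant globals described above can be sketched as follows, in Julia 1.x syntax. The names `xs_global`, `XS_CONST`, `sum_global`, `sum_const` and `sum_arg` are hypothetical and exist only for this illustration; passing the data as an argument is shown as a third option that the answer does not use.

```julia
xs_global = rand(500)               # non-constant global: its type may change
const XS_CONST = copy(xs_global)    # constant global holding the same data

# Reads a non-constant global: the compiler cannot assume its type
function sum_global()
    s = 0.0
    for i in eachindex(xs_global)
        s += xs_global[i]
    end
    return s
end

# Reads a constant global: the type is known when the function is compiled
function sum_const()
    s = 0.0
    for i in eachindex(XS_CONST)
        s += XS_CONST[i]
    end
    return s
end

# Takes the data as an argument: the type is known from the call
function sum_arg(xs::Vector{Float64})
    s = 0.0
    for i in eachindex(xs)
        s += xs[i]
    end
    return s
end

# Warm up so that compilation is not counted in the measurements
sum_global(); sum_const(); sum_arg(xs_global)

println("non-const global: ", @allocated(sum_global()), " bytes")
println("const global:     ", @allocated(sum_const()), " bytes")
println("argument:         ", @allocated(sum_arg(xs_global)), " bytes")
```

All three functions compute the same sum; only the way the data reaches the loop differs, so any difference in reported allocation comes from how much the compiler knows about the types.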
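I made the following changes:

- Made all global variables constant
- Preallocated the output vector instead of appending to it on every iteration
- Showed two different ways to devectorize the output more directly
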
const x = rand(500)
const y = rand(500)
const a = rand()
const b = rand()

function devect()
    dist = Array(Float64, 500)

    for i in 1:500
        dist[i] = sqrt((x[i] - a)^2 + (y[i] - b)^2)
    end

    return [x y][dist .< 0.05,:]
end

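function devect2()
    pairs = Array(Float64, 500, 2)  # preallocate room for the worst case
    nfound = 0                      # number of matching rows found so far

    for i in 1:500
        dist = sqrt((x[i] - a)^2 + (y[i] - b)^2)
        if dist < 0.05
            nfound += 1
            pairs[nfound,:] = [x[i], y[i]]
        end
    end

    # return only the filled rows; the rest of pairs is uninitialized
    return pairs[1:nfound,:]
end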

function vect()
    d = sqrt((x-a).^2 + (y-b).^2)
    return [x y][d .< 0.05,:]
end

using DataFrames

function df_vect()
    df = DataFrame(x=x, y=y)
    dist = sqrt((df[:x]-a).^2 + (df[:y]-b).^2)

    return df[dist .< 0.05,:]
end

const n = 10^3

@time for i in [1:n] devect() end
@time for i in [1:n] devect2() end
@time for i in [1:n] vect() end
@time for i in [1:n] df_vect() end

Output (devect, devect2, vect, df_vect, in that order):

elapsed time: 0.009283872 seconds (16760064 bytes allocated)
elapsed time: 0.003116157 seconds (8456064 bytes allocated)
elapsed time: 0.050070483 seconds (37248064 bytes allocated, 44.50% gc time)
elapsed time: 0.0566218 seconds (30432064 bytes allocated, 40.35% gc time)

This might be a global-scope problem. Try rewriting the functions so that they use local variables instead of globals. That often affects timings.

@ptb I did that originally (declared a, b, x and y inside each function). It didn't help.

Using @inbounds and @simd does not speed up devect() either.

In more detail: take a closer look at the code in devect(). The way dist and twins are built is probably a major slowdown. They start as zero-element arrays and are then reassigned (that is, reallocated) to a new array on every iteration.

Thanks! I removed the a = [a; b] pattern from the function, and my devect() now runs several times faster than vect(). By the way, is there a way to push one vector onto the end of another?

The append! function is probably what you are looking for.

Actually, I don't see how your function allocated only 376 bytes. I copied and pasted your function and got: elapsed time: 0.012100854 seconds (197856 bytes allocated)

I timed only a single execution, whereas your code accumulates the timing over 1000 runs. The input arrays are also random, so I would not expect identical results.

OK, I ran it a few times (with n = 1 each time). Most of the time it reports a small byte count, as in your example, but occasionally the byte count for the new devect() function rises above that of vect().
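
For the question in the comments about adding one vector to the end of another, here is a minimal sketch of `append!` next to `push!` and `vcat` in Julia 1.x. The variable names are illustrative only.

```julia
v = [1.0, 2.0]
w = [3.0, 4.0, 5.0]

# append! extends v in place with every element of w
append!(v, w)

# push! adds individual elements in place
push!(v, 6.0)

# vcat (or the [v; w] syntax) allocates a new vector and leaves
# its inputs unchanged, which is what made the original loop slow
u = vcat(v, w)

println(v)
println(length(u))
```

Inside a loop, the in-place functions avoid building a fresh array on each iteration, which is the pattern the comments identify as the main slowdown.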