Memory-efficient sortperm on a column of a matrix in Julia


I have a large-ish matrix, and I want to apply sortperm to each column of it. The naive thing to do is

order = sortperm(X[:,j])

which makes a copy of the column. That seems a shame, so I thought I would try a SubArray:

order = sortperm(sub(X,1:n,j))

but that is even slower. For a laugh I tried

order = sortperm(1:n,by=i->X[i,j])

but of course that is horrible. What is the fastest way to do this?

Here is some benchmark code:

getperm1(X,n,j) = sortperm(X[:,j])
getperm2(X,n,j) = sortperm(sub(X,1:n,j))
getperm3(X,n) = mapslices(sortperm, X, 1)
n = 1000000
X = rand(n, 10)
for f in [getperm1, getperm2]
    println(f)
    for it in 1:5
        gc()
        @time f(X,n,5)
    end
end
for f in [getperm3]
    println(f)
    for it in 1:5
        gc()
        @time getperm3(X,n)
    end
end

Results:

getperm1
elapsed time: 0.258576164 seconds (23247944 bytes allocated)
elapsed time: 0.141448346 seconds (16000208 bytes allocated)
elapsed time: 0.137306078 seconds (16000208 bytes allocated)
elapsed time: 0.137385171 seconds (16000208 bytes allocated)
elapsed time: 0.139137529 seconds (16000208 bytes allocated)
getperm2
elapsed time: 0.433251141 seconds (11832620 bytes allocated)
elapsed time: 0.33970986 seconds (8000624 bytes allocated)
elapsed time: 0.339840795 seconds (8000624 bytes allocated)
elapsed time: 0.342436716 seconds (8000624 bytes allocated)
elapsed time: 0.342867431 seconds (8000624 bytes allocated)
getperm3
elapsed time: 1.766020534 seconds (257397404 bytes allocated, 1.55% gc time)
elapsed time: 1.43763525 seconds (240007488 bytes allocated, 1.85% gc time)
elapsed time: 1.41373546 seconds (240007488 bytes allocated, 1.82% gc time)
elapsed time: 1.42215519 seconds (240007488 bytes allocated, 1.83% gc time)
elapsed time: 1.419174037 seconds (240007488 bytes allocated, 1.83% gc time)

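where the mapslices version takes about 10 times as long as the getperm1 version, as you would expect, since it sorts all 10 columns rather than one.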

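It is worth pointing out that, at least on my machine, the copy+sortperm option is not much slower than a plain sortperm on a vector of the same length. The memory allocation is unnecessary, though, so it would be nice to avoid it.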
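The benchmark above is written for an old Julia release. As a rough sketch only, assuming Julia 1.x, the same idea can be expressed with view (which replaced sub) and sortperm! writing into a preallocated index matrix. The names getperm_view and getperm_all! are hypothetical and are not part of the original question.

```julia
# Sketch for Julia 1.x; function names here are illustrative, not from the original post.

# Non-copying view of column j (the modern spelling of sub(X, 1:n, j)).
getperm_view(X, j) = sortperm(view(X, :, j))

# Fill a preallocated Int matrix P with the sort permutation of each column of X,
# so that no per-column result vector has to be allocated.
function getperm_all!(P::Matrix{Int}, X::AbstractMatrix)
    size(P) == size(X) || throw(DimensionMismatch("P and X must have the same size"))
    for j in axes(X, 2)
        sortperm!(view(P, :, j), view(X, :, j))
    end
    return P
end

X = rand(1000, 10)
P = getperm_all!(Matrix{Int}(undef, size(X)), X)
```

Whether this is faster than the copying version depends on the Julia version and the machine, so it should be timed rather than assumed.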

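In a few very specific cases (such as taking a contiguous view of an Array), you can beat the performance of SubArray with some pointer magic: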

function colview(X::Matrix,j::Int)
    n = size(X,1)
    offset = 1+n*(j-1) # The linear start position
    checkbounds(X, offset+n-1)
    pointer_to_array(pointer(X, offset), (n,))
end

getperm4(X,n,j) = sortperm(colview(X,j))

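The function colview returns a full-fledged Array that shares its data with the original matrix. Note that this is a terrible idea, because the returned array references data that Julia only tracks through the original matrix. This means that if the original goes out of scope before the column "view" does, accessing the data will crash with a segfault.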
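pointer_to_array no longer exists in current Julia. A minimal sketch of the same trick, assuming Julia 1.x, uses unsafe_wrap and keeps the original matrix alive with GC.@preserve while the wrapper is in use. The name colview_unsafe is hypothetical, and the lifetime warning above still applies in full.

```julia
# Sketch for Julia 1.x: unsafe_wrap is the replacement for pointer_to_array.
function colview_unsafe(X::Matrix, j::Integer)
    n = size(X, 1)
    checkbounds(X, :, j)        # make sure column j exists
    offset = 1 + n * (j - 1)    # linear index of X[1, j]
    # The returned Array shares memory with X but does not keep X alive.
    return unsafe_wrap(Array, pointer(X, offset), n)
end

X = rand(1000, 10)
# GC.@preserve guarantees X is not collected while the wrapper is being read.
order = GC.@preserve X sortperm(colview_unsafe(X, 5))
```

The wrapper must never outlive X, so it is safest to create and consume it inside a single GC.@preserve expression as shown.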

The results are as follows:

getperm1
elapsed time: 0.317923176 seconds (15 MB allocated)
elapsed time: 0.252215996 seconds (15 MB allocated)
elapsed time: 0.215124686 seconds (15 MB allocated)
elapsed time: 0.210062109 seconds (15 MB allocated)
elapsed time: 0.213339974 seconds (15 MB allocated)
getperm2
elapsed time: 0.509172302 seconds (7 MB allocated)
elapsed time: 0.509961218 seconds (7 MB allocated)
elapsed time: 0.506399583 seconds (7 MB allocated)
elapsed time: 0.512562736 seconds (7 MB allocated)
elapsed time: 0.506199265 seconds (7 MB allocated)
getperm4
elapsed time: 0.225968056 seconds (7 MB allocated)
elapsed time: 0.220587707 seconds (7 MB allocated)
elapsed time: 0.219854355 seconds (7 MB allocated)
elapsed time: 0.226289377 seconds (7 MB allocated)
elapsed time: 0.220391515 seconds (7 MB allocated)

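I have not dug into why the SubArray performance is worse, but it may simply be the extra pointer dereference on every memory access. It is remarkable how little the allocation actually costs in terms of time: the timings for getperm1 are more variable, but it still occasionally outperforms getperm4! I think this is due to some extra pointer math in the internal implementation of Array when the data is shared. There is also some crazy caching behavior: getperm1 gets significantly faster on repeated runs.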

How does mapslices(sortperm, X, 1) perform? Have you used @time on the sortperm versions you have tried so far?