Sorting an array of primitives with a custom comparator in Clojure
I want to sort a primitive Java array with a custom comparator, but I get a type error. I think the comparator function is creating a Comparator over Objects rather than one over the primitive element type, but I can't figure out how to get around it.
Here is a minimal example:
x.core=> (def x (double-array [4 3 5 6 7]))
#'x.core/x
x.core=> (java.util.Arrays/sort x (comparator #(> %1 %2)))
ClassCastException [D cannot be cast to [Ljava.lang.Object; x.core/eval1524 (form-init5588058267991397340.clj:1)
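I've tried adding various type hints to the comparator function, but frankly I'm fairly new to the language and was basically just throwing darts.
I deliberately simplified the example above to focus on the key problem, which is the type error. In the sections below I give more detail to motivate the question and show why I'm using a custom comparator.
Motivation
What I want to do is replicate R's order function, which works like this: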
> x = c(7, 2, 5, 3, 1, 4, 6)
> order(x)
[1] 5 2 4 6 3 7 1
> x[order(x)]
[1] 1 2 3 4 5 6 7
As you can see, it returns the permutation of indices that sorts its input vector.
Here is a working solution in Clojure:
(defn order
  "Permutation of indices sorted by x"
  [x]
  (let [v (vec x)]
    (sort-by #(v %) (range (count v)))))
x.core=> (order [7 2 5 3 1 4 6])
(4 1 3 5 2 6 0)
(Note that R is 1-indexed while Clojure is 0-indexed.) The trick is to sort a vector of indices (namely [0, 1, …, (count x) - 1]) by the vector x itself.
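To compare the Clojure result with R's output directly, the 0-based indices can be shifted by one. The sketch below repeats order from above so that it stands alone; order-1-based is a hypothetical helper name, not something from the original post.

```clojure
;; `order` is repeated from the text above so the sketch is self-contained.
(defn order
  "Permutation of 0-based indices that sorts x."
  [x]
  (let [v (vec x)]
    (sort-by #(v %) (range (count v)))))

;; Hypothetical helper (an assumption for illustration): shift to 1-based
;; indices so the result lines up with what R's order() prints.
(defn order-1-based
  [x]
  (mapv inc (order x)))

(order-1-based [7 2 5 3 1 4 6])
;; => [5 2 4 6 3 7 1]
```

This matches the R transcript above, which prints 5 2 4 6 3 7 1 for the same input.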
R vs. Clojure performance
Unfortunately, the performance of this solution bothers me. The R solution is much faster.
The corresponding Clojure code:
x.core=> (def x (repeatedly 1000000 rand))
#'x.core/x
x.core=> (time (def y (order x)))
"Elapsed time: 2857.216452 msecs"
#'x.core/y
Primitive arrays as a solution?
I found that sorting a primitive array tends to take about as long as it does in R:
> x = runif(1000000)
> system.time({ y = sort(x) })
user system elapsed
0.061 0.005 0.069
vs
This is what motivated me to try a custom comparator with the java.util.Arrays class.
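For reference, timing the built-in primitive sort from Clojure might look like the sketch below. The helper name sorted-doubles is an assumption made for illustration, and no timing result is claimed here. Note that the timed expression also includes copying the lazy sequence into the array.

```clojure
;; Hypothetical helper: copy the input into a double[] and sort it in place
;; with the primitive overload java.util.Arrays.sort(double[]).
(defn sorted-doubles
  "Copies xs into a fresh double[] and sorts it ascending."
  ^doubles [xs]
  (let [^doubles a (double-array xs)]
    (java.util.Arrays/sort a)
    a))

(def x (repeatedly 1000000 rand))
(time (def y (sorted-doubles x)))
```

The type hint on a lets the compiler pick the double[] overload without reflection. This overload accepts no Comparator, so it can only sort in ascending order.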
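I should add that I can use a custom comparator with an ArrayList, as shown below, but the performance is no better than that of my original function: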
(defn order2
  [x]
  (let [v (vec x)
        compx (comparator (fn [i j] (< (v i) (v j))))
        ix (java.util.ArrayList. (range (count v)))]
    (java.util.Collections/sort ix compx)
    (vec ix)))
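A variant of the same idea keeps the keys in a primitive double[] and boxes only the indices, so that java.util.Arrays/sort can be given a Comparator. This is a sketch only: order-boxed-indices is a hypothetical name, and no claim is made that it is faster than the versions above.

```clojure
;; Sketch: the keys stay in a double[]; only the indices are boxed Longs,
;; because Arrays.sort accepts a Comparator only for Object[] arrays.
(defn order-boxed-indices
  [xs]
  (let [^doubles a (double-array xs)
        idx (object-array (range (alength a)))
        cmp (reify java.util.Comparator
              (compare [_ i j]
                ;; i and j arrive as boxed Longs; compare the keys they point to
                (Double/compare (aget a (int i)) (aget a (int j)))))]
    (java.util.Arrays/sort idx cmp)
    (vec idx)))

(order-boxed-indices [7 2 5 3 1 4 6])
;; => [4 1 3 5 2 6 0]
```

The comparator still unboxes two indices per comparison, so this approach does not avoid boxing altogether.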
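Unfortunately, it's too slow. I guess primitives may not be the answer after all.
Here is a Clojure implementation of the order function that uses quicksort with a random pivot. It comes fairly close to R: on your benchmark with one million doubles I get timings mostly in the 520-530 ms range, whereas R tends to hover around 500 ms here.
Update: with a very basic two-threaded version (two quicksorts, followed by a merge step that produces the output vector) my timings improve noticeably: the worst benchmark mean was 415 ms, and otherwise I get results in the 325-365 ms range. See the end of this message for the two-threaded version.
Note that, as an intermediate step, it pours its input into an array of doubles, and it ultimately returns a vector of longs. On my box, pouring one million doubles into a vector seems to take only about 30 ms, so you can leave that step out if you are happy with an array as the result.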
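The main complication is invokePrim: as of Clojure 1.9.0-RC1, a regular function call in that position results in boxing. Other approaches are possible, but this one works and seems simple enough.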
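The invokePrim pattern can be seen in isolation in the small sketch below. A fn whose parameters and return value are hinted as ^long implements the interface clojure.lang.IFn$LL, and calling .invokePrim through that hint passes the arguments as primitives. The function sum-squares and the local square are hypothetical names used only for illustration.

```clojure
;; Sketch: call a letfn-bound fn through its primitive interface.
;; `square` takes and returns a long, so it implements clojure.lang.IFn$LL.
(defn sum-squares
  "Sum of i*i for 0 <= i < n."
  ^long [^long n]
  (letfn [(square ^long [^long x] (* x x))]
    (loop [i 0
           acc 0]
      (if (< i n)
        ;; .invokePrim takes and returns a primitive long here
        (recur (inc i) (+ acc (.invokePrim ^clojure.lang.IFn$LL square i)))
        acc))))

(sum-squares 4)
;; => 14
```

In the answer's code the same idea appears with IFn$LLL, the interface for a fn that takes two longs and returns a long.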
See the end of this message for some benchmark results. The lower quantile of the first run is actually the best result reported.
(defn order2 [xs]
  (let [rnd (java.util.Random.)
        a1 (double-array xs)
        a2 (long-array (alength a1))]
    (dotimes [i (alength a2)]
      (aset a2 i i))
    (letfn [(quicksort [^long l ^long h]
              (if (< l h)
                (let [p (.invokePrim ^clojure.lang.IFn$LLL partition l h)]
                  (quicksort l (dec p))
                  (quicksort (inc p) h))))
            (partition ^long [^long l ^long h]
              (let [pidx (+ l (.nextInt rnd (- h l)))
                    pivot (aget a1 pidx)]
                (swap1 a1 pidx h)
                (swap2 a2 pidx h)
                (loop [i (dec l)
                       j l]
                  (if (< j h)
                    (if (< (aget a1 j) pivot)
                      (let [i (inc i)]
                        (swap1 a1 i j)
                        (swap2 a2 i j)
                        (recur i (inc j)))
                      (recur i (inc j)))
                    (let [i (inc i)]
                      (when (< (aget a1 h) (aget a1 i))
                        (swap1 a1 i h)
                        (swap2 a2 i h))
                      i)))))
            (swap1 [^doubles a ^long i ^long j]
              (let [tmp (aget a i)]
                (aset a i (aget a j))
                (aset a j tmp)))
            (swap2 [^longs a ^long i ^long j]
              (let [tmp (aget a i)]
                (aset a i (aget a j))
                (aset a j tmp)))]
      (quicksort 0 (dec (alength a1)))
      (vec a2))))
Some R timings from the same box, for comparison:
> system.time({ y = order(x) })
user system elapsed
0.512 0.004 0.514
> system.time({ y = order(x) })
user system elapsed
0.496 0.000 0.496
> system.time({ y = order(x) })
user system elapsed
0.508 0.000 0.510
> system.time({ y = order(x) })
user system elapsed
0.508 0.000 0.513
> system.time({ y = order(x) })
user system elapsed
0.496 0.000 0.499
> system.time({ y = order(x) })
user system elapsed
0.500 0.000 0.502
Update: the two-threaded Clojure version:
(defn order3 [xs]
  (let [rnd (java.util.Random.)
        a1 (double-array xs)
        a2 (long-array (alength a1))]
    (dotimes [i (alength a2)]
      (aset a2 i i))
    (letfn [(quicksort [^long l ^long h]
              (if (< l h)
                (let [p (.invokePrim ^clojure.lang.IFn$LLL partition l h)]
                  (quicksort l (dec p))
                  (quicksort (inc p) h))))
            (partition ^long [^long l ^long h]
              (let [pidx (+ l (.nextInt rnd (- h l)))
                    pivot (aget a1 pidx)]
                (swap1 a1 pidx h)
                (swap2 a2 pidx h)
                (loop [i (dec l)
                       j l]
                  (if (< j h)
                    (if (< (aget a1 j) pivot)
                      (let [i (inc i)]
                        (swap1 a1 i j)
                        (swap2 a2 i j)
                        (recur i (inc j)))
                      (recur i (inc j)))
                    (let [i (inc i)]
                      (when (< (aget a1 h) (aget a1 i))
                        (swap1 a1 i h)
                        (swap2 a2 i h))
                      i)))))
            (swap1 [^doubles a ^long i ^long j]
              (let [tmp (aget a i)]
                (aset a i (aget a j))
                (aset a j tmp)))
            (swap2 [^longs a ^long i ^long j]
              (let [tmp (aget a i)]
                (aset a i (aget a j))
                (aset a j tmp)))]
      (let [lim (alength a1)
            mid (quot lim 2)
            f1 (future (quicksort 0 (dec mid)))
            f2 (future (quicksort mid (dec lim)))]
        @f1
        @f2
        (loop [out (transient [])
               i 0
               j mid]
          (cond
            (== i mid)
            (persistent!
              (if (== j lim)
                out
                (reduce (fn [out j]
                          (conj! out (aget a2 j)))
                        out
                        (range j lim))))
            (== j lim)
            (persistent!
              (reduce (fn [out i]
                        (conj! out (aget a2 i)))
                      out
                      (range i mid)))
            :else
            (let [ie (aget a1 i)
                  je (aget a1 j)]
              (if (< ie je)
                (recur (conj! out (aget a2 i)) (inc i) j)
                (recur (conj! out (aget a2 j)) i (inc j))))))))))
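I think the problem is that java.util.Arrays/sort doesn't seem to have an overload for double arrays that accepts a Comparator. The only overload that takes one is for Object arrays, which is what it assumes you mean.
If you're sorting primitives, why do you want a custom comparator? The only options are ascending or descending order. Why not just use the native sort and then reverse the result if needed?
That was just a simplified example. I actually want to sort one array by another; I was just hoping the java.util.Arrays sort methods on primitives would be faster than the naive solution.
@break.eggshell Note that at the bottom of your edit you wrote "I guess primitives may not be the answer". As far as I know, any solution using java.util.Arrays/sort with a comparator will require boxing. The issue isn't the speed of primitive arrays, but the limits on actually using them without boxing. If you want to stick to primitive arrays, you may want to look at third-party libraries. Unfortunately, I can't recommend one.
Wow, that's beautiful. Sorry I was so slow to respond. Clearly you're a Clojure master ;-)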
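The overload point made in the comments can be illustrated with a short sketch: the call from the question succeeds once the doubles are copied into an Object array, at the cost of boxing every element. The name sort-desc-boxed is hypothetical.

```clojure
;; Sketch: Arrays.sort(T[], Comparator) only accepts Object arrays, so the
;; doubles are boxed into an Object[] first and copied back afterwards.
(defn sort-desc-boxed
  "Returns a new double[] holding the elements of xs in descending order."
  ^doubles [xs]
  (let [boxed (object-array (double-array xs))]
    (java.util.Arrays/sort boxed (comparator >))
    (double-array boxed)))

(vec (sort-desc-boxed [4 3 5 6 7]))
;; => [7.0 6.0 5.0 4.0 3.0]
```

Passing the double[] itself instead of the boxed copy produces the ClassCastException shown in the question.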
x.core=> (def x (double-array [5 3 1 3.14 -10]))
#'x.core/x
x.core=> (order x)
[4 2 1 3 0]
x.core=> (map #(aget x %) (order x))
(-10.0 1.0 3.0 3.14 5.0)
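The check performed by hand in the transcript above can be wrapped in a small predicate, which is handy for testing any of the order implementations. This is a sketch; valid-order? is a hypothetical helper and not part of the original answer.

```clojure
;; Sketch: perm is a valid ordering of xs if it is a permutation of
;; 0..n-1 and reading xs through it yields non-decreasing values.
(defn valid-order?
  [xs perm]
  (let [v (vec xs)
        n (count v)]
    (and (= (sort perm) (range n))
         ;; apply <= needs at least one argument, hence the guard
         (or (< n 2) (apply <= (map v perm))))))

(valid-order? [5 3 1 3.14 -10] [4 2 1 3 0])
;; => true
(valid-order? [5 3 1 3.14 -10] [0 1 2 3 4])
;; => false
```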
user> (c/bench (order2 x))
Evaluation count : 120 in 60 samples of 2 calls.
Execution time mean : 522.485408 ms
Execution time std-deviation : 33.490530 ms
Execution time lower quantile : 470.089782 ms ( 2.5%)
Execution time upper quantile : 575.687990 ms (97.5%)
Overhead used : 15.378363 ns
nil
user> (let [x (repeatedly 1000000 rand)]
(c/quick-bench (order2 x)))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 527.020004 ms
Execution time std-deviation : 14.846061 ms
Execution time lower quantile : 507.175127 ms ( 2.5%)
Execution time upper quantile : 543.675752 ms (97.5%)
Overhead used : 15.378363 ns
nil
user> (let [x (repeatedly 1000000 rand)]
(c/quick-bench (order2 x)))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 513.476501 ms
Execution time std-deviation : 12.828449 ms
Execution time lower quantile : 497.164534 ms ( 2.5%)
Execution time upper quantile : 525.094463 ms (97.5%)
Overhead used : 15.378363 ns
nil
user> (let [x (repeatedly 1000000 rand)]
(c/quick-bench (order2 x)))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 529.826816 ms
Execution time std-deviation : 21.454522 ms
Execution time lower quantile : 508.547461 ms ( 2.5%)
Execution time upper quantile : 552.592925 ms (97.5%)
Overhead used : 15.378363 ns
nil
user> (let [x (repeatedly 1000000 rand)]
(c/quick-bench (order3 x)))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 325.351056 ms
Execution time std-deviation : 3.511578 ms
Execution time lower quantile : 321.947510 ms ( 2.5%)
Execution time upper quantile : 330.375038 ms (97.5%)
Overhead used : 15.378363 ns
nil
user> (let [x (repeatedly 1000000 rand)]
(c/quick-bench (order3 x)))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 339.422989 ms
Execution time std-deviation : 19.929177 ms
Execution time lower quantile : 318.996436 ms ( 2.5%)
Execution time upper quantile : 366.113347 ms (97.5%)
Overhead used : 15.378363 ns
nil
user> (let [x (repeatedly 1000000 rand)]
(c/quick-bench (order3 x)))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 415.171336 ms
Execution time std-deviation : 13.624262 ms
Execution time lower quantile : 393.242455 ms ( 2.5%)
Execution time upper quantile : 428.881001 ms (97.5%)
Overhead used : 15.378363 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil
user> (let [x (repeatedly 1000000 rand)]
(c/quick-bench (order3 x)))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 324.547827 ms
Execution time std-deviation : 5.196817 ms
Execution time lower quantile : 318.541727 ms ( 2.5%)
Execution time upper quantile : 331.878289 ms (97.5%)
Overhead used : 15.378363 ns
nil
user> (c/bench (order3 x))
Evaluation count : 180 in 60 samples of 3 calls.
Execution time mean : 361.529793 ms
Execution time std-deviation : 45.285047 ms
Execution time lower quantile : 307.535934 ms ( 2.5%)
Execution time upper quantile : 446.679687 ms (97.5%)
Overhead used : 15.378363 ns
Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
Variance from outliers : 78.9377 % Variance is severely inflated by outliers
nil