Clojure: a more efficient 'min-by' function and performance analysis
I'm new to Clojure; I started learning the language two months ago. I'm reading "The Joy of Clojure" and found a min-by function in the functional programming chapter. I wrote my own min-by, and on at least 10,000 items it appears to perform at least 50% better. Here are the functions:
; the test vector with random data
(def my-rand-vec (vec (take 10000 (repeatedly #(rand-int 10000)))))
; the Joy of Clojure min-by
(defn min-by-reduce [f coll]
  (when (seq coll)
    (reduce (fn [min other]
              (if (> (f min) (f other))
                other
                min))
            coll)))
(time (min-by-reduce eval my-rand-vec))
; my poor min-by
(defn min-by-sort [f coll]
  (first (sort (map f coll))))
(time (min-by-sort eval my-rand-vec))
The terminal output is:
"Elapsed time: 91.657505 msecs"
"Elapsed time: 62.441513 msecs"
Does my solution have any performance or resource drawbacks? I would really like to hear a more elegant Clojure solution for this from the Clojure gurus.
Edit
Cleaner test code using criterium:
(ns min-by.core
  (:require [criterium.core :refer [quick-bench]])
  (:gen-class))
(defn min-by-reduce [f coll]
  (when (seq coll)
    (reduce (fn [min other]
              (if (> (f min) (f other))
                other
                min))
            coll)))
(defn min-by-sort [f coll]
  (first (sort-by f coll)))
(defn my-rand-map [length]
  (map #(hash-map :resource %1 :priority %2)
       (take length (repeatedly #(rand-int 200)))
       (take length (repeatedly #(rand-int 10)))))
(defn -main
  [& args]
  (let [rand-map (my-rand-map 100000)]
    (println "min-by-reduce-----------")
    (quick-bench (min-by-reduce :resource rand-map))
    (println "min-by-sort-------------")
    (quick-bench (min-by-sort :resource rand-map))
    (println "min-by-min-key----------")
    (quick-bench (apply min-key :resource rand-map))))
The terminal output is:
min-by-reduce-----------
Evaluation count : 60 in 6 samples of 10 calls.
Execution time mean : 11,366539 ms
Execution time std-deviation : 2,045752 ms
Execution time lower quantile : 9,690590 ms ( 2,5%)
Execution time upper quantile : 14,763746 ms (97,5%)
Overhead used : 3,292762 ns
Found 1 outliers in 6 samples (16,6667 %)
low-severe 1 (16,6667 %)
Variance from outliers : 47,9902 % Variance is moderately inflated by outliers
min-by-sort-------------
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 174,747463 ms
Execution time std-deviation : 18,431608 ms
Execution time lower quantile : 158,138543 ms ( 2,5%)
Execution time upper quantile : 203,420044 ms (97,5%)
Overhead used : 3,292762 ns
Found 1 outliers in 6 samples (16,6667 %)
low-severe 1 (16,6667 %)
Variance from outliers : 30,7324 % Variance is moderately inflated by outliers
min-by-min-key----------
Evaluation count : 36 in 6 samples of 6 calls.
Execution time mean : 17,405529 ms
Execution time std-deviation : 1,661902 ms
Execution time lower quantile : 15,962259 ms ( 2,5%)
Execution time upper quantile : 19,366893 ms (97,5%)
Overhead used : 3,292762 ns
I believe JoC just wanted to illustrate the usage of reduce, nothing more.
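I also read JoC as my first book, but I wish I had saved it until I had gone through some more introductory books first. There are many good ones out there. You can even read (most of) Clojure for the Brave and True online; I also recommend buying the full hard-copy version.
You should also look at the Clojure Cookbook; as before, I recommend buying the full hard-copy version.
First of all, your version returns (f min) rather than min. In theory, finding the minimum is a linear O(n) operation, whereas sorting and taking the first value is quasi-linear O(n log n). With small vectors it can be hard to get accurate timings, and time complexity alone does not guarantee that a quasi-linear operation is always slower than a linear one.
Try a sample size of 1,000,000 or more, and a more complex key function. For example, generate sample strings and order them by length or something similar. That way you get more realistic results.
Tip: for testing purposes you can pass identity instead of eval to "skip" supplying a key function. This probably won't affect the benchmark much; it is just so you know the function exists.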
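To make the return-value point concrete, here is a minimal sketch. The sample data and the name min-key-value are made up for illustration and do not come from the thread; min-key-value mirrors the question's first min-by-sort, and min-by-sort here is the sort-by variant.

```clojure
;; Hypothetical sample data: maps with a :resource key, as in the question.
(def sample [{:resource 7 :priority 1}
             {:resource 3 :priority 9}
             {:resource 5 :priority 4}])

;; Sorting the mapped keys yields the smallest key, i.e. (f min).
(defn min-key-value [f coll]
  (first (sort (map f coll))))

;; Sorting by the key yields the original element that has the smallest key.
(defn min-by-sort [f coll]
  (first (sort-by f coll)))

(min-key-value :resource sample)     ; => 3
(min-by-sort :resource sample)       ; => {:resource 3, :priority 9}
(min-by-sort count ["ccc" "a" "bb"]) ; => "a"
(min-by-sort identity [4 2 9])       ; => 2
```

With identity as the key function the two variants return the same value, which is why the bug does not show up on a plain vector of numbers.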
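As one user pointed out, eval is a big bottleneck, and it skewed the benchmark toward the wrong conclusion. I changed the min-by-sort function, and the whole picture changed.
(defn min-by-sort [f coll]
  (first (sort-by f coll)))
The terminal output of the two functions for 10,000 items is: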
"Elapsed time: 0.863016 msecs"
"Elapsed time: 11.44852 msecs"
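So the question is whether there is a better or more elegant min-by-xxx function that finds the minimum of a collection according to a function and returns the original value.
Finally, a test with maps and a keyword as the key function: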
; find the element with the smallest (f x) using reduce
(defn min-by-reduce [f coll]
  (when (seq coll)
    (reduce (fn [min other]
              (if (> (f min) (f other))
                other
                min))
            coll)))
; find the element with the smallest (f x) using sort-by
(defn min-by-sort [f coll]
  (first (sort-by f coll)))
; a helper function to build a sequence of {:resource x, :priority y} maps
(defn my-rand-map [length]
  (map #(hash-map :resource %1 :priority %2)
       (take length (repeatedly #(rand-int 200)))
       (take length (repeatedly #(rand-int 10)))))
; test with 100 items in the seq
(let [rand-map (my-rand-map 100)]
  (time (min-by-reduce :resource rand-map))
  (time (min-by-sort :resource rand-map)))
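Test with 100 items:
"Elapsed time: 0.245403 msecs"
"Elapsed time: 0.18094 msecs"
Test with 1,000 items:
"Elapsed time: 2.653952 msecs"
"Elapsed time: 3.214373 msecs"
Test with 10,000 items:
"Elapsed time: 14.275679 msecs"
"Elapsed time: 38.064996 msecs"
I think the difference is, of course, that sort-by sorts all the items, whereas reduce just traverses the items and accumulates the current minimum. Is that right?
The standard min-key only needs a slight adjustment: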
(defn min-by [f coll]
  (when (seq coll)
    (apply min-key f coll)))
If you look at its source, it is essentially the same as JoC's min-by-reduce.
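A short usage sketch of this wrapper follows, with made-up sample data. The (seq coll) guard matters because min-key needs at least one element after the key function, so applying it to an empty collection would throw an arity error instead of returning nil.

```clojure
;; The wrapper from the answer: min-key, guarded against empty input.
(defn min-by [f coll]
  (when (seq coll)
    (apply min-key f coll)))

;; Hypothetical inputs, chosen without ties so the result is unambiguous.
(min-by :resource [{:resource 7} {:resource 3} {:resource 5}]) ; => {:resource 3}
(min-by count ["ccc" "a" "bb"])                                ; => "a"
(min-by identity [])                                           ; => nil
```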
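eval is a terrible choice of function to try this with. Your entire run time is dominated by it. The reason for the slowdown is that the reduce version calls eval roughly 2*N times, whereas the sort version calls it only N times.
Thanks! I have changed eval to identity, and the results changed completely. The terminal output for 100,000 items is "Elapsed time: 8.234689 msecs" "Elapsed time: 131.30328 msecs"
Thanks! I agree. I bought Brave Clojure and read it first. For beginners it is a very good introduction to FP and Clojure. JoC is a level or two higher.
Thanks! Great point! I missed this return-value bug. :)
Thanks! (apply min-key :resource rand-map) wins! For 10,000 items it is "Elapsed time: 6.270974 msecs"
Yes, min-key seems very close to this, yet it appears to be about 4-5 times faster than JoC's min-by-reduce. Why?
@tkircsi Naive Clojure timing is notoriously unreliable. Take care to get reliable results. Let me know what you find :).
Thanks! I'll try it and come back with the results. A good learning experience. :)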
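The call-count argument from the comments can be checked with a small experiment. This is only a sketch: counting and min-by-cached are hypothetical helpers that do not appear in the thread, and min-by-cached is one possible way to evaluate the key function once per element, not the approach any library is confirmed to use.

```clojure
;; Hypothetical helper: wraps f so that every call increments a counter atom.
(defn counting [f counter]
  (fn [x]
    (swap! counter inc)
    (f x)))

;; The reduce-based version from the question: two key calls per reduce step.
(defn min-by-reduce [f coll]
  (when (seq coll)
    (reduce (fn [min other]
              (if (> (f min) (f other)) other min))
            coll)))

;; A variant that carries a [key element] pair, so f runs once per element.
(defn min-by-cached [f coll]
  (when (seq coll)
    (let [[x & more] coll]
      (second
        (reduce (fn [[kmin _ :as best] y]
                  (let [k (f y)]
                    (if (> kmin k) [k y] best)))
                [(f x) x]
                more)))))

(let [calls (atom 0)]
  (min-by-reduce (counting identity calls) [4 2 9 1 7])
  @calls) ; => 8, i.e. 2 * (N - 1) for N = 5

(let [calls (atom 0)]
  (min-by-cached (counting identity calls) [4 2 9 1 7])
  @calls) ; => 5, i.e. N
```

With a cheap key such as identity or a keyword the extra calls cost little, but with an expensive key such as eval they can dominate the run time, which is consistent with the timings reported above.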