Clojure lazy seq performance optimization
I'm new to Clojure and I have some code that I'm trying to optimize. I want to compute co-occurrence counts. The main function is compute-space, and the output is a nested map of the type
{"w1" {"w11" 10, "w12" 31, ...}
"w2" {"w21" 14, "w22" 1, ...}
...
}
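This means that "w1" co-occurs with "w11" 10 times, and so on.
It takes a coll of documents (sentences) and a coll of target words, iterates over both, and finally applies the context-fn (such as a sliding window) to extract the context words. More specifically, I'm passing in a closure over sliding-window.
I have tested it with about 50 million words (roughly 3 million sentences) and about 20,000 targets. This version takes more than a day to finish. I also wrote a pmap-based parallel function (pcompute-space) that brings the computation time down to about 10 hours, but I still feel it should be faster. I have no other code to compare against, but my intuition says it should be faster.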
(defn compute-space
  ([docs context-fn targets]
   (let [space (atom {})]
     (doseq [doc docs
             target targets]
       (when-let [contexts (context-fn target doc)]
         (doseq [w contexts]
           (if (get-in @space [target w])
             (swap! space update-in [target w] (partial inc))
             (swap! space assoc-in [target w] 1)))))
     @space)))
(defn sliding-window
  [target s n]
  (loop [todo s seen [] acc []]
    (let [curr (first todo)]
      (cond (= curr target) (recur (rest todo) (cons curr seen) (concat acc (take n seen) (take n (rest todo))))
            (empty? todo) acc
            :else (recur (rest todo) (cons curr seen) acc)))))
(defn pcompute-space
  [docs step context-fn targets]
  (reduce
   #(deep-merge-with + %1 %2)
   (pmap
    (fn [chunk]
      (do (tick))
      (compute-space chunk context-fn targets))
    (partition-all step docs))))
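I profiled the application with jvisualvm and found that clojure.lang.Cons, clojure.lang.ChunkedCons and clojure.lang.ArrayChunk dominate the process (see figure). This is surely related to the double doseq loop I'm using (earlier experiments showed this approach to be faster than using map, reduce and the like, although I was benchmarking the functions with time).
I would greatly appreciate any insight you can offer, as well as suggestions for refactoring the code to make it run faster. I suspect reducers might help here, but I'm not sure how and/or why.
Specs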
MacPro 2010 2,4 GHz Intel Core 2 Duo 4 GB RAM
Clojure 1.6.0
Java 1.7.0_51 Java HotSpot(TM) 64-Bit Server VM
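The test data was:
- A lazy sequence of 42 strings (targets)
- A lazy sequence of 105,040 lazy seqs (documents)
- Each lazy seq in documents is a lazy sequence of strings. The total number of strings contained in documents is 1,146,190.
compute-space took 22 seconds: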
WARNING: JVM argument TieredStopAtLevel=1 is active, and may lead to unexpected results as JIT C2 compiler may not be active. See http://www.slideshare.net/CharlesNutter/javaone-2012-jvm-jit-for-dummies.
Evaluation count : 60 in 60 samples of 1 calls.
Execution time mean : 21.989189 sec
Execution time std-deviation : 471.199127 ms
Execution time lower quantile : 21.540155 sec ( 2.5%)
Execution time upper quantile : 23.226352 sec (97.5%)
Overhead used : 13.353852 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 9.4329 % Variance is slightly inflated by outliers
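First optimization
Updated to use frequencies to go from a vector of words to a map of words and their occurrence counts.
To help me understand the structure of the computation, I wrote a separate function that takes a collection of documents, a context-fn and a single target, and returns the map of context words to counts. This is the inner map for one target of what compute-space returns. It is written with built-in Clojure functions rather than by updating counts.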
(defn compute-context-map-f [documents context-fn target]
  (frequencies (mapcat #(context-fn target %) documents)))
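To make the shape of that inner map concrete, here is a minimal sketch on toy data. The context function toy-context-fn and the data toy-docs are hypothetical, invented purely for illustration; the toy context function simply returns every other word of a document that contains the target, which is not what sliding-window does.

```clojure
;; Hypothetical context-fn for illustration only: if the document contains
;; the target, return all the other words; otherwise return nil.
(defn toy-context-fn [target doc]
  (when (some #{target} doc)
    (remove #{target} doc)))

;; Same definition as above: concatenate the contexts found in every
;; document, then count how often each context word occurs.
(defn compute-context-map-f [documents context-fn target]
  (frequencies (mapcat #(context-fn target %) documents)))

;; Hypothetical toy documents.
(def toy-docs [["a" "b" "c"] ["b" "c"] ["a" "d"]])

(compute-context-map-f toy-docs toy-context-fn "b")
;; => {"a" 1, "c" 2}
```

The third document contributes nothing because it does not contain the target, so mapcat just skips its nil result.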
compute-space written in terms of compute-context-map-f, here named compute-space-f, is fairly short:
(defn compute-space-f [docs context-fn targets]
  (into {} (map #(vector % (compute-context-map-f docs context-fn %)) targets)))
Timing with the same data as above is 65% of the original version:
WARNING: JVM argument TieredStopAtLevel=1 is active, and may lead to unexpected results as JIT C2 compiler may not be active. See http://www.slideshare.net/CharlesNutter/javaone-2012-jvm-jit-for-dummies.
Evaluation count : 60 in 60 samples of 1 calls.
Execution time mean : 14.274344 sec
Execution time std-deviation : 345.240183 ms
Execution time lower quantile : 13.981537 sec ( 2.5%)
Execution time upper quantile : 15.088521 sec (97.5%)
Overhead used : 13.353852 ns
Found 3 outliers in 60 samples (5.0000 %)
low-severe 1 (1.6667 %)
low-mild 2 (3.3333 %)
Variance from outliers : 12.5419 % Variance is moderately inflated by outliers
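Parallelizing the first optimization
I chose to chunk by target rather than by document, so that merging the maps together does not require modifying the {context-word count, ...} maps for a target.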
(defn pcompute-space-f [docs step context-fn targets]
  (into {} (pmap #(compute-space-f docs context-fn %) (partition-all step targets))))
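As a rough illustration of why chunking by target keeps the merge trivial, the sketch below uses hypothetical partial results (by-target-chunks, by-doc-chunk-1, by-doc-chunk-2) rather than real output. It assumes the targets are distinct, so each chunk's map has its own top-level keys.

```clojure
;; Targets are split into chunks of `step` (here 2).
(def target-chunks (partition-all 2 ["w1" "w2" "w3"]))
;; => (("w1" "w2") ("w3"))

;; Hypothetical per-chunk results when chunking by target: the top-level
;; keys are disjoint, so pouring the maps into {} is enough.
(def by-target-chunks [{"w1" {"a" 1}, "w2" {"b" 2}}
                       {"w3" {"a" 3}}])

(def merged-by-target (into {} by-target-chunks))
;; => {"w1" {"a" 1}, "w2" {"b" 2}, "w3" {"a" 3}}

;; Hypothetical per-chunk results when chunking by document: the same target
;; can appear in several chunks, so the inner maps have to be summed too.
(def by-doc-chunk-1 {"w1" {"a" 1}})
(def by-doc-chunk-2 {"w1" {"a" 2, "b" 1}})

(def merged-by-doc
  (merge-with (partial merge-with +) by-doc-chunk-1 by-doc-chunk-2))
;; => {"w1" {"a" 3, "b" 1}}
```

The second merge is the extra work that chunking by target avoids.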
Timing with the same data as above is 16% of the original version:
user> (criterium.core/bench (pcompute-space-f documents 4 #(sliding-window %1 %2 5) keywords))
WARNING: JVM argument TieredStopAtLevel=1 is active, and may lead to unexpected results as JIT C2 compiler may not be active. See http://www.slideshare.net/CharlesNutter/javaone-2012-jvm-jit-for-dummies.
Evaluation count : 60 in 60 samples of 1 calls.
Execution time mean : 3.623018 sec
Execution time std-deviation : 83.780996 ms
Execution time lower quantile : 3.486419 sec ( 2.5%)
Execution time upper quantile : 3.788714 sec (97.5%)
Overhead used : 13.353852 ns
Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
Variance from outliers : 11.0038 % Variance is moderately inflated by outliers
Specs
- Mac Pro 2009 2.66 GHz Quad-Core Intel Xeon, 48 GB RAM
- Clojure 1.6.0
- Java 1.8.0_40 Java HotSpot(TM) 64-Bit Server VM
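Describe the test data.
Analysing the compute-space algorithm in the question
The cost of scanning the sentences, looking for targets, is
- proportional to the total number of words,
- proportional to the number of targets, but
- independent of the number of sentences the words are divided into,
- proportional to the target hit rate,
- proportional to the number of distinct words in the contexts.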
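The context-fn scans the sentence looking for the target. If there are ten thousand targets, it scans the sentence ten thousand times.
It is better to scan the sentence once, looking for all the targets. If we keep the targets as a (hash) set, we can test whether a word is a target in more or less constant time, regardless of how many targets there are.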
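A minimal sketch of that single-pass idea, assuming a small set of targets and a toy sentence (the names target?, sentence and target-hits are hypothetical, chosen only for this illustration). A Clojure set can be called as a function, so it doubles as the membership test.

```clojure
;; Hypothetical targets, kept as a hash set. Calling the set on a word
;; returns the word if it is a member and nil otherwise.
(def target? #{"cat" "dog"})

;; Hypothetical sentence.
(def sentence ["the" "cat" "saw" "the" "dog"])

;; One pass over the sentence: keep [index word] for every word that is a
;; target, however many targets the set holds.
(def target-hits
  (keep-indexed (fn [i w] (when (target? w) [i w])) sentence))
;; => ([1 "cat"] [4 "dog"])
```

Each word costs one set lookup, so the scan no longer multiplies by the number of targets.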
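Possible improvements
The sliding-window function generates contexts by passing each word from todo to seen. Pouring the words into a vector and then returning the contexts as subvecs would probably be faster.
In any case, a simple way to organise the generation of contexts is to have the context-fn return the sequence of contexts corresponding to the sequence of words. A function that does this for sliding windows is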
(defn sliding-windows [w s]
  (let [v (vec s), n (count v)
        window (fn [i] (lazy-cat (subvec v (max (- i w) 0) i)
                                 (subvec v (inc i) (min (inc (+ i w)) n))))]
    (map window (range n))))
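To see what this returns, here is a small usage sketch on toy input (the input words are made up for illustration). The function is repeated so the snippet runs on its own.

```clojure
;; Same definition as above, repeated so this snippet is self-contained.
(defn sliding-windows [w s]
  (let [v (vec s), n (count v)
        window (fn [i] (lazy-cat (subvec v (max (- i w) 0) i)
                                 (subvec v (inc i) (min (inc (+ i w)) n))))]
    (map window (range n))))

;; One context per word, in word order: up to w neighbours on each side,
;; excluding the word itself.
(sliding-windows 1 ["a" "b" "c" "d"])
;; => (("b") ("a" "c") ("b" "d") ("c"))

(nth (sliding-windows 2 ["a" "b" "c" "d" "e"]) 2)
;; => ("a" "b" "d" "e")
```

Note that the left-hand neighbours come out in sentence order here, whereas the sliding-window in the question conses them onto seen and so yields them nearest-first. That makes no difference to the counts.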
Now we can define the compute-space function in terms of the new kind of contexts-fn, as follows:
(defn compute-space [docs contexts-fn target?]
  (letfn [(stuff [s] (->> (map vector s (contexts-fn s))
                          (filter (comp target? first))))]
    (reduce
     (fn [a [k m]] (assoc a k (merge-with + (a k) (frequencies m))))
     {}
     (mapcat stuff docs))))
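The code pivots on stuff:
- stuff turns each document into a sequence of [target context-sequence] pairs, keeping only the words that are targets.
- Then we merge each pair into the aggregate, adding the corresponding neighbour counts for each occurrence of a target.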
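The two steps can be traced on a toy input. This is only a sketch: the documents and the target set are hypothetical, and both functions are repeated from above so the snippet is self-contained.

```clojure
;; Both definitions repeated from above so this snippet runs on its own.
(defn sliding-windows [w s]
  (let [v (vec s), n (count v)
        window (fn [i] (lazy-cat (subvec v (max (- i w) 0) i)
                                 (subvec v (inc i) (min (inc (+ i w)) n))))]
    (map window (range n))))

(defn compute-space [docs contexts-fn target?]
  (letfn [(stuff [s] (->> (map vector s (contexts-fn s))
                          (filter (comp target? first))))]
    (reduce
     (fn [a [k m]] (assoc a k (merge-with + (a k) (frequencies m))))
     {}
     (mapcat stuff docs))))

;; Hypothetical toy documents and targets. The targets are a set, which
;; serves directly as the target? predicate.
(def toy-docs [["a" "b" "a"] ["b" "c"]])
(def toy-targets #{"a" "b"})

;; With a window of 1, the surviving pairs are
;;   ["a" ("b")] ["b" ("a" "a")] ["a" ("b")] ["b" ("c")]
;; and the reduce sums their frequencies per target.
(compute-space toy-docs (partial sliding-windows 1) toy-targets)
;; => {"a" {"b" 2}, "b" {"a" 2, "c" 1}}
```

Each document is scanned once, whatever the size of the target set.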
- a vocabulary of 100,000 words,
- a single sentence of 100,000 words, and
- 10,000 targets