Clojure lazy seq performance optimization

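I'm new to Clojure, and I have some code I'm trying to optimize. I want to compute co-occurrence counts. The main function is compute-space, and its output is a nested map of this type: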

{"w1" {"w11" 10, "w12" 31, ...}
 "w2" {"w21" 14, "w22" 1,  ...}
 ... 
 }

This means that "w1" co-occurs with "w11" 10 times, and so on.

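It takes a collection of documents (sentences) and a collection of target words, iterates over both, and finally applies a context-fn (such as a sliding window) to extract the context words. More specifically, I'm passing a closure over sliding-window.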

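I have tested it with about 50 million words (about 3 million sentences) and about 20,000 targets. This version takes more than a day to complete. I also wrote a pmap-based parallel function (pcompute-space) that cuts the computation time to about 10 hours, but I still feel it should be faster. I have no other code to compare against, but my intuition tells me it should be faster.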

(defn compute-space
  ([docs context-fn targets]
    (let [space (atom {})]                 ; {target {context-word count}}
      (doseq [doc docs
              target targets]              ; every (document, target) pair
        (when-let [contexts (context-fn target doc)]
          (doseq [w contexts]
            ;; increment an existing count, otherwise start it at 1
            (if (get-in @space [target w])
              (swap! space update-in [target w] inc)
              (swap! space assoc-in  [target w] 1)))))
     @space)))
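The get-in / assoc-in branch above can be collapsed into a single update-in call by wrapping inc with fnil, which substitutes 0 when the count is missing. A minimal sketch on a plain map (the atom is left out; inside compute-space the same idea would read (swap! space update-in [target w] (fnil inc 0))). The names inc-count, add-co-occurrence and example-space are hypothetical:

```clojure
;; fnil returns a function that replaces a nil first argument with 0,
;; so a missing [target w] entry starts at 0 and is then incremented.
(def inc-count (fnil inc 0))

(defn add-co-occurrence
  "Increment the co-occurrence count of context word w for target."
  [space target w]
  (update-in space [target w] inc-count))

(def example-space
  (-> {}
      (add-co-occurrence "w1" "w11")
      (add-co-occurrence "w1" "w11")
      (add-co-occurrence "w1" "w12")))
```

update-in creates the inner map on demand, so no separate "first occurrence" branch is needed.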

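(defn sliding-window
  [target s n]
  ;; seen holds the words already passed, most recent first;
  ;; acc collects the context words found so far
  (loop [todo s seen [] acc []]
    (let [curr (first todo)]
      ;; on a hit, append up to n words before and n words after the target
      (cond (= curr target) (recur (rest todo) (cons curr seen) (concat acc (take n seen) (take n (rest todo))))
            (empty? todo) acc
            :else (recur (rest todo) (cons curr seen) acc)))))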
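sliding-window grows acc with concat inside a loop, which stacks one lazy concat on top of another for every hit; this is a well-known pitfall that allocates many seq objects and, with enough hits, can overflow the stack when the result is finally realized. A hedged variant, named sliding-window-v here purely for illustration, accumulates eagerly into a vector with into instead. It returns a vector rather than a lazy seq and, unlike the original, checks for the end of the input before comparing, so a nil target cannot loop forever:

```clojure
(defn sliding-window-v
  "Hypothetical variant of sliding-window: returns a vector of up to n words
  before and n words after every occurrence of target in s. The words before
  a hit come nearest first, as in the original."
  [target s n]
  (loop [todo s, seen (), acc []]
    (if (empty? todo)
      acc
      (let [curr (first todo)]
        (recur (rest todo)
               (cons curr seen)        ; earlier words, nearest first
               (if (= curr target)
                 ;; into copies the words into acc right away, so no
                 ;; chain of lazy concats builds up across hits
                 (-> acc
                     (into (take n seen))
                     (into (take n (rest todo))))
                 acc))))))
```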


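(defn pcompute-space
  [docs step context-fn targets]
  ;; deep-merge-with and tick are helpers defined elsewhere (not shown)
  (reduce
     #(deep-merge-with + %1 %2)
      (pmap
        (fn [chunk]
          (tick)                  ; presumably a progress indicator
          (compute-space chunk context-fn targets))
        (partition-all step docs))))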
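deep-merge-with is not part of clojure.core (a function of that name was in the old clojure.contrib.map-utils library). Because the space is exactly two levels deep, a nested merge-with is enough to combine the per-chunk results. A minimal sketch, with merge-spaces as a hypothetical name:

```clojure
(defn merge-spaces
  "Combine two {target {context-word count}} maps, summing the counts of
  context words that appear under the same target in both."
  [a b]
  ;; outer merge-with: when both maps share a target, merge its inner maps with +
  (merge-with (fn [inner-a inner-b] (merge-with + inner-a inner-b)) a b))

(def combined
  (reduce merge-spaces
          [{"w1" {"w11" 2 "w12" 1}}
           {"w1" {"w11" 3} "w2" {"w21" 4}}]))
```

In pcompute-space, (reduce merge-spaces ...) could then stand in for the deep-merge-with call.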

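I profiled the application with jvisualvm and found that clojure.lang.Cons, clojure.lang.ChunkedCons and clojure.lang.ArrayChunk account for a very large share of the process (see figure). This surely has to do with my use of this double doseq loop (earlier experiments showed this approach to be faster than using map, reduce, etc., although I was benchmarking the functions with time). I'd greatly appreciate any insight you can offer, and suggestions for refactoring the code to make it run faster. I suspect reducers could help here, but I'm not sure how and/or why.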
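On the reducers question: clojure.core.reducers/fold splits a vector into chunks (512 elements by default), reduces each chunk on a fork/join pool, and combines the partial results. A minimal sketch for counting words, not a drop-in replacement for compute-space; the name fold-frequencies is hypothetical. fold only runs in parallel on foldable collections such as vectors and maps; on a lazy seq it falls back to an ordinary sequential reduce, so lazy documents would need to be poured into a vector first:

```clojure
(require '[clojure.core.reducers :as r])

(defn fold-frequencies
  "Count the occurrences of each element of words (a vector) with r/fold."
  [words]
  (r/fold
   ;; combinef: with no args it supplies each chunk's starting map,
   ;; with two args it merges the counts of two chunks
   (fn ([] {})
       ([a b] (merge-with + a b)))
   ;; reducef: add one word to a chunk's running counts
   (fn [counts w] (assoc counts w (inc (get counts w 0))))
   words))
```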

Specs

MacPro 2010 2.4 GHz Intel Core 2 Duo 4 GB RAM

Clojure 1.6.0

Java 1.7.0_51 Java HotSpot(TM) 64-Bit Server VM

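The test data is:

a lazy sequence of 42 strings (targets)
a lazy sequence of 105,040 lazy sequences (documents)
each lazy seq in documents is a lazy sequence of strings; the total number of strings contained in documents is 1,146,190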

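This is much smaller than your workload. Criterium was used to collect the timings. Criterium evaluates an expression many times, first warming up the JIT and then collecting average data.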

Using my test data and your code, compute-space took 22 seconds:

WARNING: JVM argument TieredStopAtLevel=1 is active, and may lead to unexpected results as JIT C2 compiler may not be active. See http://www.slideshare.net/CharlesNutter/javaone-2012-jvm-jit-for-dummies.
Evaluation count : 60 in 60 samples of 1 calls.
             Execution time mean : 21.989189 sec
    Execution time std-deviation : 471.199127 ms
   Execution time lower quantile : 21.540155 sec ( 2.5%)
   Execution time upper quantile : 23.226352 sec (97.5%)
                   Overhead used : 13.353852 ns

Found 2 outliers in 60 samples (3.3333 %)
    low-severe   2 (3.3333 %)
 Variance from outliers : 9.4329 % Variance is slightly inflated by outliers

First optimization

Updated to use frequencies to go from a vector of words to a map of words to their occurrence counts.

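To help me understand the structure of the computation, I wrote a separate function that takes a collection of documents, a context-fn and a single target, and returns the map of context words to counts. This is the inner map for one target in the map returned by compute-space. It is written with built-in Clojure functions instead of updating counts: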

(defn compute-context-map-f [documents context-fn target]
  (frequencies (mapcat #(context-fn target %) documents)))
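To see what the frequencies-over-mapcat combination produces, here is a toy run with a hypothetical context function, neighbours, that returns the words immediately before and after each occurrence of the target (a simplified stand-in for sliding-window with n fixed at 1). The documents are made up:

```clojure
(defn neighbours
  "Hypothetical toy context-fn: the words directly before and after each
  occurrence of target in doc."
  [target doc]
  (let [v (vec doc)]
    (mapcat (fn [i]
              (when (= (v i) target)
                ;; get returns nil for out-of-range indices; keep drops them
                (keep #(get v %) [(dec i) (inc i)])))
            (range (count v)))))

(def docs [["a" "b" "c"] ["c" "b" "d"]])

;; same shape as compute-context-map-f: concatenate every document's
;; context words for one target, then count them
(def b-context (frequencies (mapcat #(neighbours "b" %) docs)))
```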

compute-space written with compute-context-map-f, named compute-space-f here, is quite short:

(defn compute-space-f [docs context-fn targets]
  (into {} (map #(vector % (compute-context-map-f docs context-fn %)) targets)))
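The (into {} (map #(vector % ...) targets)) form pairs each target with its context map; zipmap expresses the same pairing directly. This is only a readability alternative, shown with a stand-in function, count-letters, in place of compute-context-map-f (count-letters and targets are hypothetical):

```clojure
(defn count-letters
  "Stand-in for compute-context-map-f: counts the characters of s."
  [s]
  (frequencies s))

(def targets ["ab" "aab"])

;; the form used in compute-space-f
(def via-into (into {} (map #(vector % (count-letters %)) targets)))

;; the same pairing with zipmap
(def via-zipmap (zipmap targets (map count-letters targets)))
```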

With the same data as above, the timing is 65% of the original version:

WARNING: JVM argument TieredStopAtLevel=1 is active, and may lead to unexpected results as JIT C2 compiler may not be active. See http://www.slideshare.net/CharlesNutter/javaone-2012-jvm-jit-for-dummies.
Evaluation count : 60 in 60 samples of 1 calls.
             Execution time mean : 14.274344 sec
    Execution time std-deviation : 345.240183 ms
   Execution time lower quantile : 13.981537 sec ( 2.5%)
   Execution time upper quantile : 15.088521 sec (97.5%)
                   Overhead used : 13.353852 ns

Found 3 outliers in 60 samples (5.0000 %)
    low-severe   1 (1.6667 %)
    low-mild     2 (3.3333 %)
 Variance from outliers : 12.5419 % Variance is moderately inflated by outliers

Parallelizing the first optimization

I chose to chunk by target rather than by document, so that merging the maps together doesn't require modifying any target's {context-word count, ...} map:
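Chunking by target lets a plain (into {} ...) merge the results because each chunk produces a map with its own, disjoint set of target keys. If the same target appeared in two chunk results, as happens when chunking by document, into would keep only the last inner map and silently drop the other counts. A small demonstration with made-up chunk results:

```clojure
;; chunked by target: the keys are disjoint, so into simply unions them
(def by-target [{"w1" {"x" 1}} {"w2" {"y" 2}}])
(def merged-by-target (into {} by-target))

;; chunked by document: "w1" appears in both results
(def by-doc [{"w1" {"x" 1}} {"w1" {"x" 2 "z" 1}}])
(def lossy (into {} by-doc))                                  ; last map wins
(def summed (apply merge-with (partial merge-with +) by-doc)) ; counts summed
```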

(defn pcompute-space-f [docs step context-fn targets]
  (into {} (pmap #(compute-space-f docs context-fn %) (partition-all step targets))))

With the same data as above, the timing is 16% of the original version:

user> (criterium.core/bench (pcompute-space-f documents 4 #(sliding-window %1 %2 5) keywords))
WARNING: JVM argument TieredStopAtLevel=1 is active, and may lead to unexpected results as JIT C2 compiler may not be active. See http://www.slideshare.net/CharlesNutter/javaone-2012-jvm-jit-for-dummies.
Evaluation count : 60 in 60 samples of 1 calls.
             Execution time mean : 3.623018 sec
    Execution time std-deviation : 83.780996 ms
   Execution time lower quantile : 3.486419 sec ( 2.5%)
   Execution time upper quantile : 3.788714 sec (97.5%)
                   Overhead used : 13.353852 ns

Found 1 outliers in 60 samples (1.6667 %)
    low-severe   1 (1.6667 %)
 Variance from outliers : 11.0038 % Variance is moderately inflated by outliers

Specs

Mac Pro 2009 2.66 GHz Quad-Core Intel Xeon, 48 GB RAM
Clojure 1.6.0
Java 1.8.0_40 Java HotSpot(TM) 64-Bit Server VM

TBD

Further optimization

Describe the test data.

Analysis of the compute-space algorithm in the question

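The cost of scanning the sentences, looking for targets, is

proportional to the total number of words,
proportional to the number of targets, but
independent of the number of sentences the words are divided into.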

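The cost of dealing with the targets is

proportional to the target hit rate, and
proportional to the number of distinct words in the contexts.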
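As a rough illustration of the first cost, plug in the figures from the question (about 50 million words and about 20,000 targets), assuming each word is compared against each target once; a single pass that tests each word against a hash set of targets needs only on the order of one lookup per word:

```latex
\underbrace{5 \times 10^{7}}_{\text{words}} \times \underbrace{2 \times 10^{4}}_{\text{targets}} = 10^{12}\ \text{comparisons}
\qquad \text{versus} \qquad
5 \times 10^{7}\ \text{set lookups}
```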

Major improvement

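The context-fn scans the sentences looking for a target. If there are ten thousand targets, it scans the sentences ten thousand times.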

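It is better to scan the sentences once, looking for all the targets. If we keep the targets as a (hash) set, we can test whether a word is a target in more or less constant time, however many targets there are.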
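A minimal sketch of the one-pass idea, assuming the targets are held in a Clojure hash set, which can itself be called as a predicate; target-hits and the sample sentence are hypothetical:

```clojure
;; A set used as a function returns the element if present, else nil,
;; so it works directly as a "is this a target?" predicate.
(def target? #{"cat" "dog"})

(defn target-hits
  "One pass over sentence, returning [index word] for every word that is a
  target. Each check is a single hash-set lookup, whatever the number of targets."
  [target? sentence]
  (keep-indexed (fn [i w] (when (target? w) [i w])) sentence))

(def hits (target-hits target? ["the" "cat" "saw" "a" "dog"]))
```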

Possible improvement

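The sliding-window function generates the contexts by passing each word from todo to seen. It might be faster to pour the words into a vector, then return the contexts as subvecs.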

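In any case, a simple way to organize the generation of contexts is to have the context-fn return a sequence of contexts corresponding to the sequence of words. A function that does this for sliding windows is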

(defn sliding-windows [w s]
  (let [v (vec s), n (count v)
        window (fn [i] (lazy-cat (subvec v (max (- i w) 0) i)
                                 (subvec v (inc i) (min (inc (+ i w)) n))))]
    (map window (range n))))
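A quick check of what sliding-windows returns: one context per word, in order, each made of the words before it followed by the words after it. Because subvec shares structure with the source vector, each slice is a constant-time view rather than a copy. The sample sentence is made up:

```clojure
;; Copy of sliding-windows from above, so this snippet runs on its own.
(defn sliding-windows [w s]
  (let [v (vec s), n (count v)
        window (fn [i] (lazy-cat (subvec v (max (- i w) 0) i)
                                 (subvec v (inc i) (min (inc (+ i w)) n))))]
    (map window (range n))))

;; With w = 1, each word gets at most one neighbour on each side.
(def windows (sliding-windows 1 ["a" "b" "c" "d"]))
;; => (("b") ("a" "c") ("b" "d") ("c"))
```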

We can now define the compute-space function in terms of the new contexts-fn as follows:

(defn compute-space [docs contexts-fn target?]
  (letfn [(stuff [s] (->> (map vector s (contexts-fn s))
                          (filter (comp target? first))))]
    (reduce
     (fn [a [k m]] (assoc a k (merge-with + (a k) (frequencies m))))
     {}
     (mapcat stuff docs))))

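The code centers on stuff:

we develop stuff as a sequence of [target context-sequence] pairs, then
we merge each pair into the aggregate, adding the corresponding neighbor counts for each target occurrence.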
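Putting the two pieces together on a made-up corpus: the target set doubles as the target? predicate, and the contexts function is sliding-windows with its window width fixed by partial. Both definitions are repeated so the snippet is self-contained:

```clojure
(defn sliding-windows [w s]
  (let [v (vec s), n (count v)
        window (fn [i] (lazy-cat (subvec v (max (- i w) 0) i)
                                 (subvec v (inc i) (min (inc (+ i w)) n))))]
    (map window (range n))))

(defn compute-space [docs contexts-fn target?]
  (letfn [(stuff [s] (->> (map vector s (contexts-fn s))   ; [word context] pairs
                          (filter (comp target? first))))] ; keep target words only
    (reduce
     (fn [a [k m]] (assoc a k (merge-with + (a k) (frequencies m))))
     {}
     (mapcat stuff docs))))

;; Two sentences, with "a" as the only target.
(def docs [["a" "b" "a" "c"] ["c" "a"]])

;; (partial sliding-windows 1) is the contexts-fn: one word on either side.
(def space (compute-space docs (partial sliding-windows 1) #{"a"}))
;; => {"a" {"b" 2, "c" 2}}
```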

Result

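This algorithm is about 500 times faster than the one in the question: what the code in the question does in a day and a half, this should do in about a minute.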

Given

a vocabulary of 100,000 words,
a sentence of 100,000 words, and
10,000 targets,

this code constructs the context map in under 100 ms.

For a sentence a tenth as long - 10