Haskell space leak in hash table insertion


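I have been writing a histogram, and I have had a lot of help with it. I am using a hash table to store the keys and their frequency values, because the distribution of the keys is unknown; they may not be sorted or contiguous.

The problem with my code is that it spends too much time in GC, so it looks like a space leak: 60.3% of the time is spent in GC, which leaves my productivity at only 39.7%.

What is going wrong? I have tried to be strict inside the histogram function, and I have also inlined it (GC time went from 69.1% to 59.4%).

Note that I have simplified this code by not updating the frequencies in the hash table.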

{-# LANGUAGE BangPatterns #-}
import qualified Data.HashTable.IO as H
import qualified Data.Vector as V

type HashTable k v = H.BasicHashTable k v

n :: Int 
n = 5000000

kv :: V.Vector (Int,Int)
kv = V.zip k v 
 where
    k = V.generate n (\i -> i `mod` 10)
    v = V.generate n (\i -> 1)

histogram :: V.Vector (Int,Int) -> Int -> IO (H.CuckooHashTable Int Int)
histogram vec !n = do
    ht <- H.newSized n 
    go ht (n-1)
        where
            go ht = go'
                where 
                    go' (-1) = return ht
                    go' !i = do
                        let (k,v) = vec V.! i
                        H.insert ht k v
                        go' (i-1)
{-# INLINE histogram #-}

main :: IO ()
main = do
    ht <- histogram kv n
    putStrLn "done"

Diagnostics:

jap@devbox:~/dev$ ./histogram +RTS -sstderr
done
     863,187,472 bytes allocated in the heap
     708,960,048 bytes copied during GC
     410,476,592 bytes maximum residency (5 sample(s))
       4,791,736 bytes maximum slop
             613 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1284 colls,     0 par    0.46s    0.46s     0.0004s    0.0322s
  Gen  1         5 colls,     0 par    0.36s    0.36s     0.0730s    0.2053s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.51s  (  0.50s elapsed)
  GC      time    0.82s  (  0.82s elapsed)
  EXIT    time    0.03s  (  0.04s elapsed)
  Total   time    1.36s  (  1.36s elapsed)

  %GC     time      60.3%  (60.4% elapsed)

  Alloc rate    1,708,131,822 bytes per MUT second

  Productivity  39.7% of total user, 39.7% of total elapsed

For comparison, here is what I get when running your code as posted:

     863,187,472 bytes allocated in the heap
     708,960,048 bytes copied during GC
     410,476,592 bytes maximum residency (5 sample(s))
       4,791,736 bytes maximum slop
             613 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1284 colls,     0 par    1.01s    1.01s     0.0008s    0.0766s
  Gen  1         5 colls,     0 par    0.81s    0.81s     0.1626s    0.4783s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    1.04s  (  1.04s elapsed)
  GC      time    1.82s  (  1.82s elapsed)
  EXIT    time    0.04s  (  0.04s elapsed)
  Total   time    2.91s  (  2.91s elapsed)

  %GC     time      62.6%  (62.6% elapsed)

  Alloc rate    827,493,210 bytes per MUT second

  Productivity  37.4% of total user, 37.4% of total elapsed
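
Given that your vector elements are just (Int,Int) tuples, there is no reason not to use Data.Vector.Unboxed instead of plain Data.Vector. This already leads to a significant improvement: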

     743,148,592 bytes allocated in the heap
          38,440 bytes copied during GC
     231,096,768 bytes maximum residency (4 sample(s))
       4,759,104 bytes maximum slop
             226 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       977 colls,     0 par    0.23s    0.23s     0.0002s    0.0479s
  Gen  1         4 colls,     0 par    0.22s    0.22s     0.0543s    0.1080s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    1.04s  (  1.04s elapsed)
  GC      time    0.45s  (  0.45s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    1.49s  (  1.49s elapsed)

  %GC     time      30.2%  (30.2% elapsed)

  Alloc rate    715,050,070 bytes per MUT second

  Productivity  69.8% of total user, 69.9% of total elapsed
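
Next, instead of hand-rolling the recursion over the vector, we can use the optimised functions that the vector library provides for this purpose. Code: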

import qualified Data.HashTable.IO as H
import qualified Data.Vector.Unboxed as V

n :: Int
n = 5000000

kv :: V.Vector (Int,Int)
kv = V.zip k v
 where
    k = V.generate n (\i -> i `mod` 10)
    v = V.generate n (\i -> 1)

histogram :: V.Vector (Int,Int) -> Int -> IO (H.CuckooHashTable Int Int)
histogram vec n = do
    ht <- H.newSized n
    V.mapM_ (\(k, v) -> H.insert ht k v) vec
    return ht
{-# INLINE histogram #-}

main :: IO ()
main = do
    ht <- histogram kv n
    putStrLn "done"

     583,151,048 bytes allocated in the heap
          35,632 bytes copied during GC
     151,096,672 bytes maximum residency (3 sample(s))
       3,003,040 bytes maximum slop
             148 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       826 colls,     0 par    0.20s    0.20s     0.0002s    0.0423s
  Gen  1         3 colls,     0 par    0.12s    0.12s     0.0411s    0.1222s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.92s  (  0.92s elapsed)
  GC      time    0.32s  (  0.33s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    1.25s  (  1.25s elapsed)

  %GC     time      25.9%  (26.0% elapsed)

  Alloc rate    631,677,209 bytes per MUT second

  Productivity  74.1% of total user, 74.0% of total elapsed

That saves 81 MB, not bad. Can we do better?

A heap profile (which should be the first thing you reach for when memory consumption is a problem; debugging without one is shooting in the dark) will reveal that, even with the original code, peak memory consumption happens very early on. Strictly speaking we do not have a leak; we simply spend a lot of memory right from the beginning. Now, note that the hash table is created with ht <- H.newSized n.

A quick look at the source shows that H.insert sometimes grows the hash table by calling newSizedReal, which creates two new MutableArrays and presumably leaves the old MutableArrays to be garbage collected. Heap profiling shows that newSizedReal does indeed allocate a lot of memory. I don't know what the solution is, but I suspect it is to not use a hash table.

My code was adapted from the fromList implementation in hashtables, and when I run that code it runs very fast, spending only 1.3% of its time in GC.

How do you use the hash table after creating it? Perhaps there is a better data structure here.

@jap Your fromList example gave me an idea. Instead of building the vector with generate, I tried kv = V.fromList (zip (take n $ cycle [1..10]) (repeat 1)). That reduced GC time to 30% on my machine. So I guess the problem lies with the vector, not the hash table.

I tried using an array to build the histogram: histogram = accumArray (+) 0 (1,10) (zip (take n $ cycle [1..10]) (take n $ repeat 1)). It is faster and spends less time in GC than the hash table. It requires knowing the range of the data in advance, but that should be a simple computation.

Wow, I'm speechless! My histogram creation code has gone from 1360 ms to 125 ms, so it is more than 10x faster! Plus the code has been reduced to a one-liner at the same time. Thank you very much for taking the time to investigate; I learned a lot too. Haskell keeps impressing me.

If you can use fromList to create the HashMap, it should be faster.

Thanks tibbe, I will try it tomorrow. Why should it be faster? I'm curious, because I thought lists were not contiguous.

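
The accumArray suggestion from the comments can be sketched as follows. This is a minimal illustration, not the commenter's exact program: it assumes the keys are known to lie in a fixed range (1 to 10, as in the comment), the names histogramArray, keyRange and sampleData are made up for the example, and the use of an unboxed UArray for the counts is my own choice rather than something stated in the thread.

```haskell
import Data.Array.Unboxed (UArray, accumArray, elems, (!))

-- Hypothetical helper: sum the weights of all pairs that share a key.
-- Assumption: every key lies inside keyRange (inclusive bounds).
histogramArray :: (Int, Int) -> [(Int, Int)] -> UArray Int Int
histogramArray keyRange = accumArray (+) 0 keyRange

-- Same data shape as in the comment: keys cycle through 1..10,
-- and every key carries a weight of 1.
sampleData :: Int -> [(Int, Int)]
sampleData n = zip (take n (cycle [1 .. 10])) (repeat 1)
```

For instance, histogramArray (1, 10) (sampleData 1000) yields an array in which each of the ten slots holds 100, and individual counts can be read with (!).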
     432,059,960 bytes allocated in the heap
          50,200 bytes copied during GC
          44,416 bytes maximum residency (2 sample(s))
          25,216 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       825 colls,     0 par    0.01s    0.01s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0002s    0.0003s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.90s  (  0.90s elapsed)
  GC      time    0.01s  (  0.01s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.91s  (  0.90s elapsed)

  %GC     time       0.6%  (0.6% elapsed)

  Alloc rate    481,061,802 bytes per MUT second

  Productivity  99.4% of total user, 99.4% of total elapsed
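
Since the heap profile discussion singles out the H.newSized n call, one experiment worth trying is to size the table by the expected number of distinct keys rather than by the number of elements: in the test data every key is i mod 10, so there are only 10 distinct keys among 5,000,000 elements. The sketch below is an illustration under that assumption, not code from the thread. histogramSized and its expectedKeys parameter are hypothetical names, and, unlike the simplified code in the question, it updates the stored frequency with a lookup followed by an insert.

```haskell
import qualified Data.HashTable.IO as H
import qualified Data.Vector.Unboxed as V

-- Hypothetical variant of histogram: the size hint is the expected number of
-- distinct keys (an assumption supplied by the caller), not the vector length.
histogramSized :: Int -> V.Vector (Int, Int) -> IO (H.CuckooHashTable Int Int)
histogramSized expectedKeys vec = do
    ht <- H.newSized expectedKeys
    V.forM_ vec $ \(k, v) -> do
        old <- H.lookup ht k
        -- Add v to the existing count, or start from v for a new key.
        -- ($!) forces the sum so that no chain of thunks is stored.
        H.insert ht k $! maybe v (+ v) old
    return ht
```

Called as histogramSized 10 kv, this starts with a small table and relies on the table growing if the guess turns out to be too low. Whether it helps for a given workload is something to confirm with +RTS -s or a heap profile.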

import qualified Data.HashMap.Strict as M
import qualified Data.Vector.Unboxed as V

n :: Int
n = 5000000

kv :: V.Vector (Int,Int)
kv = V.zip k v
 where
    k = V.generate n (\i -> i `mod` 10)
    v = V.generate n (\i -> 1)

histogram :: V.Vector (Int,Int) -> M.HashMap Int Int
histogram vec =
    V.foldl' (\ht (k, v) -> M.insert k v ht) M.empty vec

main :: IO ()
main = do
    print $ M.size $ histogram kv
    putStrLn "done"

          55,760 bytes allocated in the heap
           3,512 bytes copied during GC
          44,416 bytes maximum residency (1 sample(s))
          17,024 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.00s    0.00s     0.0002s    0.0002s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.34s  (  0.34s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.34s  (  0.34s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    162,667 bytes per MUT second

  Productivity  99.9% of total user, 100.0% of total elapsed
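
Note that M.insert in the HashMap code above overwrites the value stored for a key, which matches the question's simplification of not updating frequencies. For an actual histogram the values have to be combined. Here is a minimal sketch, assuming the same (key, weight) vector layout. histogramCounts and histogramFromList are names invented here, and the second variant is only one reading of the fromList suggestion in the comments: it uses fromListWith so that duplicate keys are summed.

```haskell
import qualified Data.HashMap.Strict as M
import qualified Data.Vector.Unboxed as V

-- Fold over the vector, adding each weight to the running total for its key.
histogramCounts :: V.Vector (Int, Int) -> M.HashMap Int Int
histogramCounts = V.foldl' (\m (k, v) -> M.insertWith (+) k v m) M.empty

-- Same result built from a list of pairs; duplicate keys are combined with (+).
histogramFromList :: V.Vector (Int, Int) -> M.HashMap Int Int
histogramFromList = M.fromListWith (+) . V.toList
```

Both functions should produce the same map for the same input. Which one is faster is worth measuring rather than assuming, since the list-based version goes through an intermediate list.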