Multithreading 简单多线程Haskell的巨大内存消耗_Multithreading_Haskell_Queue_Ghc

Multithreading 简单多线程Haskell的巨大内存消耗

multithreading haskell

Multithreading 简单多线程Haskell的巨大内存消耗,multithreading,haskell,queue,ghc,Multithreading,Haskell,Queue,Ghc,我有一个相对简单的“复制”程序，它只将一个文件的所有行复制到另一个文件。我正在使用Haskell对TMQueue和STM的并发支持，所以我想我应该这样尝试： {-# LANGUAGE BangPatterns #-} module Main where import Control.Applicative import Control.Concurrent.Async -- from async import Control.Concurrent.Chan impo

我有一个相对简单的“复制”程序，它只将一个文件的所有行复制到另一个文件。我正在使用Haskell对

TMQueue

和

STM

的并发支持，所以我想我应该这样尝试：

{-# LANGUAGE BangPatterns #-}

module Main where

import Control.Applicative
import Control.Concurrent.Async              -- from async
import Control.Concurrent.Chan
import Control.Concurrent.STM (atomically)
import Control.Concurrent.STM.TMQueue        -- from stm-chans
import Control.Monad (replicateM, forM_, forever, unless)
import qualified Data.ByteString.Char8 as B
import Data.Function (fix)
import Data.Maybe (catMaybes, maybe)
import System.IO (withFile, IOMode(..), hPutStrLn, hGetLine)
import System.IO.Error (catchIOError)

input  = "data.dat"
output = "out.dat"
batch = 100 :: Int

consumer :: TMQueue B.ByteString -> IO ()
consumer q = withFile output WriteMode $ \fh -> fix $ \loop -> do
  !items <- catMaybes <$> replicateM batch readitem
  forM_ items $ B.hPutStrLn fh
  unless (length items < batch) loop
  where
    readitem = do
      !item <- atomically $ readTMQueue q
      return item

producer :: TMQueue B.ByteString -> IO ()
producer q = withFile input ReadMode $ \fh ->
  (forever (B.hGetLine fh >>= atomically . writeTMQueue q))
  `catchIOError` const (atomically (closeTMQueue q) >> putStrLn "Done")

main :: IO ()
main = do
  q <- atomically newTMQueue
  thread <- async $ consumer q
  producer q
  wait thread

像这样建造

ghc -e 'writeFile "data.dat" (unlines (map show [1..5000000]))'

ghc --make QueueTest.hs -O2 -prof -auto-all -caf-all -threaded -rtsopts -o q

当我像这样运行它时，

/q+RTS-s-prof-hc-L60-N2

，它说“2117mb总内存在使用中”！但是输入文件只有38MB

我不熟悉评测，但我已经绘制了一张又一张的图表，无法指出我的错误。

正如OP所指出的，现在我还不如写一个真实的答案。让我们从内存消耗开始

两个有用的参考文献是和。我们还需要了解一些结构的定义

-- from http://hackage.haskell.org/package/stm-chans-3.0.0.2/docs/src/Control-Concurrent-STM-TMQueue.html

data TMQueue a = TMQueue
    {-# UNPACK #-} !(TVar Bool)
    {-# UNPACK #-} !(TQueue a)
    deriving Typeable


-- from http://hackage.haskell.org/package/stm-2.4.3/docs/src/Control-Concurrent-STM-TQueue.html

-- | 'TQueue' is an abstract type representing an unbounded FIFO channel.
data TQueue a = TQueue {-# UNPACK #-} !(TVar [a])
                       {-# UNPACK #-} !(TVar [a])

TQueue

实现使用具有读端和写端的标准功能队列

让我们设置内存使用的上限，并假设在使用者执行任何操作之前，我们将整个文件读入

TMQueue

。在这种情况下，TQueue的写入端将包含一个列表，每个输入行有一个元素（存储为bytestring）。每个列表节点看起来像

(:) bytestring tail

这需要3个字（每个字段1个字+构造函数1个字）。每个bytestring有9个字，因此将这两个字相加，每行有12个字的开销，不包括实际数据。您的测试数据是500万行，因此整个文件（加上一些常量）的开销是6000万字，在64位系统上大约是460MB（假设我的计算正确，总是有问题）。加上40MB的实际数据，我们得到的值非常接近我在系统上看到的值
那么，为什么我们的内存使用率接近这个上限呢？我有一个理论（调查只是个练习！）。首先，生产者可能比消费者运行得快一点，因为读取通常比写入快（我使用的是旋转磁盘，可能SSD会有所不同）。以下是readTQueue的定义：

-- |Read the next value from the 'TQueue'. readTQueue :: TQueue a -> STM a readTQueue (TQueue read write) = do xs <- readTVar read case xs of (x:xs') -> do writeTVar read xs' return x [] -> do ys <- readTVar write case ys of [] -> retry _ -> case reverse ys of [] -> error "readTQueue" (z:zs) -> do writeTVar write [] writeTVar read zs return z

这样可以避免这个问题。现在，
z
和
zs
绑定都应该延迟计算，因此列表遍历将在该事务之外发生，从而允许读取操作有时在争用情况下成功。当然，首先假设我对这个问题的看法是正确的（而且这个定义已经足够懒了）。不过，可能还有其他意想不到的缺点。
我把责任归咎于排队。如果您使用
TBMQueue
和适当的边界（例如，10*批）交换
TMQueue
，则总内存使用量约为3 MB。顺便说一句，基本问题是生产者远远超过消费者，因此需要处理大量积压的数据。我认为，因为unagi chan使用了更高效的内存表示，所以即使它遇到了同样的问题，它最终也会使用更少的内存。如果您不能允许制作者阻止，那么您需要做好准备，以防消费者出现延迟时内存使用量可能出现峰值。也可以强制消费者断开连接并重新同步，但这可能会导致丢失物品。@JohnL是的，我注意到制作人比消费者跑得快，因为它在完成前几秒钟打印“完成”。但我不明白55x的内存开销。虽然unagi chan使用的内存确实比合理配置的TBM队列多出很多，但它实际上更快，这正是我从TMQueue中所期望的。@3哦，unagi chan比类似chans的MVar快几个数量级。此外，这不是一个很好的并行测试，因为它受IO限制，唯一重要的CPU负载来自通道操作。如果您的实际程序在管道中涉及到更重要的处理，那么这将占主导地位，而unagi的基本速度优势对最终结果的贡献也不会太大。@3noch:这不仅仅是
TMQueue
，几个Haskell数据结构和列表也是一个非常常见的问题。并不是每个数据结构在所有情况下都能很好地伸缩，您应该看到更多信息。惊人的答案！非常感谢您从各个角度进行的透彻分析。您是否考虑过将您的备选方案
readTQueue
归档，作为对
stm
的潜在增强？
-- |Read the next value from the 'TQueue'. readTQueue :: TQueue a -> STM a readTQueue (TQueue read write) = do xs <- readTVar read case xs of (x:xs') -> do writeTVar read xs' return x [] -> do ys <- readTVar write case ys of [] -> retry _ -> do writeTVar write [] let (z:zs) = reverse ys writeTVar read zs return z