
Avoiding stack overflow in a Haskell fold without sacrificing performance


The following code overflows the stack for large inputs:

{-# LANGUAGE DeriveDataTypeable, OverloadedStrings #-}
import qualified Data.ByteString.Lazy.Char8 as L


genTweets :: L.ByteString -> L.ByteString
genTweets text | L.null text = ""
               | otherwise = L.intercalate "\n\n" $ genTweets' $ L.words text
  where genTweets' txt = foldr p [] txt
          where p word [] = [word]
                p word words@(w:ws) | L.length word + L.length w <= 139 =
                                        (word `L.append` " " `L.append` w):ws
                                    | otherwise = word:words
What causes the first snippet to build up thunks, and can that be avoided? Can the second snippet be written so that it doesn't rely on (++)?

p word words@(w:ws)

This pattern match forces the evaluation of the "tail", which of course is the result of foldr p [] (w:ws), which is the result of p w ws, which causes ws to be pattern matched against the head pattern again, and so on.
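To make that forcing chain concrete, here is a toy sketch (the combining functions and names are invented for this illustration, not taken from the answer): a combining function that pattern-matches on its accumulator must recurse to the end of the input before producing anything, while (:) never inspects the accumulator, so foldr can yield the head immediately, even for an infinite list.

```haskell
-- strictStep inspects the accumulator, so foldr strictStep [] must
-- evaluate the entire fold (one stack frame per element) before
-- returning even the first cell.
strictStep :: Int -> [Int] -> [Int]
strictStep x []     = [x]           -- forces the accumulator to WHNF
strictStep x (y:ys) = (x + y) : ys

collapsed :: [Int]
collapsed = foldr strictStep [] [1 .. 5]  -- every step is forced: [15]

-- (:) never looks at its second argument, so foldr (:) [] streams:
-- the head is available at once, even from an infinite input.
lazyHead :: Int
lazyHead = head (foldr (:) [] [1 ..])
```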

Note that foldr and foldl' will split the text differently: foldr produces the shortest tweet first, foldl' produces it last.
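A small sketch of that difference (the limit of 9 instead of 139, and the names chunksR/chunksL, are invented for this illustration):

```haskell
import Data.List (foldl')

limit :: Int
limit = 9

-- foldr builds groups from the right, so the leftover short chunk
-- surfaces at the front of the result.
chunksR :: [String] -> [String]
chunksR = foldr p []
  where
    p w []  = [w]
    p w acc@(x:xs)
      | length w + length x <= limit = (w ++ " " ++ x) : xs
      | otherwise                    = w : acc

-- foldl' builds groups from the left, so the short chunk ends up last.
chunksL :: [String] -> [String]
chunksL = foldl' p []
  where
    p [] w = [w]
    p acc w
      | length (last acc) + length w <= limit =
          init acc ++ [last acc ++ " " ++ w]
      | otherwise = acc ++ [w]
```

On ["one","two","three","four","five"], chunksR yields ["one","two three","four five"] (shortest group first) while chunksL yields ["one two","three four","five"] (shortest group last).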


I would do it like this:

genTweets' = unfoldr f where
  f [] = Nothing
  f (w:ws) = Just $ g w ws $ L.length w
  g w [] _ = (w, [])
  g w ws@(w':_) len | len+1+(L.length w') > 139 = (w,ws)
  g w (w':ws') len = g (w `L.append` " " `L.append` w') ws' $ len+1+(L.length w')
The second snippet appends at the end with (++), which obviously takes time proportional to the length of the accumulator list at each step. A much better solution is to produce the output list lazily, interleaving the processing with the reading of the input stream (you don't need to read the whole input before outputting the first 140-character tweet).
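That laziness can be observed by driving the unfoldr version with an infinite supply of words (the genTweets' definition is copied from above; the infinite "word" input is invented for this check). Because unfoldr yields each tweet as soon as it is complete, the first tweet is available without ever finishing the input:

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Data.List (unfoldr)

genTweets' :: [L.ByteString] -> [L.ByteString]
genTweets' = unfoldr f
  where
    f []     = Nothing
    f (w:ws) = Just $ g w ws $ L.length w
    g w [] _ = (w, [])
    g w ws@(w':_) len | len + 1 + L.length w' > 139 = (w, ws)
    g w (w':ws') len =
      g (w `L.append` L.pack " " `L.append` w') ws' (len + 1 + L.length w')

-- head forces only enough of the unfold to complete one tweet, so this
-- terminates even though the input list is infinite.
firstTweet :: L.ByteString
firstTweet = head (genTweets' (repeat (L.pack "word")))
```

With 4-character words, each append adds 5 characters, so the first tweet fills up to exactly 139 characters before the guard fires.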

The following version of the program runs in O(1) space and processes a relatively large file (/usr/share/dict/words) in about 1 second:

{-# LANGUAGE OverloadedStrings, BangPatterns #-}

module Main where

import qualified Data.ByteString.Lazy.Char8 as L
import Data.Int (Int64)

genTweets :: L.ByteString -> L.ByteString
genTweets text | L.null text = ""
               | otherwise   = L.intercalate "\n\n" $ toTweets $ L.words text
  where

    -- Concatenate words into 139-character tweets.
    toTweets :: [L.ByteString] -> [L.ByteString]
    toTweets []     = []
    toTweets [w]    = [w]
    toTweets (w:ws) = go (L.length w, w) ws

    -- Main loop. Notice how the output tweet (cur_str) is generated as soon as
    -- possible, thus enabling L.writeFile to consume it before the whole
    -- input is processed.
    go :: (Int64, L.ByteString) -> [L.ByteString] -> [L.ByteString]
    go (_cur_len, !cur_str) []     = [cur_str]
    go (!cur_len, !cur_str) (w:ws)
      | lw + cur_len <= 139        = go (cur_len + lw + 1,
                                         cur_str `L.append` " " `L.append` w) ws
      | otherwise                  = cur_str : go (lw, w) ws
      where
        lw = L.length w

-- Notice the use of lazy I/O.
main :: IO ()
main = do dict <- L.readFile "/usr/share/dict/words"
          L.writeFile "tweets" (genTweets dict)
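As a quick check in a testable form, here is a sketch that reuses the program above with main dropped and a synthetic input (the 1000-copy word list is invented for this check) in place of /usr/share/dict/words:

```haskell
{-# LANGUAGE OverloadedStrings, BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Int (Int64)

-- Copied from the answer above.
genTweets :: L.ByteString -> L.ByteString
genTweets text | L.null text = ""
               | otherwise   = L.intercalate "\n\n" $ toTweets $ L.words text
  where
    toTweets :: [L.ByteString] -> [L.ByteString]
    toTweets []     = []
    toTweets [w]    = [w]
    toTweets (w:ws) = go (L.length w, w) ws

    go :: (Int64, L.ByteString) -> [L.ByteString] -> [L.ByteString]
    go (_cur_len, !cur_str) []     = [cur_str]
    go (!cur_len, !cur_str) (w:ws)
      | lw + cur_len <= 139 = go (cur_len + lw + 1,
                                  cur_str `L.append` " " `L.append` w) ws
      | otherwise           = cur_str : go (lw, w) ws
      where lw = L.length w

-- Tweets are separated by "\n\n", so the first tweet is everything up
-- to the first newline; it is available after reading only a prefix of
-- the input, and 4-character words pack to exactly 139 characters.
firstTweet :: L.ByteString
firstTweet =
  L.takeWhile (/= '\n') (genTweets (L.unwords (replicate 1000 "word")))
```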

I see; I hadn't realized that the pattern match evaluates the tail. Replacing the match with explicit calls to head words and tail words solves the problem, but the resulting code is no faster than the foldl' version, which seems wrong. I don't understand why using head and tail explicitly makes such a big difference.

You run into trouble whenever you try to refer to the "future" result: the evaluation of words is lazy, but if you try to compute the length of the head element of words, you have to evaluate that head element. To do that, the recursion proceeds in exactly the same way as with the pattern match: there is no way to know the length of the head before knowing whether the head of the remainder can be prefixed onto it.

I see how the traversal slows it down. I was under the impression that the fold already produced the list lazily.