Problems decoding a binary file strictly in Haskell

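I am trying to read and decode a binary file strictly, which seems to work most of the time. Unfortunately, in a few cases my program fails with

"too few bytes. Failed reading at byte position 1"

My guess is that Binary's decode function thinks there is no data available, but I know there is, and simply re-running the program works.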

I have tried several solutions, but none of them fixed my problem :(

  • Using withBinaryFile:

    decodeFile' path = withBinaryFile path ReadMode doDecode
      where
        doDecode h = do c <- LBS.hGetContents h
                        return $! decode c
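
One way to narrow this down is to read the file strictly and decode it with decodeOrFail, which returns the failure offset instead of throwing, and to report how many bytes were actually read. The following is only a minimal sketch, assuming binary >= 0.7 and bytestring >= 0.10; the names decodeFileStrict and decodeBytes are made up for this illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import Data.Binary

-- | Read the whole file into memory first, then decode it.
-- (Hypothetical helper, not part of the binary package.)
decodeFileStrict :: Binary a => FilePath -> IO (Either String a)
decodeFileStrict path = do
    bytes <- B.readFile path          -- strict read: the file is closed afterwards
    return (decodeBytes bytes)

-- | Decode a strict ByteString, turning a parse failure into a message
-- that also says how much input there was.
decodeBytes :: Binary a => B.ByteString -> Either String a
decodeBytes bytes =
    case decodeOrFail (L.fromStrict bytes) of
      Left (_, offset, err) ->
          Left (err ++ " at byte offset " ++ show offset
                    ++ " (input length " ++ show (B.length bytes) ++ ")")
      Right (_, _, value) -> Right value
```

If the reported input length is 0, the file was empty at the moment it was read, which would point at something other than lazy IO.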
    
Do you have any idea what is going on here and how to fix it?

Thanks

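Edit: I think I have figured out my problem. It is not about reading the file strictly. I have many processes that mostly read from the file, but occasionally one of them needs to write to it, which first truncates the file and then adds the new content. So for writing I need to take a file lock first, which does not seem to be possible when using "Binary.encodeFile". (When I say process, I do not mean threads, but real instances of the same program running.)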
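
The truncate-then-write window described here can also be avoided without locks: write the new content to a temporary file in the same directory and rename it over the target, so that readers see either the old file or the new one. This is a different technique from file locking and is shown only as a sketch. atomicEncodeFile is a hypothetical name, and the code assumes the directory and filepath packages and a POSIX file system.

```haskell
import Control.Exception (onException)
import qualified Data.ByteString.Lazy as LBS
import Data.Binary
import System.Directory (removeFile, renameFile)
import System.FilePath (takeDirectory, takeFileName)
import System.IO (hClose, openBinaryTempFile)

-- | Encode a value to a temporary file next to the target, then rename
-- the temporary file over the target. (Hypothetical helper.)
atomicEncodeFile :: Binary a => FilePath -> a -> IO ()
atomicEncodeFile path value = do
    -- The temporary file must live in the same directory (same file
    -- system) as the target, otherwise the rename is not a plain rename.
    (tmpPath, h) <- openBinaryTempFile (takeDirectory path)
                                       (takeFileName path ++ ".tmp")
    -- On failure, close the handle and remove the half-written file.
    (LBS.hPut h (encode value) >> hClose h)
        `onException` (hClose h >> removeFile tmpPath)
    renameFile tmpPath path
```

Note that this does not protect a read-modify-write cycle against other writers doing the same thing at the same time; for that a lock is still needed.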

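Edit: I finally had time to solve my problem using POSIX IO and file locks. I have not had any problems since.

In case anyone is interested in my current solution, or can point out errors or problems, I am posting it here.

Safe encoding of a file: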

safeEncodeFile path value = do
    fd <- openFd path WriteOnly (Just 0o600) (defaultFileFlags {trunc = True})
    waitToSetLock fd (WriteLock, AbsoluteSeek, 0, 0)
    let cs = encode value
    let outFn = LBS.foldrChunks (\c rest -> writeChunk fd c >> rest) (return ()) cs
    outFn
    closeFd fd
  where
    writeChunk fd bs = unsafeUseAsCString bs $ \ptr ->
                         fdWriteBuf fd (castPtr ptr) (fromIntegral $ BS.length bs)
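
One thing to watch in safeEncodeFile: trunc = True makes openFd truncate the file at open time, before waitToSetLock has returned, so a reader may still see an empty file. A possible variant takes the lock first and truncates afterwards with setFdSize, and it loops on fdWriteBuf because a single write may be partial. This is only a sketch and assumes unix >= 2.8, where openFd takes three arguments and the creation mode lives in the creat flag (the code above uses the older four-argument form). lockedEncodeFile and writeAll are illustrative names.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Binary
import Foreign.Ptr (castPtr, plusPtr)
import System.IO (SeekMode (AbsoluteSeek))
import System.Posix.Files (setFdSize)
import System.Posix.IO
import System.Posix.Types (Fd)

-- | Open without truncating, take the write lock, and only then
-- truncate and write. (Hypothetical helper; assumes unix >= 2.8.)
lockedEncodeFile :: Binary a => FilePath -> a -> IO ()
lockedEncodeFile path value =
    -- bracket closes the descriptor (and thereby drops the lock)
    -- even if writing throws.
    bracket (openFd path WriteOnly flags) closeFd $ \fd -> do
        waitToSetLock fd (WriteLock, AbsoluteSeek, 0, 0)
        setFdSize fd 0                      -- truncate while holding the lock
        mapM_ (writeAll fd) (LBS.toChunks (encode value))
  where
    flags = defaultFileFlags { creat = Just 0o600 }

-- | Write a whole chunk, retrying after partial writes.
writeAll :: Fd -> BS.ByteString -> IO ()
writeAll fd bs = unsafeUseAsCStringLen bs $ \(ptr, len) -> go (castPtr ptr) len
  where
    go _ 0 = return ()
    go p n = do
        written <- fdWriteBuf fd p (fromIntegral n)
        let w = fromIntegral written
        go (p `plusPtr` w) (n - w)
```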

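It would be very helpful if you could produce a minimal code snippet that runs and demonstrates the problem. Right now I am not convinced that this isn't a problem with your program keeping track of which handles are open or closed, with reads and writes getting in each other's way. Here is a test example I made, which works fine: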

import Data.Trie as T
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import Data.Binary
import System.IO

tmp = "blah"

main = do
    let trie = T.fromList    [(B.pack [p], p) | p <- [0..]]
    (file,hdl) <- openTempFile "/tmp" tmp
    B.hPutStr hdl (B.concat $ L.toChunks $ encode trie)
    hClose hdl
    putStrLn file
    t <- B.readFile file
    let trie' = decode (L.fromChunks [t])
    print (trie' == trie)
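
This is not a strictness problem; your second solution, for example, guarantees that all the data is read. Could there be a problem in your Binary parser itself? Consider using cereal instead of binary.

So the get function of the Binary instance basically looks like this: "get = do trie ...

Thanks. It is more or less like your code, but with reading and writing swapped: 1. read the data, 2. modify the data, 3. write the data with Binary.encodeFile, which truncates the file before writing. So I think it is a race condition, where a process reads the file while it is being overwritten (see the "Edit" in my post).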

Safe decoding of a file:

safeDecodeFile def path = do
    e <- doesFileExist path
    if e
      then do fd <- openFd path ReadOnly Nothing
                           (defaultFileFlags{nonBlock=True})
              waitToSetLock fd (ReadLock, AbsoluteSeek, 0, 0)
              c  <- fdGetContents fd
              let !v = decode $! c
              return v
      else return def

fdGetContents fd = lazyRead
  where
    lazyRead = unsafeInterleaveIO loop

    loop = do blk <- readBlock fd
              case blk of
                Nothing -> return LBS.Empty
                Just c  -> do cs <- lazyRead
                              return (LBS.Chunk c cs)

readBlock fd = do buf <- mallocBytes 4096
                  readSize <- fdReadBuf fd buf 4096
                  if readSize == 0
                    then do free buf
                            closeFd fd
                            return Nothing
                    else do bs <- unsafePackCStringFinalizer buf
                                         (fromIntegral readSize)
                                         (free buf)
                            return $ Just bs
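
-- Pragma and imports needed by safeEncodeFile and safeDecodeFile
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS
import qualified Data.ByteString.Lazy.Internal as LBS
import Data.ByteString.Unsafe (unsafePackCStringFinalizer, unsafeUseAsCString)
import Data.Binary (decode, encode)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (castPtr)
import System.Directory (doesFileExist)
import System.IO (SeekMode (AbsoluteSeek))
import System.IO.Unsafe (unsafeInterleaveIO)
import System.Posix.IO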
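
On the reading side, a strict variant can do without unsafeInterleaveIO: take the read lock, read the whole descriptor into memory, and close it via bracket even if decoding fails. In fdGetContents the descriptor is only closed once a read returns 0 bytes, which may never happen if the decoder stops asking for input before the end of the file. Again this is just a sketch, assuming unix >= 2.8 (three-argument openFd), binary >= 0.7 and bytestring >= 0.10; lockedDecodeFile and readAllFd are hypothetical names.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Internal as BSI
import qualified Data.ByteString.Lazy as LBS
import Data.Binary
import System.Directory (doesFileExist)
import System.IO (SeekMode (AbsoluteSeek))
import System.Posix.IO
import System.Posix.Types (Fd)

-- | Read everything from a descriptor, 4096 bytes at a time,
-- until a read returns 0 bytes (end of file).
readAllFd :: Fd -> IO BS.ByteString
readAllFd fd = BS.concat <$> go
  where
    go = do
        chunk <- BSI.createAndTrim 4096 $ \buf ->
                     fromIntegral <$> fdReadBuf fd buf 4096
        if BS.null chunk
          then return []
          else (chunk :) <$> go

-- | Decode a file under a read lock, or return the default value
-- if the file does not exist. (Hypothetical helper.)
lockedDecodeFile :: Binary a => a -> FilePath -> IO a
lockedDecodeFile def path = do
    exists <- doesFileExist path
    if not exists
      then return def
      else bracket (openFd path ReadOnly defaultFileFlags) closeFd $ \fd -> do
          waitToSetLock fd (ReadLock, AbsoluteSeek, 0, 0)
          bytes <- readAllFd fd          -- fully in memory before decoding
          case decodeOrFail (LBS.fromStrict bytes) of
            Left (_, offset, err) ->
                ioError (userError (err ++ " at byte offset " ++ show offset))
            Right (_, _, value) -> return value
```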