CSV parsing issue with Attoparsec

csv haskell text

Below is my code, which uses the text and attoparsec libraries:

{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>), many)
import qualified Data.Attoparsec.Text as A
import qualified Data.Text as T

-- | Parse a field of a record.
field :: A.Parser T.Text -- ^ parser
field = fmap T.concat quoted <|> normal A.<?> "field"
  where
    normal  = A.takeWhile (A.notInClass "\n\r,\"")     A.<?> "normal field"
    quoted  = A.char '"' *> many between <* A.char '"' A.<?> "quoted field"
    between = A.takeWhile1 (/= '"') <|> (A.string "\"\"" *> pure "\"")


-- | Parse a block of text into a CSV table.
comma :: T.Text                   -- ^ CSV text
      -> Either String [[T.Text]] -- ^ error | table
comma text
  | T.null text = Right []
  | otherwise   = A.parseOnly table text
  where
    table  = A.sepBy1 record A.endOfLine A.<?> "table"
    record = A.sepBy1 field (A.char ',') A.<?> "record"

Current behavior:

> comma "hello\nworld"
Right [["hello"],["world"]]

> comma "hello\nworld\n"
Right [["hello"],["world"],[""]]

Wanted behavior:

> comma "hello\nworld"
Right [["hello"],["world"]]

> comma "hello\nworld\n"
Right [["hello"],["world"]]

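I have been trying to solve this, but I have run out of ideas. I am almost certain it has to involve A.endOfInput, since that is the significant anchor and the only "bonus" information we have. Is there a way to work it into the code?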
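One way the A.endOfInput idea might be written down is sketched here, under the assumption that empty fields should stay valid. The names row and commaTerminated are hypothetical and not part of the original library. Each row is attempted only while input remains (checked with A.atEnd) and must end either at a line break or at the end of the input, so a final newline no longer starts an extra record.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Applicative  (many, (<|>))
import qualified Data.Attoparsec.Text as A
import qualified Data.Text            as T

-- Same field parser as in the question: empty fields remain valid.
field :: A.Parser T.Text
field = fmap T.concat quoted <|> normal
  where
    normal  = A.takeWhile (A.notInClass "\n\r,\"")
    quoted  = A.char '"' *> many between <* A.char '"'
    between = A.takeWhile1 (/= '"') <|> (A.string "\"\"" *> pure "\"")

-- Hypothetical helper: a row is only attempted while input remains,
-- and it must be terminated by a line break or by the end of input.
row :: A.Parser [T.Text]
row = do
  done <- A.atEnd
  if done
    then fail "no more rows"
    else A.sepBy1 field (A.char ',') <* (A.endOfLine <|> A.endOfInput)

-- Hypothetical variant of comma: zero or more rows, then end of input.
commaTerminated :: T.Text -> Either String [[T.Text]]
commaTerminated = A.parseOnly (many row <* A.endOfInput)
```

With this sketch, "hello\nworld" and "hello\nworld\n" should both yield the same two records, while a blank line in the middle of the input is still reported as a record holding one empty field.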

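One possible idea is to look at the end of the string before running the Attoparsec parser and remove the last character (or two, in the case of \r\n), but that seems like a hacky solution that I would like to avoid in my code.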
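For reference, the preprocessing idea just described could look roughly like this. It is only a sketch: stripFinalNewline is a hypothetical name, and it relies on T.stripSuffix from the text package. The "\r\n" case is checked first so that both characters are removed together.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Maybe (fromMaybe)
import qualified Data.Text  as T

-- Hypothetical helper: drop one trailing line break, if there is one.
-- "\r\n" is tried before "\n" so a Windows line ending loses both characters.
stripFinalNewline :: T.Text -> T.Text
stripFinalNewline t =
  case T.stripSuffix "\r\n" t of
    Just rest -> rest
    Nothing   -> fromMaybe t (T.stripSuffix "\n" t)
```

It would be applied before parsing, for example as comma (stripFinalNewline text).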

The full code of the library can be found here:

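The problem is that field accepts the empty string (because of takeWhile), so sepBy1 field "," accepts the empty string (as well as ",", ",,,", and so on), and sepBy1 record endOfLine accepts "\n", "\n\n", and so on. If the empty string really is a valid field (CSV is very loosely defined, so this is genuinely a design choice), then the "wanted behavior" is wrong, because those inputs really are different tables! If it is not, you should fix field, since parseOnly already ignores any unrecognized trailing input.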
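A minimal sketch of the second option, assuming an unquoted field must contain at least one character: the only change from the question's code is takeWhile1 in place of takeWhile in normal. An empty field can still be written explicitly as a pair of double quotes.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Applicative  (many, (<|>))
import qualified Data.Attoparsec.Text as A
import qualified Data.Text            as T

-- | Parse a field of a record. Unquoted fields must be non-empty.
field :: A.Parser T.Text
field = fmap T.concat quoted <|> normal A.<?> "field"
  where
    -- takeWhile1 instead of takeWhile: fails on an empty unquoted field.
    normal  = A.takeWhile1 (A.notInClass "\n\r,\"")    A.<?> "normal field"
    quoted  = A.char '"' *> many between <* A.char '"' A.<?> "quoted field"
    between = A.takeWhile1 (/= '"') <|> (A.string "\"\"" *> pure "\"")

-- | Parse a block of text into a CSV table.
comma :: T.Text -> Either String [[T.Text]]
comma text
  | T.null text = Right []
  | otherwise   = A.parseOnly table text
  where
    table  = A.sepBy1 record A.endOfLine A.<?> "table"
    record = A.sepBy1 field (A.char ',') A.<?> "record"
```

After the last record, the final line break is consumed, the next record fails, and sepBy1 backtracks, so the trailing "\n" is left as unparsed input that parseOnly drops. The same mechanism has a cost: an input such as "a,,b" stops at the first empty unquoted field and the remainder is discarded silently, so this choice only fits data where empty fields are always quoted.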