Parsing in Haskell: traversing a string / text file


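I am trying to read a script file, process it, and write the result to an HTML file. Wherever my script file contains @title(this is a title), I want the HTML output to contain the markup [header]this is a title[/header]. My approach is to read the script file into a string, process the string, and then write the string to the HTML file.

In other words, to recognize @title I need to read the string character by character. When I read an "@", I need to look at the characters that follow to see whether they match.

Question: how do I traverse a string (i.e. a list of characters) in Haskell?

For example, you can use a simple recursive trick: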

findTag [] = undefined                  -- end-of-list code goes here
findTag ('@':xs)
  | take 5 xs == "title" = undefined    -- your code for @title goes here
  | otherwise            = findTag xs
findTag (_:xs) = findTag xs

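So, basically, you pattern match on whether the next character (the head of the list) is '@', and then check whether the following 5 characters spell "title". If they do, you can go on to your parsing code. If the next character is not '@', you simply keep recursing. Once the list is empty, you reach the first pattern match.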
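To make the idea concrete, here is a minimal sketch of a complete function built on the same recursion. It assumes the tag syntax is exactly @title(...) and that an unclosed tag should be copied through unchanged; the name replaceTitles is made up for this illustration and is not part of the answer's code.

```haskell
import Data.List (stripPrefix)

-- | Rewrite every "@title(...)" in the input as "[header]...[/header]".
-- Anything that is not a complete @title(...) tag is copied unchanged.
-- (replaceTitles is a hypothetical name used only for this sketch.)
replaceTitles :: String -> String
replaceTitles [] = []
replaceTitles s@(c:cs) =
  case stripPrefix "@title(" s of
    -- The tag opens here; accept it only if a closing ')' follows.
    Just rest
      | (body, ')':after) <- break (== ')') rest ->
          "[header]" ++ body ++ "[/header]" ++ replaceTitles after
    -- No complete tag at this position: keep the character and move on.
    _ -> c : replaceTitles cs
```

stripPrefix does the "is it '@' followed by title(" check and the dropping in one step, so there is no need to count characters by hand.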

Others may have better solutions.

I hope this answers your question.

Edit:

For more flexibility, if you want to look for a specific tag, you can do this:

findTag [] _ = undefined                               -- end-of-list code goes here
findTag ('@':xs) tagName
  | take (length tagName) xs == tagName = undefined    -- your code for the matched tag goes here
  | otherwise = findTag xs tagName
findTag (_:xs) tagName = findTag xs tagName

If you then call

findTag text "title"

you will look specifically for title, and you can always change the tag name to whatever you want.
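As a small variation on the same idea, the comparison and the skipping of the tag name can be done in one step with stripPrefix from Data.List. This is only a sketch: the name afterTag and the Maybe result are assumptions made for the example, not something the answer prescribes.

```haskell
import Data.List (stripPrefix)

-- | Return whatever follows the first occurrence of '@' followed by the
-- tag name, or Nothing if the tag never appears.
-- (afterTag is a hypothetical helper; the argument order mirrors findTag.)
afterTag :: String -> String -> Maybe String
afterTag [] _ = Nothing
afterTag s@(_:xs) tagName =
  case stripPrefix ('@' : tagName) s of
    Just rest -> Just rest             -- tag found: hand back the remainder
    Nothing   -> afterTag xs tagName   -- not here: try one character later
```

Called as afterTag text "title", it yields the text starting at the opening parenthesis, which can then be passed on to whatever extracts the tag contents.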

Another edit:

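findTag :: String -> String -> String
findTag [] _ = ""   -- end of list: tag not found (put your own end-of-list code here)
findTag ('@':xs) tagName
  | take tLength xs == tagName = getTagContents tLength xs
  | otherwise = findTag xs tagName
  where tLength = length tagName
findTag (_:xs) tagName = findTag xs tagName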

getTagContents :: Int -> String -> String
getTagContents len = takeWhile (/=')') . drop (len + 1)

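Honestly, this is getting a bit messy, but here is what happens:

getTagContents first drops as many characters as the tag name is long, then drops one more for the opening parenthesis, and finally uses takeWhile to keep the characters up to the closing parenthesis.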
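The steps just described trust the input: they drop len + 1 characters without checking that a '(' is really there or that a ')' ever follows. A slightly more defensive sketch of the same extraction is shown below; the name tagContents and the Maybe return type are assumptions made for this example.

```haskell
import Data.List (stripPrefix)

-- | Given a tag name and the text right after the '@', return the text
-- between the parentheses, or Nothing if the tag is malformed.
-- (tagContents is a hypothetical variant of getTagContents.)
tagContents :: String -> String -> Maybe String
tagContents tagName afterAt = do
  -- Drop the tag name and the opening parenthesis, checking both.
  rest <- stripPrefix (tagName ++ "(") afterAt
  -- Keep everything up to the closing parenthesis, which must exist.
  case break (== ')') rest of
    (body, ')' : _) -> Just body
    _               -> Nothing
```

For well-formed input such as "title(this is a title) rest" both versions agree on "this is a title"; they differ only when a parenthesis is missing.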

Clearly, your problem is a parsing problem. As Daniel Wagner wisely pointed out, you are better off handling it with a parser, for maintainability reasons.

Another thing: if you want to process textual data efficiently, you are better off using Text instead of String.
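As a rough illustration of the Text suggestion, without any parser library yet, the whole read, process, write pipeline from the question could be sketched with the text package alone. The names replaceTitlesT and convertFile are invented for this example, and the choice to leave an unclosed tag untouched is an assumption.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- | Rewrite every "@title(...)" as "[header]...[/header]" in a Text value.
-- (replaceTitlesT is a hypothetical name used only for this sketch.)
replaceTitlesT :: Text -> Text
replaceTitlesT input
  | T.null rest = input                 -- no "@title(" left in the input
  | otherwise =
      case T.breakOn ")" afterOpen of
        (body, close)
          | T.null close -> input       -- unclosed tag: leave the text as is
          | otherwise ->
              before <> "[header]" <> body <> "[/header]"
                     <> replaceTitlesT (T.drop 1 close)
  where
    -- breakOn returns the text before the needle and the text from the
    -- needle onwards (empty if the needle does not occur).
    (before, rest) = T.breakOn "@title(" input
    afterOpen      = T.drop (T.length "@title(") rest

-- | Read the script file, convert it, and write the HTML file.
-- (convertFile and its two path arguments are illustrative.)
convertFile :: FilePath -> FilePath -> IO ()
convertFile src dst = TIO.readFile src >>= TIO.writeFile dst . replaceTitlesT
```

This works for one fixed tag, but it becomes awkward quickly once there are several tags or nesting, which is where a parser pays off.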

Here is how to solve the problem with a parser library:

-- For autocasting of hardcoded strings to `Text` type
{-# LANGUAGE OverloadedStrings #-}

-- Import a way more convenient prelude, excluding symbols conflicting 
-- with the parser library. See
-- http://hackage.haskell.org/package/classy-prelude
import ClassyPrelude hiding (takeWhile, try)
-- Exclude the standard Prelude
import Prelude ()
import Data.Attoparsec.Text

-- A parser and an inplace converter for title
title = do
  string "@title("
  r <- takeWhile $ notInClass ")"
  string ")"
  return $ "[header]" ++ r ++ "[/header]"

-- A parser which parses the whole document to parts which are either
-- single-character `Text`s or modified titles
parts = 
  (try endOfInput >> return []) ++
    ((:) <$> (try title ++ (singleton <$> anyChar)) <*> parts)

-- The topmost parser which concats all parts into a single text
top = concat <$> parts

-- A sample input
input = "aldsfj@title(this is a title)sdlfkj@title(this is a title2)"

-- Run the parser and output result
main = print $ parseOnly top input
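For readers who would rather not depend on ClassyPrelude, the same parser can be sketched against the standard Prelude, using <|> from Control.Applicative for alternation. The names titleP, documentP and convert are made up for this example. No try is used because Attoparsec parsers backtrack on failure by themselves.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Applicative ((<|>), many)
import           Data.Attoparsec.Text
                   (Parser, anyChar, char, parseOnly, string, takeWhile)
import           Data.Text (Text)
import qualified Data.Text as T
import           Prelude hiding (takeWhile)

-- | Parse one "@title(...)" and return its replacement text.
-- (titleP, documentP and convert are hypothetical names for this sketch.)
titleP :: Parser Text
titleP = do
  _    <- string "@title("
  body <- takeWhile (/= ')')
  _    <- char ')'
  return ("[header]" <> body <> "[/header]")

-- | The document is a sequence of parts, each either a converted title
-- or a single character copied through unchanged.
documentP :: Parser Text
documentP = T.concat <$> many (titleP <|> (T.singleton <$> anyChar))

-- | Run the parser over a whole input.
convert :: Text -> Either String Text
convert = parseOnly documentP
```

Building a one-character Text for every ordinary character keeps the sketch short but is not the most efficient approach for large inputs.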

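P.S. ClassyPrelude reimplements ++ as an alias for Monoid's mappend, so you can replace it with mappend, or with Alternative's <|>, if needed.

For pattern search and replace, you can use Replace.Megaparsec: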

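import Control.Monad (void)
import Data.Void (Void)
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char

-- Parser for a title tag; returns the text between the parentheses
title :: Parsec Void String String
title = do
  void $ string "@title("
  someTill anySingle $ string ")"

-- Wrap the captured title text in header markers
editor t = "[header]" ++ t ++ "[/header]"

-- In GHCi:
streamEdit title editor "@title(this is a title)"

"[header]this is a title[/header]"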
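Write a parser. You can do other, simpler hacks in the short term, but you will regret it later.

On the topic of parsers, Parsec owns.

@CatPlusPlus That is debatable. Performance-wise, Attoparsec may often outperform it.

Thanks for the advice!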
For reference, the Attoparsec program above prints:

Right "aldsfj[header]this is a title[/header]sdlfkj[header]this is a title2[/header]"