Parsing: How do I parse infix instead of prefix notation in Haskell?


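I need help with this program, which I'm trying to write in Haskell. I've written most of it; here is basically what I'm trying to do:

When I write

    parse "a+b"

in the terminal, I want this as output:

    Plus (Word "a") (Word "b")

When I write

    parse "a-2*b+c"

in the terminal, I want this as output:

    Minus (Word "a") (Plus (Mult (Num 2) (Word "b")) (Word "c"))

This is my code so far: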

    data Ast
        = Word String
        | Num Int
        | Mult Ast Ast
        | Plus Ast Ast
        | Minus Ast Ast
        deriving (Eq, Show)
    
    tokenize :: [Char] -> [String]
    tokenize [] = []
    tokenize (' ' : s) = tokenize s
    tokenize ('+' : s) = "+" : tokenize s
    tokenize ('*' : s) = "*" : tokenize s
    tokenize (c : s)
      | isDigit c =
        let (cs, s') = collectWhile isDigit s
         in (c : cs) : tokenize s'
      | isAlpha c =
        let (cs, s') = collectWhile isAlpha s
         in (c : cs) : tokenize s'
      | otherwise = error ("unexpected character " ++ show c)
    
    collectWhile :: (Char -> Bool) -> String -> (String, String)
    collectWhile p s = (takeWhile p s, dropWhile p s)
    
    isDigit, isAlpha :: Char -> Bool
    isDigit c = c `elem` ['0' .. '9']
    isAlpha c = c `elem` ['a' .. 'z'] ++ ['A' .. 'Z']
    
    parseU :: [String] -> (Ast, [String])
    parseU ("+" : s0) =
      let (e1, s1) = parseU s0
          (e2, s2) = parseU s1
       in (Plus e1 e2, s2)
    parseU ("*" : s0) =
      let (e1, s1) = parseU s0
          (e2, s2) = parseU s1
       in (Mult e1 e2, s2)
    parseU (t : ts)
      | isNumToken t = (Num (read t), ts)
      | isWordToken t = (Word t, ts)
      | otherwise = error ("unrecognized token " ++ show t)
    parseU [] = error "unexpected end of input"
    
    isNumToken, isWordToken :: String -> Bool
    isNumToken xs = takeWhile isDigit xs == xs
    isWordToken xs = takeWhile isAlpha xs == xs
    
    parse :: String -> Ast
    parse s =
      case parseU (tokenize s) of
        (e, []) -> e
        (_, t : _) -> error ("unexpected token " ++ show t)
    
    inn :: Ast -> String
    inn (Plus x y) = innP x ++ " + " ++ innP y
    inn (Mult x y) = innP x ++ " * " ++ innP y
    inn ast = innP ast
    
    innP :: Ast -> String
    innP (Num n) = show n
    innP (Plus x y) = "(" ++ innP x ++ " + " ++ innP y ++ ")"
    innP (Mult x y) = "(" ++ innP x ++ " * " ++ innP y ++ ")"
    innP (Word w) = w
    
    innfiks :: String -> String
    innfiks s = inn (parse s)
    
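Right now I get an error when I enter the expressions above in the terminal, but when I write it like this:

    parse "+ a b"

I get the correct output:

    Plus (Word "a") (Word "b")

I know I have to change the code so that the parse function accepts input in this form:

    value operator value

rather than in this form:

    operator value value

But I'm struggling to figure out what changes I have to make.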

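To handle infix operators with precedence, one approach is to introduce a series of parsing functions, one per precedence level. So, if you have "factors" that can be multiplied to make "terms", which can in turn be added or subtracted to make "expressions", you'll want a parser function for each of these levels. Parsing a "factor" (i.e., a word or a number) is easy, since you've already written the code: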

    parseFactor :: [String] -> (Ast, [String])
    parseFactor (t : ts)
      | isNumToken t = (Num (read t), ts)
      | isWordToken t = (Word t, ts)
      | otherwise = error ("unrecognized token " ++ show t)
    parseFactor [] = error "unexpected end of input"
    
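Parsing a term is trickier. You want to start by parsing a factor and then, optionally, a * followed by another factor. The result is then treated as a term that can, again optionally, be multiplied by yet another factor, and so on. Here is one way to do it: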

    parseTerm :: [String] -> (Ast, [String])
    parseTerm ts
      =  let (f1, ts1) = parseFactor ts     -- parse first factor
         in  go f1 ts1
      where go acc ("*":ts2)                -- add a factor to an accumulating term
              = let (f2, ts3) = parseFactor ts2
                in go (Mult acc f2) ts3
            go acc rest = (acc, rest)       -- no more factors: return the term
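
To see how the accumulator in go builds a left-nested tree, here is a minimal self-contained sketch. It repeats the Ast type and parseFactor in trimmed form so that it loads on its own; restricting factors to numbers and using Data.Char.isDigit instead of the question's hand-written isDigit are simplifications assumed only for this illustration.

```haskell
import Data.Char (isDigit)

data Ast
  = Num Int
  | Mult Ast Ast
  deriving (Eq, Show)

-- Trimmed-down factor parser (assumption: numbers only, for brevity).
parseFactor :: [String] -> (Ast, [String])
parseFactor (t : ts)
  | not (null t) && all isDigit t = (Num (read t), ts)
  | otherwise = error ("unrecognized token " ++ show t)
parseFactor [] = error "unexpected end of input"

-- Same shape as parseTerm above: parse one factor, then keep folding
-- "* factor" pairs into the accumulator, which nests to the left.
parseTerm :: [String] -> (Ast, [String])
parseTerm ts =
  let (f1, ts1) = parseFactor ts
   in go f1 ts1
  where
    go acc ("*" : ts2) =
      let (f2, ts3) = parseFactor ts2
       in go (Mult acc f2) ts3
    go acc rest = (acc, rest)

-- Tokens for "2 * 3 * 4 + 5", written out by hand.
example :: (Ast, [String])
example = parseTerm ["2", "*", "3", "*", "4", "+", "5"]
-- example == (Mult (Mult (Num 2) (Num 3)) (Num 4), ["+", "5"])
```

The term parser stops at the first token that is not a *, leaving "+ 5" for a higher-level parser to consume.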
    
If you like, try writing a similar parseExpr that parses terms separated by + characters (skipping subtraction for now), and test it on:

    parseExpr (tokenize "2 + 3 * 6 + 4 * 8 * 12 + 1")
    
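As a spoiler, here is a version that handles both + and -. Note, however, that your tokenizer doesn't yet handle subtraction correctly, so you'll have to fix that first: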

    parseExpr :: [String] -> (Ast, [String])
    parseExpr ts
      =  let (f1, ts1) = parseTerm ts
         in  go f1 ts1
      where go acc (op:ts2)
              | op == "+" || op == "-"
              = let (f2, ts3) = parseTerm ts2
                in go ((astOp op) acc f2) ts3
            go acc rest = (acc, rest)
            astOp "+" = Plus
            astOp "-" = Minus
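
One possible shape for that tokenizer fix is sketched below. It is not the original code: it uses span and the Data.Char predicates in place of the question's collectWhile, isDigit, and isAlpha, which is an assumption made to keep the sketch short. Note that Data.Char.isAlpha also accepts non-ASCII letters and isSpace also skips tabs and newlines, so the behaviour matches the original only on plain ASCII input with space separators.

```haskell
import Data.Char (isAlpha, isDigit, isSpace)

-- Sketch of a tokenizer that emits "-" as its own token,
-- alongside "+" and "*".
tokenize :: String -> [String]
tokenize [] = []
tokenize (c : s)
  | isSpace c = tokenize s
  | c `elem` "+-*" = [c] : tokenize s
  | isDigit c = let (cs, s') = span isDigit s in (c : cs) : tokenize s'
  | isAlpha c = let (cs, s') = span isAlpha s in (c : cs) : tokenize s'
  | otherwise = error ("unexpected character " ++ show c)

-- tokenize "a - 2 * b + c" == ["a", "-", "2", "*", "b", "+", "c"]
```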
    
With parseExpr in place, you can point parse at the right parser:

    parse :: String -> Ast
    parse s =
      case parseExpr (tokenize s) of
        (e, []) -> e
        (_, t : _) -> error ("unexpected token " ++ show t)
    
Your example should then work:

    λ> parse "a - 2 * b + c"
    Plus (Minus (Word "a") (Mult (Num 2) (Word "b"))) (Word "c")
    
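Note that this is slightly different from the output you wanted, but this nesting is the correct one for left-associative operators (which is necessary to handle - correctly). That is, you want: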

    5 - 4 + 1
    
to parse as:

    (5 - 4) + 1  -- i.e., (Plus (Minus (Num 5) (Num 4)) (Num 1))
    
so that an evaluator computes the correct answer, 2. If it were parsed as:

    5 - (4 + 1)  -- i.e., (Minus (Num 5) (Plus (Num 4) (Num 1)))
    
your evaluator would compute the wrong answer, 0.
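
The original code does not include an evaluator, so the following is only a hypothetical sketch to make the arithmetic concrete. The function eval and its environment argument (an association list that gives values to Word names) are assumptions introduced for illustration.

```haskell
data Ast
  = Word String
  | Num Int
  | Mult Ast Ast
  | Plus Ast Ast
  | Minus Ast Ast
  deriving (Eq, Show)

-- Hypothetical evaluator: looks up words in an association list and
-- returns Nothing if any word has no value.
eval :: [(String, Int)] -> Ast -> Maybe Int
eval env (Word w) = lookup w env
eval _ (Num n) = Just n
eval env (Mult x y) = (*) <$> eval env x <*> eval env y
eval env (Plus x y) = (+) <$> eval env x <*> eval env y
eval env (Minus x y) = (-) <$> eval env x <*> eval env y

-- "5 - 4 + 1" parsed left-associatively: (5 - 4) + 1
leftTree :: Ast
leftTree = Plus (Minus (Num 5) (Num 4)) (Num 1)

-- The same input parsed right-associatively: 5 - (4 + 1)
rightTree :: Ast
rightTree = Minus (Num 5) (Plus (Num 4) (Num 1))

-- eval [] leftTree  == Just 2
-- eval [] rightTree == Just 0
```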

However, if you really do want to parse with right-associative operators, see below.

The complete modified code for left-associative operators:

    data Ast
        = Word String
        | Num Int
        | Mult Ast Ast
        | Plus Ast Ast
        | Minus Ast Ast
        deriving (Eq, Show)
    
    tokenize :: [Char] -> [String]
    tokenize [] = []
    tokenize (' ' : s) = tokenize s
    tokenize ('-' : s) = "-" : tokenize s
    tokenize ('+' : s) = "+" : tokenize s
    tokenize ('*' : s) = "*" : tokenize s
    tokenize (c : s)
      | isDigit c =
        let (cs, s') = collectWhile isDigit s
         in (c : cs) : tokenize s'
      | isAlpha c =
        let (cs, s') = collectWhile isAlpha s
         in (c : cs) : tokenize s'
      | otherwise = error ("unexpected character " ++ show c)
    
    collectWhile :: (Char -> Bool) -> String -> (String, String)
    collectWhile p s = (takeWhile p s, dropWhile p s)
    
    isDigit, isAlpha :: Char -> Bool
    isDigit c = c `elem` ['0' .. '9']
    isAlpha c = c `elem` ['a' .. 'z'] ++ ['A' .. 'Z']
    
    parseFactor :: [String] -> (Ast, [String])
    parseFactor (t : ts)
      | isNumToken t = (Num (read t), ts)
      | isWordToken t = (Word t, ts)
      | otherwise = error ("unrecognized token " ++ show t)
    parseFactor [] = error "unexpected end of input"
    
    parseTerm :: [String] -> (Ast, [String])
    parseTerm ts
      =  let (f1, ts1) = parseFactor ts
         in  go f1 ts1
      where go acc ("*":ts2)
              = let (f2, ts3) = parseFactor ts2
                in go (Mult acc f2) ts3
            go acc rest = (acc, rest)
    
    parseExpr :: [String] -> (Ast, [String])
    parseExpr ts
      =  let (f1, ts1) = parseTerm ts
         in  go f1 ts1
      where go acc (op:ts2)
              | op == "+" || op == "-"
              = let (f2, ts3) = parseTerm ts2
                in go ((astOp op) acc f2) ts3
            go acc rest = (acc, rest)
            astOp "+" = Plus
            astOp "-" = Minus
    
    isNumToken, isWordToken :: String -> Bool
    isNumToken xs = takeWhile isDigit xs == xs
    isWordToken xs = takeWhile isAlpha xs == xs
    
    parse :: String -> Ast
    parse s =
      case parseExpr (tokenize s) of
        (e, []) -> e
        (_, t : _) -> error ("unexpected token " ++ show t)
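
Since parseTerm and parseExpr share the same loop, one optional refactoring is to factor that loop into a helper. The sketch below is not part of the original answer: the names chainLeft and Parser are made up for illustration, and it uses Data.Char predicates rather than the hand-written ones. It works on an already tokenized input and assumes tokens are non-empty.

```haskell
import Data.Char (isAlpha, isDigit)

data Ast
  = Word String
  | Num Int
  | Mult Ast Ast
  | Plus Ast Ast
  | Minus Ast Ast
  deriving (Eq, Show)

type Parser = [String] -> (Ast, [String])

-- Hypothetical helper: parse operands with `operand`, and fold every
-- operator listed in `ops` into a left-nested tree.
chainLeft :: Parser -> [(String, Ast -> Ast -> Ast)] -> Parser
chainLeft operand ops ts =
  let (first, ts0) = operand ts
   in go first ts0
  where
    go acc (op : ts1)
      | Just mk <- lookup op ops =
          let (next, ts2) = operand ts1
           in go (mk acc next) ts2
    go acc rest = (acc, rest) -- not one of our operators: stop here

parseFactor :: Parser
parseFactor (t : ts)
  | all isDigit t = (Num (read t), ts)
  | all isAlpha t = (Word t, ts)
  | otherwise = error ("unrecognized token " ++ show t)
parseFactor [] = error "unexpected end of input"

-- One line per precedence level, tightest-binding first.
parseTerm, parseExpr :: Parser
parseTerm = chainLeft parseFactor [("*", Mult)]
parseExpr = chainLeft parseTerm [("+", Plus), ("-", Minus)]
```

With these definitions, parseExpr ["a", "-", "2", "*", "b", "+", "c"] gives the same left-nested tree as the hand-written version, and adding a new precedence level is a single extra line.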
    
For right-associative operators, replace the following definitions:

    parseTerm :: [String] -> (Ast, [String])
    parseTerm ts
      =  let (fct, ts1) = parseFactor ts
         in  case ts1 of
               "*":ts2 -> let (trm, rest) = parseTerm ts2
                          in  (Mult fct trm, rest)
               _       -> (fct, ts1)
    
    parseExpr :: [String] -> (Ast, [String])
    parseExpr ts
      =  let (trm, ts1) = parseTerm ts
         in  case ts1 of
               op:ts2 | op == "+" || op == "-"
                       -> let (expr, rest) = parseExpr ts2
                          in  (astOp op trm expr, rest)
               _       -> (trm, ts1)
      where astOp "+" = Plus
            astOp "-" = Minus
    

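Here is a parser. There is plenty of material available on how to parse infix notation further.

As you said in your answer, the outer node is Plus and the inner one is Minus. Is there a way to change this? I've tried many things, but the last expression always seems to end up first.

I've updated my answer to show how to do this. Note, however, that if you want a parser that parses mathematical expressions in the "normal way", you actually do want the Plus on the outside and the Minus on the inside, as I explained above.

Thank you very much :-)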