Best ADT representation for an AST
I have the following grammar for expressions that I'm trying to represent as a Haskell ADT:
Expr = SimpleExpr [OPrelation SimpleExpr]
SimpleExpr = [OPunary] Term {OPadd Term}
Term = Factor {OPmult Factor}
where:
{} means 0 or more
[] means 0 or 1
OPmult, OPadd, OPrelation and OPunary are classes of operators
Note that the precedence is correct in this grammar.
Here is what I tried:
data Expr = Expr SimpleExpr (Maybe OPrelation) (Maybe SimpleExpr)
data SimpleExpr = SimpleExpr (Maybe OPunary) Term [OPadd] [Term]
data Term = Term Factor [OPmult] [Factor]
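In hindsight, I think this is poor, especially the [OPadd] [Term] and [OPmult] [Factor] parts. For example, in the parse tree for 1+2+3, [+, +] ends up in one branch and [2, 3] in another, so the operators are decoupled from their operands.
What representation would work well for the next stages of compilation?
- Factoring {} and [] out into more data types seems like overkill
- Using lists doesn't seem quite right, since the result is no longer a tree (just a node that holds a list)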
- Maybe for {}: is that a good idea?
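Finally, I assume that after parsing I'll have to walk the parse tree and reduce it to an AST? Or should I rework the whole grammar to be less complex? Or is it already abstract enough?
The AST doesn't need to stay that close to the grammar. The grammar is structured into multiple levels to encode precedence, and it uses repetition to avoid left recursion while still handling left-associative operators correctly. The AST doesn't need to worry about any of that. Instead, I would define the AST like this: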
data Expr = BinaryOperation BinaryOperator Expr Expr
          | UnaryOperation UnaryOperator Expr
          | Literal LiteralValue
          | Variable Id
data BinaryOperator = Add | Sub | Mul | Div
data UnaryOperator = Not | Negate
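To make the point about repetition and left associativity concrete, here is a minimal sketch of how the result of parsing `first {op operand}` could be folded into this kind of AST. It assumes `Int` as a stand-in for `LiteralValue`, omits the unary and variable cases, and `foldLeftAssoc` is a hypothetical helper name, not something from the answer above.

```haskell
-- Trimmed-down version of the AST above; Int stands in for LiteralValue
-- (an assumption, the original leaves that type unspecified).
data BinaryOperator = Add | Sub | Mul | Div
  deriving (Show, Eq)

data Expr
  = BinaryOperation BinaryOperator Expr Expr
  | Literal Int
  deriving (Show, Eq)

-- Hypothetical helper: takes the first operand and the list of
-- (operator, operand) pairs produced by a `{op operand}` repetition,
-- and nests them to the left, which is what left associativity means.
foldLeftAssoc :: Expr -> [(BinaryOperator, Expr)] -> Expr
foldLeftAssoc = foldl step
  where
    step acc (op, rhs) = BinaryOperation op acc rhs

-- 1 - 2 - 3 becomes (1 - 2) - 3
example :: Expr
example = foldLeftAssoc (Literal 1) [(Sub, Literal 2), (Sub, Literal 3)]
```

Pairing each operator with its right-hand operand also avoids the decoupled `[OPadd] [Term]` lists from the question, and the pairs disappear entirely once the fold has run.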
Here's a supplementary answer that might help you. I don't want to spoil your fun, so here's a very simple example grammar:
-- Expr = Term ['+' Term]
-- Term = Factor ['*' Factor]
-- Factor = number | '(' Expr ')'
-- number = one or more digits
Using a CST
As one approach, we can represent this grammar as a concrete syntax tree (CST):
data Expr = TermE Term | PlusE Term Term deriving (Show)
data Term = FactorT Factor | TimesT Factor Factor deriving (Show)
data Factor = NumberF Int | ParenF Expr deriving (Show)
A Parsec-based parser for converting the concrete syntax into a CST might look like:
expr :: Parser Expr
expr = do
  t1 <- term
  (PlusE t1 <$ symbol "+" <*> term)
    <|> pure (TermE t1)
term :: Parser Term
term = do
  f1 <- factor
  (TimesT f1 <$ symbol "*" <*> factor)
    <|> pure (FactorT f1)
factor :: Parser Factor
factor = NumberF . read <$> lexeme (many1 (satisfy isDigit))
  <|> ParenF <$> between (symbol "(") (symbol ")") expr
After that, we can run:
> parseExpr "1+1*(3+4)"
PlusE (FactorT (NumberF 1)) (TimesT (NumberF 1) (ParenF (PlusE
(FactorT (NumberF 3)) (FactorT (NumberF 4)))))
>
To convert that into the following AST:
data AExpr -- Abstract Expression
  = NumberA Int
  | PlusA AExpr AExpr
  | TimesA AExpr AExpr
we can write:
aexpr :: Expr -> AExpr
aexpr (TermE t) = aterm t
aexpr (PlusE t1 t2) = PlusA (aterm t1) (aterm t2)
aterm :: Term -> AExpr
aterm (FactorT f) = afactor f
aterm (TimesT f1 f2) = TimesA (afactor f1) (afactor f2)
afactor :: Factor -> AExpr
afactor (NumberF n) = NumberA n
afactor (ParenF e) = aexpr e
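As a worked example of this conversion, the sketch below applies `aexpr` to the CST for "1+1*(3+4)", written out by hand so that no parser is needed. The `deriving (Show, Eq)` on `AExpr` and the names `cst` and `ast` are additions made here for illustration only.

```haskell
-- CST types, matching the grammar one production at a time.
data Expr = TermE Term | PlusE Term Term deriving (Show)
data Term = FactorT Factor | TimesT Factor Factor deriving (Show)
data Factor = NumberF Int | ParenF Expr deriving (Show)

-- AST type; Show and Eq are derived here only so the result can be inspected.
data AExpr
  = NumberA Int
  | PlusA AExpr AExpr
  | TimesA AExpr AExpr
  deriving (Show, Eq)

aexpr :: Expr -> AExpr
aexpr (TermE t) = aterm t
aexpr (PlusE t1 t2) = PlusA (aterm t1) (aterm t2)

aterm :: Term -> AExpr
aterm (FactorT f) = afactor f
aterm (TimesT f1 f2) = TimesA (afactor f1) (afactor f2)

afactor :: Factor -> AExpr
afactor (NumberF n) = NumberA n
afactor (ParenF e) = aexpr e   -- parentheses vanish; nesting carries the grouping

-- The CST for "1+1*(3+4)", built by hand.
cst :: Expr
cst =
  PlusE
    (FactorT (NumberF 1))
    (TimesT (NumberF 1)
            (ParenF (PlusE (FactorT (NumberF 3)) (FactorT (NumberF 4)))))

-- ast == PlusA (NumberA 1) (TimesA (NumberA 1) (PlusA (NumberA 3) (NumberA 4)))
ast :: AExpr
ast = aexpr cst
```

The wrapper constructors `TermE`, `FactorT` and `ParenF` exist only to mirror the grammar, so they leave no trace in the AST.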
To interpret the AST, we can use:
interp :: AExpr -> Int
interp (NumberA n) = n
interp (PlusA e1 e2) = interp e1 + interp e2
interp (TimesA e1 e2) = interp e1 * interp e2
and then write:
calc :: String -> Int
calc = interp . aexpr . parseExpr
After which we have a crude little calculator:
> calc "1 + 2 * (6 + 3)"
19
>
Skipping the CST
As an alternative, we can replace the parser with one that parses directly into an AST of type AExpr:
expr :: Parser AExpr
expr = do
  t1 <- term
  (PlusA t1 <$ symbol "+" <*> term)
    <|> pure t1
term :: Parser AExpr
term = do
  f1 <- factor
  (TimesA f1 <$ symbol "*" <*> factor)
    <|> pure f1
factor :: Parser AExpr
factor = NumberA . read <$> lexeme (many1 (satisfy isDigit))
  <|> between (symbol "(") (symbol ")") expr
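The simplified grammar allows at most one '+' or '*' per level. If it were extended to repetition, as in `Term {'+' Term}`, a chaining combinator could build the left-nested AST directly. The sketch below is one possible way to do that, not the answer's own code: it swaps Parsec for `Text.ParserCombinators.ReadP` from base so that it has no external dependencies (Parsec provides a combinator with the same name, `chainl1`), and `parseExpr` returning `Maybe` is an assumption made here.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP
  (ReadP, between, chainl1, eof, munch1, readP_to_S, skipSpaces, string, (<++))

data AExpr
  = NumberA Int
  | PlusA AExpr AExpr
  | TimesA AExpr AExpr
  deriving (Show, Eq)

-- Run a parser, then skip any trailing whitespace.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

symbol :: String -> ReadP String
symbol = lexeme . string

-- Expr = Term {'+' Term}, nested to the left by chainl1.
expr :: ReadP AExpr
expr = term `chainl1` (PlusA <$ symbol "+")

-- Term = Factor {'*' Factor}
term :: ReadP AExpr
term = factor `chainl1` (TimesA <$ symbol "*")

-- Factor = number | '(' Expr ')'
factor :: ReadP AExpr
factor =
  (NumberA . read <$> lexeme (munch1 isDigit))
    <++ between (symbol "(") (symbol ")") expr

-- Accept only parses that consume the whole input.
parseExpr :: String -> Maybe AExpr
parseExpr s = case readP_to_S (skipSpaces *> expr <* eof) s of
  [(e, _)] -> Just e
  _        -> Nothing

interp :: AExpr -> Int
interp (NumberA n) = n
interp (PlusA e1 e2) = interp e1 + interp e2
interp (TimesA e1 e2) = interp e1 * interp e2
```

With this version, `parseExpr "1 + 2 + 3"` gives `Just (PlusA (PlusA (NumberA 1) (NumberA 2)) (NumberA 3))`, where the left-nesting comes from `chainl1` rather than from a separate pass over a list of operands.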
Reference programs
Here is the full program using an intermediate CST:
-- Calc1.hs, using a CST
{-# OPTIONS_GHC -Wall #-}
module Calc1 where
import Data.Char
import Text.Parsec
import Text.Parsec.String
data Expr = TermE Term | PlusE Term Term deriving (Show)
data Term = FactorT Factor | TimesT Factor Factor deriving (Show)
data Factor = NumberF Int | ParenF Expr deriving (Show)
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces
symbol :: String -> Parser String
symbol = lexeme . string
expr :: Parser Expr
expr = do
  t1 <- term
  (PlusE t1 <$ symbol "+" <*> term)
    <|> pure (TermE t1)
term :: Parser Term
term = do
  f1 <- factor
  (TimesT f1 <$ symbol "*" <*> factor)
    <|> pure (FactorT f1)
factor :: Parser Factor
factor = NumberF . read <$> lexeme (many1 (satisfy isDigit))
  <|> ParenF <$> between (symbol "(") (symbol ")") expr
parseExpr :: String -> Expr
parseExpr pgm = case parse (spaces *> expr) "(string)" pgm of
  Right e -> e
  Left err -> error $ show err
data AExpr -- Abstract Expression
  = NumberA Int
  | PlusA AExpr AExpr
  | TimesA AExpr AExpr
aexpr :: Expr -> AExpr
aexpr (TermE t) = aterm t
aexpr (PlusE t1 t2) = PlusA (aterm t1) (aterm t2)
aterm :: Term -> AExpr
aterm (FactorT f) = afactor f
aterm (TimesT f1 f2) = TimesA (afactor f1) (afactor f2)
afactor :: Factor -> AExpr
afactor (NumberF n) = NumberA n
afactor (ParenF e) = aexpr e
interp :: AExpr -> Int
interp (NumberA n) = n
interp (PlusA e1 e2) = interp e1 + interp e2
interp (TimesA e1 e2) = interp e1 * interp e2
calc :: String -> Int
calc = interp . aexpr . parseExpr
Here is the full program for the more traditional solution that skips an explicit CST representation:
-- Calc2.hs, with direct parsing to AST
{-# OPTIONS_GHC -Wall #-}
module Calc where
import Data.Char
import Text.Parsec
import Text.Parsec.String
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces
symbol :: String -> Parser String
symbol = lexeme . string
expr :: Parser AExpr
expr = do
  t1 <- term
  (PlusA t1 <$ symbol "+" <*> term)
    <|> pure t1
term :: Parser AExpr
term = do
  f1 <- factor
  (TimesA f1 <$ symbol "*" <*> factor)
    <|> pure f1
factor :: Parser AExpr
factor = NumberA . read <$> lexeme (many1 (satisfy isDigit))
  <|> between (symbol "(") (symbol ")") expr
parseExpr :: String -> AExpr
parseExpr pgm = case parse (spaces *> expr) "(string)" pgm of
  Right e -> e
  Left err -> error $ show err
data AExpr -- Abstract Expression
  = NumberA Int
  | PlusA AExpr AExpr
  | TimesA AExpr AExpr
interp :: AExpr -> Int
interp (NumberA n) = n
interp (PlusA e1 e2) = interp e1 + interp e2
interp (TimesA e1 e2) = interp e1 * interp e2
calc :: String -> Int
calc = interp . parseExpr