Compiler construction: How do I use the parse table and the stack push map in a table-driven parser?

I'm writing a compiler that uses top-down, table-driven parsing. I have converted my grammar to LL(1), as shown below:

<START> -> <prog> 

<aParams> -> <expr> <rept-aParams1> 
<aParams> -> EPSILON 

<aParamsTail> -> ',' <expr> 

<addOp> -> '+' 
<addOp> -> '-' 
<addOp> -> 'or' 

<arithExpr> -> <term> <rightrec-arithExpr> 

<arraySize> -> '[' 'intNum' ']' 
<arraySize> -> '[' ']' 

<assignOp> -> '=' 

<assignStat> -> <variable> <assignOp> <expr> 

<classDecl> -> 'class' 'id' <opt-classDecl2> '{' <rept-classDecl4> '}' ';' 

<expr> -> <arithExpr> 
<expr> -> <relExpr> 

<fParams> -> <type> 'id' <rept-fParams2> <rept-fParams3> 
<fParams> -> EPSILON 

<fParamsTail> -> ',' <type> 'id' <rept-fParamsTail3> 

<factor> -> <variable> 
<factor> -> <functionCall> 
<factor> -> 'intNum' 
<factor> -> 'floatNum' 
<factor> -> '(' <arithExpr> ')' 
<factor> -> 'not' <factor> 
<factor> -> <sign> <factor> 

<funcBody> -> <opt-funcBody0> 'do' <rept-funcBody2> 'end' 

<funcDecl> -> 'id' '(' <fParams> ')' ':' <type> ';' 
<funcDecl> -> 'id' '(' <fParams> ')' ':' 'void' ';' 

<funcDef> -> <funcHead> <funcBody> ';' 

<funcHead> -> <opt-funcHead0> 'id' '(' <fParams> ')' ':' <type> 
<funcHead> -> <opt-funcHead0> 'id' '(' <fParams> ')' ':' 'void' 

<functionCall> -> <rept-functionCall0> 'id' '(' <aParams> ')' 

<idnest> -> 'id' <rept-idnest1> '.' 
<idnest> -> 'id' '(' <aParams> ')' '.' 

<indice> -> '[' <arithExpr> ']' 

<memberDecl> -> <funcDecl> 
<memberDecl> -> <varDecl> 

<multOp> -> '*' 
<multOp> -> '/' 
<multOp> -> 'and' 

<opt-classDecl2> -> 'inherits' 'id' <rept-opt-classDecl22> 
<opt-classDecl2> -> EPSILON 

<opt-funcBody0> -> 'local' <rept-opt-funcBody01> 
<opt-funcBody0> -> EPSILON 

<opt-funcHead0> -> 'id' 'sr' 
<opt-funcHead0> -> EPSILON 

<prog> -> <rept-prog0> <rept-prog1> 'main' <funcBody> 

<relExpr> -> <arithExpr> <relOp> <arithExpr> 

<relOp> -> 'eq' 
<relOp> -> 'neq' 
<relOp> -> 'lt' 
<relOp> -> 'gt' 
<relOp> -> 'leq' 
<relOp> -> 'geq' 

<rept-aParams1> -> <aParamsTail> <rept-aParams1> 
<rept-aParams1> -> EPSILON 

<rept-classDecl4> -> <visibility> <memberDecl> <rept-classDecl4> 
<rept-classDecl4> -> EPSILON 

<rept-fParams2> -> <arraySize> <rept-fParams2> 
<rept-fParams2> -> EPSILON 

<rept-fParams3> -> <fParamsTail> <rept-fParams3> 
<rept-fParams3> -> EPSILON 

<rept-fParamsTail3> -> <arraySize> <rept-fParamsTail3> 
<rept-fParamsTail3> -> EPSILON 

<rept-funcBody2> -> <statement> <rept-funcBody2> 
<rept-funcBody2> -> EPSILON 

<rept-functionCall0> -> <idnest> <rept-functionCall0> 
<rept-functionCall0> -> EPSILON 

<rept-idnest1> -> <indice> <rept-idnest1> 
<rept-idnest1> -> EPSILON 

<rept-opt-classDecl22> -> ',' 'id' <rept-opt-classDecl22> 
<rept-opt-classDecl22> -> EPSILON 

<rept-opt-funcBody01> -> <varDecl> <rept-opt-funcBody01> 
<rept-opt-funcBody01> -> EPSILON 

<rept-prog0> -> <classDecl> <rept-prog0> 
<rept-prog0> -> EPSILON 

<rept-prog1> -> <funcDef> <rept-prog1> 
<rept-prog1> -> EPSILON 

<rept-statBlock1> -> <statement> <rept-statBlock1> 
<rept-statBlock1> -> EPSILON 

<rept-varDecl2> -> <arraySize> <rept-varDecl2> 
<rept-varDecl2> -> EPSILON 

<rept-variable0> -> <idnest> <rept-variable0> 
<rept-variable0> -> EPSILON 

<rept-variable2> -> <indice> <rept-variable2> 
<rept-variable2> -> EPSILON 

<rightrec-arithExpr> -> EPSILON 
<rightrec-arithExpr> -> <addOp> <term> <rightrec-arithExpr> 

<rightrec-term> -> EPSILON 
<rightrec-term> -> <multOp> <factor> <rightrec-term> 

<sign> -> '+' 
<sign> -> '-' 

<statBlock> -> 'do' <rept-statBlock1> 'end' 
<statBlock> -> <statement> 
<statBlock> -> EPSILON 

<statement> -> <assignStat> ';' 
<statement> -> 'if' '(' <relExpr> ')' 'then' <statBlock> 'else' <statBlock> ';' 
<statement> -> 'while' '(' <relExpr> ')' <statBlock> ';' 
<statement> -> 'read' '(' <variable> ')' ';' 
<statement> -> 'write' '(' <expr> ')' ';' 
<statement> -> 'return' '(' <expr> ')' ';' 
<statement> -> <functionCall> ';' 

<term> -> <factor> <rightrec-term> 

<type> -> 'integer' 
<type> -> 'float' 
<type> -> 'id' 

<varDecl> -> <type> 'id' <rept-varDecl2> ';' 

<variable> -> <rept-variable0> 'id' <rept-variable2> 

<visibility> -> 'public' 
<visibility> -> 'private'

The tool describes the construction of the parse table as follows:

On the LL(1) Parsing Table's Meaning and Construction

    The top row corresponds to the columns for all the potential terminal symbols, augmented with $ to represent the end of the parse.
    The leftmost column and second row are all zero-filled, to accommodate the way Fischer and LeBlanc wrote their parser's handling of abs().
    The remaining rows correspond to production rules in the original grammar that you typed in.
    Each entry in that row maps the left-hand-side (LHS) of a production rule onto a line-number. That number is the line in which the LHS had that specific column symbol in its predict set.

    If a terminal is absent from a non-terminal's predict set, an error code is placed in the table. If that terminal is in follow(that non-terminal), the error is a POP error. Else, it's a SCAN error.

    POP error code = # of predict table productions + 1

    SCAN error code = # of predict table productions + 2

In practice, you'd want to tear the top, label row off of the table and stick it in a comment, so that you can make sense of your table. The remaining table can be used as is.

Push map:

{"1":[26],"2":[29,10],"4":[10,-1],"5":[-2],"6":[-3],"7":[-4],"8":[45,50],"9":[-7,-6,-5],"10":[-7,-5],"11":[-8],"12":[10,7,53],"13":[-13,-12,30,-11,23,-10,-9],"14":[5],"15":[27],"16":[32,31,-10,51],"18":[33,-10,51,-1],"19":[53],"20":[18],"21":[-6],"22":[-14],"23":[-16,5,-15],"24":[13,-17],"25":[13,47],"26":[-19,34,-18,24],"27":[-13,51,-20,-16,11,-15,-10],"28":[-13,-21,-20,-16,11,-15,-10],"29":[-13,14,17],"30":[51,-20,-16,11,-15,-10,25],"31":[-21,-20,-16,11,-15,-10,25],"32":[-16,2,-15,-10,35],"33":[-22,36,-10],"34":[-22,-16,2,-15,-10],"35":[-7,5,-5],"36":[15],"37":[52],"38":[-23],"39":[-24],"40":[-25],"41":[37,-10,-26],"43":[38,-27],"45":[-28,-10],"47":[14,-29,40,39],"48":[5,28,5],"49":[-30],"50":[-31],"51":[-32],"52":[-33],"53":[-34],"54":[-35],"55":[29,3],"57":[30,21,54],"59":[31,6],"61":[32,12],"63":[33,6],"65":[34,49],"67":[35,19],"69":[36,20],"71":[37,-10,-1],"73":[38,52],"75":[39,9],"77":[40,16],"79":[41,49],"81":[42,6],"83":[43,19],"85":[44,20],"88":[45,50,4],"90":[46,13,22],"91":[-2],"92":[-3],"93":[-19,41,-18],"94":[49],"96":[-13,8],"97":[-13,48,-38,48,-37,-16,27,-15,-36],"98":[-13,48,-16,27,-15,-39],"99":[-13,-16,53,-15,-40],"100":[-13,-16,10,-15,-41],"101":[-13,-16,10,-15,-42],"102":[-13,18],"103":[46,13],"104":[-43],"105":[-44],"106":[-10],"107":[-13,42,-10,51],"108":[44,-10,43],"109":[-45],"110":[-46]} 

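However, given all this, I'm not entirely sure how to proceed. I'm curious what the general algorithm is, given these two objects, for parsing a token stream and determining whether it is syntactically correct.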

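This won't help you much because, as explained below, the parse table the LL(1) tool generated for this grammar is inaccurate.

For what it's worth, though, I reverse-engineered the tables. You may get more insight into the procedure by reading the linked text. (Note: the link is not an endorsement of either the book or the vendor; I simply copied it from the tool.)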

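The terminal symbols appear in order in the top row of the parse table (the row that the instructions say should be removed before use). So terminal symbol 1 is ',', symbol 2 is '+', and so on up to symbol 46, which is $, the conventional end-of-input marker. (This is different from '$', which is a literal dollar sign.)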

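The non-terminal symbols are not shown explicitly (so their names cannot be recovered from the table), but they are also numbered in order. There are 54 of them, and each row of the parse table (after the first two) corresponds to one non-terminal.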

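The predict-set section of the tool's output lists 110 productions (with their corresponding indices). Each production corresponds to an entry in the "push map", which (for reasons unknown to me) uses the string form of the production number as its key.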

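The corresponding value in the push map is a list of indices: negative indices refer to terminals and positive indices to non-terminals. Index 0 is not used, which is why row 0 of the parse table is unused. The right-hand side of a production can be reconstructed from these indices, but their real purpose is to indicate what to push onto the parse stack at each step of the parse.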
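
As a rough illustration of that encoding, the Python sketch below decodes two entries of the push map shown in the question. The name tables are hypothetical and deliberately partial: terminal 1 being ',' comes from the description above, while the other indices are inferred by lining up the grammar with the push map. Judging by those entries, each list appears to hold the right-hand side in reverse (that is, in push order), which is an observation about this data rather than something the tool documents.

```python
# Hypothetical, partial name tables. Only the indices needed below are listed;
# apart from terminal 1 they are inferred from the grammar, not tool output.
TERMINALS = {1: "','", 5: "'['", 6: "'intNum'", 7: "']'"}
NONTERMINALS = {10: "<expr>"}

# Two entries copied from the push map in the question.
PUSH_MAP = {"4": [10, -1], "9": [-7, -6, -5]}


def describe(index):
    """Name a push-map index: negative means terminal, positive non-terminal."""
    if index < 0:
        return TERMINALS[-index]
    return NONTERMINALS[index]


def right_hand_side(production):
    """Rebuild a production's right-hand side from its push-map entry."""
    pushed = PUSH_MAP[str(production)]  # keys are stringified production numbers
    # The list looks like it is stored in push order, so reverse it to read
    # the right-hand side from left to right.
    return [describe(i) for i in reversed(pushed)]


print(" ".join(right_hand_side(4)))
print(" ".join(right_hand_side(9)))
```

Under these assumptions, production 4 decodes to ',' <expr> and production 9 to '[' 'intNum' ']', which matches the grammar's <aParamsTail> and first <arraySize> productions.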

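The stack contains the list of current predictions; the top element of the stack is the immediate prediction at this point in the parse.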

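So the algorithm is as follows:

  • Initialise the parser stack to [1,-46], which indicates that the current prediction consists of the start non-terminal <START> (index 1) followed by the end-of-input marker $.

  • Repeat the following steps until the parse terminates with an error or with acceptance: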

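  • If the top of the stack is negative:
    • If the lookahead token has the corresponding token number (that is, the absolute value of the top of the stack), pop the stack and accept the lookahead token. If that token is the end-of-input marker, the parse is complete and the input is valid. Otherwise, the new lookahead token is the next input token.
    • If the lookahead token does not correspond to the top of the stack, the input is incorrect. Report an error and terminate the parse.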
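  • If the top of the stack is positive:
    • Retrieve the value rhs from parseTable[stack.top()][lookahead]. If rhs is greater than the number of productions (in this case, if it is 111 or 112), the input is incorrect. Report an error and terminate the parse. (The value tells you whether it is a scan error or a pop error, but that probably won't make much difference to you. It could be used to improve error reporting.)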
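    • Pop the parse stack and push the elements of pushMap[rhs] onto the stack in list order. The lists in the push map shown above store each right-hand side reversed, so pushing them in list order leaves the leftmost symbol of the production on top. (For example, if rhs is 4, you would use the list in pushMap["4"], which is [10,-1] and encodes <aParamsTail> -> ',' <expr>. So you would first push 10 onto the parser stack and then -1, leaving ',' on top to be matched first.)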
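    • In the push map generated by this tool, there seem to be no entries for productions whose right-hand side is ε. So if pushMap[rhs] does not exist, just pop the parse stack; there is nothing to push.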
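  • This algorithm does not include any procedure for building a syntax tree for a successful parse. But if you want to do more than just decide whether the input is a valid program, you will certainly need to build some kind of syntax tree.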
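
The steps above might be sketched in Python roughly as follows. This is a minimal recogniser under stated assumptions, not the tool's own code: the function name `parse`, its parameters, and the three-terminal toy grammar used to exercise it are all hypothetical, and tokens are assumed to arrive already converted to terminal numbers. Because the push-map lists in the question appear to store each right-hand side reversed (production 4, <aParamsTail> -> ',' <expr>, is stored as [10,-1]), the sketch pushes each list in list order so that the leftmost symbol ends up on top.

```python
def parse(tokens, parse_table, push_map, num_productions, start=1, end_marker=46):
    """Return True if `tokens` is accepted.

    `tokens` is a list of terminal numbers that must end with `end_marker`.
    """
    if not tokens or tokens[-1] != end_marker:
        return False
    stack = [-end_marker, start]       # the top of the stack is the list's end
    pos = 0
    while stack and pos < len(tokens):
        top = stack[-1]
        lookahead = tokens[pos]
        if top < 0:                    # a terminal is predicted
            if -top != lookahead:
                return False           # mismatch: syntax error
            stack.pop()
            if lookahead == end_marker:
                return True            # matched the end of input: accept
            pos += 1                   # advance to the next input token
        else:                          # a non-terminal is predicted
            rhs = parse_table[top][lookahead]
            if rhs > num_productions:  # POP or SCAN error code
                return False
            stack.pop()
            # ε-productions have no push-map entry, so nothing is pushed.
            stack.extend(push_map.get(str(rhs), []))
    return False


# Hypothetical toy grammar: 1: S -> 'a' S    2: S -> EPSILON
# Terminals: 1 = 'a', 2 = 'b', 3 = $.  Non-terminals: 1 = S.
# Row 0 and column 0 are unused, mirroring the layout described above.
TOY_TABLE = [
    [0, 0, 0, 0],
    [0, 1, 4, 2],                      # 4 = SCAN error (2 productions + 2)
]
TOY_PUSH_MAP = {"1": [1, -1]}          # 'a' S stored reversed, as in the question

print(parse([1, 1, 3], TOY_TABLE, TOY_PUSH_MAP, 2, end_marker=3))
print(parse([1, 2, 3], TOY_TABLE, TOY_PUSH_MAP, 2, end_marker=3))
```

With the toy tables, the first call accepts the input a a $ and the second rejects a b $ when the table yields an error code. To report better errors, the two error codes could be distinguished at the point where the sketch simply returns False.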


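    Note: the grammar is not LL(1), so the parse table is wrong. I don't know how much credence you should give to the tool you are using.

    Your grammar is not LL(1), but the tool gives no indication of that fact.

    A simple example is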

    <arraySize> -> '[' 'intNum' ']' 
    <arraySize> -> '[' ']' 
    
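    Both productions begin with '[', so a single lookahead token cannot choose between them. Similarly, since relExpr can start with an arithExpr, the two productions for expr overlap; by examining a single lookahead token (or any constant number of lookahead tokens), it is impossible to predict whether the expr is a comparison. You cannot tell until you have parsed the initial arithmetic expression, which may be arbitrarily long.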

    If you insist on LL(1), you would need to do something like the following:

    <expr> -> <arithExpr> <optional-relExpr-tail>
    <optional-relExpr-tail> -> EPSILON
    <optional-relExpr-tail> -> <relOp> <arithExpr>
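
One way to see the difference is to compute FIRST sets for a small fragment and look for alternatives of the same non-terminal whose FIRST sets overlap. The Python sketch below does that under loud assumptions: <arithExpr> and <relOp> are cut down to two hypothetical alternatives each, all function names are made up for illustration, and only FIRST/FIRST conflicts are checked (a full LL(1) test would also compare FIRST with FOLLOW for the EPSILON alternatives).

```python
EPSILON = "EPSILON"


def first_of_sequence(symbols, first):
    """FIRST of a symbol sequence; any symbol not in `first` is a terminal."""
    result = set()
    for sym in symbols:
        if sym == EPSILON:
            continue
        if sym not in first:           # terminal: it starts the sequence
            result.add(sym)
            return result
        result |= first[sym] - {EPSILON}
        if EPSILON not in first[sym]:
            return result
    result.add(EPSILON)                # every symbol could derive nothing
    return result


def first_sets(grammar):
    """Compute FIRST for every non-terminal by fixed-point iteration."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[nt])
                first[nt] |= first_of_sequence(rhs, first)
                changed |= len(first[nt]) != before
    return first


def first_first_conflicts(grammar):
    """Map each non-terminal to the terminals shared by two alternatives."""
    first = first_sets(grammar)
    conflicts = {}
    for nt, alternatives in grammar.items():
        seen = set()
        for rhs in alternatives:
            starts = first_of_sequence(rhs, first) - {EPSILON}
            if seen & starts:
                conflicts.setdefault(nt, set()).update(seen & starts)
            seen |= starts
    return conflicts


# Simplified fragments: arithExpr and relOp are stand-ins, not the real rules.
COMMON = {
    "arithExpr": [["id"], ["(", "arithExpr", ")"]],
    "relOp": [["eq"], ["lt"]],
}
original = {
    "expr": [["arithExpr"], ["relExpr"]],
    "relExpr": [["arithExpr", "relOp", "arithExpr"]],
    **COMMON,
}
refactored = {
    "expr": [["arithExpr", "optional-relExpr-tail"]],
    "optional-relExpr-tail": [[EPSILON], ["relOp", "arithExpr"]],
    **COMMON,
}

print(first_first_conflicts(original))
print(first_first_conflicts(refactored))
```

For the original fragment the check reports that both alternatives of expr can start with the same terminals, while the left-factored fragment reports no such overlap.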
    
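    In addition, expr can derive factor, which can be a variable or a functionCall, both of which can start with idnest (eventually leading to the terminal id). But that's not all, because the sequence 'id' '(' <aParams> ')' can occur in both idnest and functionCall.

    Maybe the tool isn't intended to tell you that your grammar isn't LL(1), but in my opinion it should.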