ANTLR grammar token problem (ANTLRWorks)
I'm an amateur with ANTLR and I'm writing an interpreter for a simple processor. I've run into a small problem: the VALUE token is throwing an error. I'm a student, so I'm not asking you to do my homework for me... I've basically finished it (including all the class files for the interpreter), but this one problem is beating me, even though it's probably simple and staring me in the face.

ANTLRWorks keeps giving me this console error message:

"error(208): newExpr.g:193:1: The following token definitions can never be matched because prior tokens match the same input: VALUE"

Clearly something is wrong with the regular expression for VALUE, but I can't see what it is, either in that rule or anywhere else in the grammar. If you could point out what I'm missing I'd be very grateful, because Googling hasn't really helped me find the mistake in my own grammar.
grammar newExpr;
options
{
language=Java;
}
@header
{
import java.util.*;
}
@members
{
ArrayList myInitialise = new ArrayList();
ArrayList InstructionList = new ArrayList();
}
/*--------------------------------------------------------------------------------------------------------------------------------*
* PARSER RULES *
*--------------------------------------------------------------------------------------------------------------------------------*/
/*
* prog is where the interpretation begins and consists of one or more (+) 'stat' rules
*/
prog : stat+;
/*
* stat rules are the general parse rules of entire operations on the processor.
* They consist of smaller data operations rules (dataop) or memory operations (memop).
*/
stat : BASIC r1=REG c1=COMMA r2=REG c2=COMMA dataop NEWLINE
{
int reg1 = Integer.parseInt($r1.text.substring(1)); // strip the leading 'R' and parse the register number as an int
int reg2 = Integer.parseInt($r2.text.substring(1));
int IMDT = $dataop.value; // take the immediate integer
// LOAD operation
if($BASIC.text.equals("LD"))
InstructionList.add(new ld(reg1, reg2, IMDT));
// STORE operation
else if($BASIC.text.equals("ST"))
InstructionList.add(new st(reg1, reg2, IMDT));
// SUBTRACTION operation
else if($BASIC.text.equals("SUB"))
InstructionList.add(new sub(reg1, reg2, IMDT));
// ADDITION operation
else if($BASIC.text.equals("ADD"))
InstructionList.add(new add(reg1, reg2, IMDT));
// MULTIPLICATION operation
else if($BASIC.text.equals("MUL"))
InstructionList.add(new mul(reg1, reg2, IMDT));
// DIVISION operation
else if($BASIC.text.equals("DIV"))
InstructionList.add(new div(reg1, reg2, IMDT));
}
|
i1 = INDEX '=' memop NEWLINE
{
myInitialise.add(new memInit(Integer.parseInt($i1.text), $memop.value)); // this alternative matches memop, not dataop
}
|
BRANCH REG COMMA dataop NEWLINE
{
int R = Integer.parseInt($REG.text.substring(1));
int val = $dataop.value;
// BEZ and BNEZ are produced by the BRANCH lexer rule; JUMP only matches "JR"
// BRANCH EQUAL operation
if($BRANCH.text.equals("BEZ"))
InstructionList.add(new branchEqualZero(R, val));
// BRANCH NOT EQUAL operation
else if($BRANCH.text.equals("BNEZ"))
InstructionList.add(new branchNotEqualZero(R, val));
}
|
JUMP REG NEWLINE
{
int R = Integer.parseInt($REG.text.substring(1));
InstructionList.add(new jump(R));
}
|
HALT
{
InstructionList.add(new halt());
}
;
dataop returns [int value]
: INDEX
{
$value = Integer.parseInt($INDEX.text);
}
|
VALUE
{
$value = Integer.parseInt($VALUE.text.substring(1))*-1;
};
memop returns [int value]
: INDEX
{
$value = Integer.parseInt($INDEX.text);
}
|
VALUE
{
$value = Integer.parseInt($VALUE.text.substring(1))*-1;
}
|
MEMVAL
{
if($MEMVAL.text.startsWith("-"))
{
$value = Integer.parseInt($MEMVAL.text.substring(1))*-1;
}
else
$value = Integer.parseInt($MEMVAL.text);
};
/*--------------------------------------------------------------------------------------------------------------------------------*
* LEXER RULES *
*--------------------------------------------------------------------------------------------------------------------------------*/
/*
* RegExps for BASIC instructions (load, store, add, subtract, multiply, divide)
*/
BASIC : ('L' 'D') | ('S' 'T') | ('A' 'D' 'D') | ('S' 'U' 'B') | ('M' 'U' 'L') | ('D' 'I' 'V');
/*
* The comma is simply for syntactic purposes, to separate data and register references
*/
COMMA : ',';
/*
* Regular Expressions for the processor registers R0-R31
*/
REG : ('R') (('0'..'9') | ('0'..'2') ('0'..'9') | ('3') ('0'..'1') );
/*
* 'Index' is the set of regular expressions matching memory locations
*/
INDEX : ('0'..'9')
|
('0'..'9') ('0'..'9')
|
('0'..'9') ('0'..'9') ('0'..'9')
|
('0'..'9') ('0'..'9') ('0'..'9') ('0'..'9')
|
('0'..'5') ('0'..'9') ('0'..'9') ('0'..'9') ('0'..'9')
|
('6') ('0'..'4') ('0'..'9') ('0'..'9') ('0'..'9')
|
('6') ('5') ('0'..'4') ('0'..'9') ('0'..'9')
|
('6') ('5') ('5') ('0'..'2') ('0'..'9')
|
('6') ('5') ('5') ('3') ('0'..'5');
/*
* Reg Exps for memory initialisation instructions
*/
MEMVAL : ('0'..'9')+ | '-' ('0'..'9')+;
/*
* Simple integers for data values
*/
VALUE : '-' (('0'..'9') // PROBLEM IS HERE
|
('0'..'9') ('0'..'9')
|
('0'..'9') ('0'..'9') ('0'..'9')
|
('0'..'9') ('0'..'9') ('0'..'9') ('0'..'9')
|
('0'..'5') ('0'..'9') ('0'..'9') ('0'..'9') ('0'..'9')
|
('6') ('0'..'4') ('0'..'9') ('0'..'9') ('0'..'9')
|
('6') ('5') ('0'..'4') ('0'..'9') ('0'..'9')
|
('6') ('5') ('5') ('0'..'2') ('0'..'9')
|
('6') ('5') ('5') ('3') ('0'..'6'));
/*
* Regular Expressions for return/newline characters
*/
NEWLINE : '\r'? '\n' ;
/*
* This simply makes the interpreter tolerant to whitespace
*/
WHITESPACE : (' ' | '\t' | '\u000C')+ {skip();};
/*
* RegExp for Branch on Equal to Zero/Branch on Not Equal to Zero instructions
*/
BRANCH : ('B' 'E' 'Z') | ('B' 'N' 'E' 'Z');
/*
* RegExp for jump instruction
*/
JUMP : ('J' 'R');
/*
* The HALT instruction ends the program and executes all instructions
* in the Instruction List on the data/values that have been entered
*/
HALT : ('H' 'A' 'L' 'T');
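An ANTLR-generated lexer works like this: it tries to match as many characters as possible, and when two (or more) rules match the same number of characters, the rule defined first "wins". Your VALUE rule can therefore never win against the MEMVAL rule, because everything VALUE matches is also matched by MEMVAL's second alternative, '-' ('0'..'9')+, and MEMVAL is defined before VALUE. That is why you see the error message.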
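The tie-break described above can be illustrated with a small Java sketch. This is only a toy model of the rule as the answer states it, not ANTLR's actual implementation: the class name LexerTieBreakSketch and the method winner are made up for illustration, and the regular expressions are simplified stand-ins for the grammar's INDEX, MEMVAL and VALUE rules (the exact upper bounds are left out).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of the lexer's choice: longest match wins, ties go to the rule defined first. */
final class LexerTieBreakSketch {

    // Rules in definition order. The patterns are simplified assumptions,
    // not exact translations of the grammar's rules.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("INDEX", Pattern.compile("[0-9]{1,5}"));
        RULES.put("MEMVAL", Pattern.compile("-?[0-9]+"));
        RULES.put("VALUE", Pattern.compile("-[0-9]{1,5}"));
    }

    /** Returns the name of the rule that would produce the token at the start of the input. */
    static String winner(String input) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Strictly greater: an equally long match by a later rule never
            // replaces the match of an earlier rule.
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(winner("-42"));    // MEMVAL (VALUE matches the same 3 characters but is defined later)
        System.out.println(winner("42"));     // INDEX  (ties with MEMVAL, defined first)
        System.out.println(winner("123456")); // MEMVAL (longest match)
    }
}
```

In this model there is no input for which VALUE is returned, which is the situation the error message reports.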
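Even if one of your parser rules expects a VALUE token at some point, the lexer still produces tokens only according to the rules I mentioned: the lexer does not take any information from the parser into account.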
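Simply remove the VALUE rule and use MEMVAL in its place (or rename MEMVAL to INT). Then, in your parser rules, match MEMVAL (or INT) and check that the value lies within the required numeric range.

Ah, right... I didn't know the lexer handled it that way, but it makes sense. Thanks, you've helped me out!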
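The range check suggested in the answer could be done in plain Java and called from a parser action. The following is a minimal sketch under assumptions: the class TokenRangeSketch and the method parseInRange are hypothetical names, and the bounds are read off the question's INDEX and VALUE rules (0 to 65535, and down to -65536).

```java
/** Range check performed in a parser action instead of in the lexer rule. */
final class TokenRangeSketch {

    // Assumed bounds, taken from the INDEX and VALUE rules in the question.
    static final int MAX_INDEX = 65535;
    static final int MIN_VALUE = -65536;

    /** Parses the text of an integer token and rejects values outside [min, max]. */
    static int parseInRange(String tokenText, int min, int max) {
        // Integer.parseInt accepts an optional leading '-', so no manual
        // substring(1) and sign flip is needed.
        int n = Integer.parseInt(tokenText);
        if (n < min || n > max) {
            throw new IllegalArgumentException(
                tokenText + " is outside the range " + min + ".." + max);
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(parseInRange("-42", MIN_VALUE, MAX_INDEX));   // -42
        System.out.println(parseInRange("65535", MIN_VALUE, MAX_INDEX)); // 65535
    }
}
```

With a single MEMVAL (or INT) token, an action in dataop or memop might then read something like $value = TokenRangeSketch.parseInRange($MEMVAL.text, TokenRangeSketch.MIN_VALUE, TokenRangeSketch.MAX_INDEX); with each rule passing the bounds it needs.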