ANTLR4 gives token recognition errors with a lexer island grammar

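I need ANTLR4 to parse some simple HTML files. I have split my grammar into a parser grammar and a lexer grammar so that I can use an island grammar to handle what is inside tags (between < and >), as described in "The Definitive ANTLR 4 Reference". ANTLR4 keeps telling me "token recognition error".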

Parser grammar:

grammar Rule;

options {
    tokenVocab = HTMLLexer;
    language = Java;
}

/* Parser Rules */
doc : type? html ;
type : '<!DOCTYPE HTML>' ;
html : shtml head body ehtml ;

head : shead meta* ehead ;
meta : smeta ;

body : sbody ebody ;

shtml : '<' 'html' attr* '>' ;
ehtml : '<' '/html' '>' ;
shead : '<' 'head' attr* '>' ;
ehead : '<' '/head' '>' ;
smeta : '<' 'meta' attr+ '>' ;

sbody : '<' 'body' attr* '>' ;
ebody : '<' '/body' '>' ;

attr : NAME '=' VALUE ;

Lexer grammar:

lexer grammar HTMLLexer;

COMMENT : '<!--' .*? '-->' -> skip ;
CDATA   : '<![CDATA[' .*? ']]>' ;

OPEN      : '<'  -> pushMode(INSIDE) ;
SPEC_OPEN : '<!' -> pushMode(INSIDE) ;

TEXT : (ENTITY | ~[<&])+ ;
fragment ENTITY
    : '&' [a-zA-Z]+ ';'
    | '&#' [0-9]+ ';'
    | '&#x' [0-9A-Za-z]+ ';' ;

mode INSIDE;
CLOSE       : '>'  -> popMode ;
SLASH_CLOSE : '/>' -> popMode ;

StHTML : 'html' ;
EnHTML : '/html' ;

StHead : 'head' ;
EnHead : '/head' ;
StMeta : 'meta' ;

StBody : 'body' ;
EnBody : '/body' ;

NAME : 'class'
    | 'content'
    | 'http-equiv'
    | 'id'
    | 'lang'
    | 'name'
    | 'style'
    | 'type'
    ;

EQUALS : '=' ;

VALUE : ('"' ~["<>\r\n]+ '"')
    | ('\'' ~['<>\r\n]+ '\'')
    | ~["'<>= \t\r\n]+
    ;

WS : [ \t\r\n]+ -> skip ;

Sample HTML file:

<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=Generator content="Microsoft Word 14 (filtered)">
</head>

<body lang=EN-US style='text-justify-trim:punctuation'>
</body>
</html>

Output from antlr4:

line 1:6 token recognition error at: '\n'
line 2:6 token recognition error at: '\n'
line 3:5 token recognition error at: ' '
line 3:6 token recognition error at: 'htt'
line 3:9 token recognition error at: 'p'
...
[@0,0:0='<',<7>,1:0]
[@1,1:4='html',<10>,1:1]
[@2,5:5='>',<1>,1:5]
[@3,7:7='<',<7>,2:0]
[@4,8:11='head',<6>,2:1]
[@5,12:12='>',<1>,2:5]
[@6,14:14='<',<7>,3:0]
[@7,15:18='meta',<2>,3:1]
[@8,30:30='=',<9>,3:16]
[@9,51:51='=',<9>,3:37]
[@10,57:61='/html',<4>,3:43]
[@11,71:71='=',<9>,3:57]
[@12,85:85='>',<1>,3:71]
[@13,87:87='<',<7>,4:0]
[@14,88:91='meta',<2>,4:1]
[@15,115:115='=',<9>,4:28]
[@16,146:146='>',<1>,4:59]
[@17,148:148='<',<7>,5:0]
[@18,149:153='/head',<8>,5:1]
[@19,154:154='>',<1>,5:6]
[@20,157:157='<',<7>,7:0]
[@21,158:161='body',<5>,7:1]
[@22,167:167='=',<9>,7:10]
[@23,179:179='=',<9>,7:22]
[@24,211:211='>',<1>,7:54]
[@25,213:213='<',<7>,8:0]
[@26,214:218='/body',<11>,8:1]
[@27,219:219='>',<1>,8:6]
[@28,221:221='<',<7>,9:0]
[@29,222:226='/html',<4>,9:1]
[@30,227:227='>',<1>,9:6]
[@31,229:228='<EOF>',<-1>,10:0]
line 3:16 mismatched input '=' expecting NAME
line 4:28 mismatched input '=' expecting NAME
line 7:10 mismatched input '=' expecting {'>', NAME}

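First, you need to change your parser's declaration to

parser grammar Rule;

instead of

grammar Rule;

I can't see any problem in your lexer that would produce those particular error messages, so this is probably the problem.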
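
For illustration, here is a minimal sketch of how the start of the corrected parser file might look, assuming it is saved as Rule.g4 alongside HTMLLexer.g4. With `grammar Rule;` ANTLR treats the file as a combined grammar, so the quoted literals in the parser rules implicitly define tokens there; a `parser grammar` cannot define tokens of its own. The sketch therefore refers to the lexer's tokens by name (OPEN, CLOSE, StHTML, EnHTML, NAME, EQUALS and VALUE, all taken from the lexer grammar in the question). Only three of the question's rules are shown, and it has not been checked here that the full grammar then parses the sample file.

```
parser grammar Rule;

options {
    tokenVocab = HTMLLexer;   // import token types from the separate lexer grammar
}

// Token names come from HTMLLexer; the remaining rules from the question
// would be rewritten in the same way (an assumption, not tested here).
shtml : OPEN StHTML attr* CLOSE ;
ehtml : OPEN EnHTML CLOSE ;
attr  : NAME EQUALS VALUE ;
```

One thing to watch when converting the remaining rules: the literal '<!DOCTYPE HTML>' used in the type rule has no matching rule in HTMLLexer, so that rule would need a token defined in the lexer before it can be used from a parser grammar.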