ANTLR: parsing text without an explicit end tag


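The difficulty in parsing dictionary entries (see the examples below) is that there are no explicit start and end tags. Instead:

the end marker of one element is already the start marker of the next element,

or the start marker is not a syntactic element at all but the current parser state (so it depends on what you have already "seen" in the input stream).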

Example 1, simple input:

wordWithoutSpace [phonetic information] definition as everything until colon: example sentence until EOF

Example 2, entry with multiple definitions:

wordWithoutSpace [phonetic information] 1. first definition until colon: example sentence until second definition 2. second definition until colon: example sentence until EOF
Expressed in words, or rather pseudocode:

    dictionary-entry :
        word = .+ ' '          // catch everything as word until you see a space
        phon = '[' .+ ']'      // then follows phonetic, which is everything in brackets
        (MultipleMeaning | UniqueMeaning)

    MultipleMeaning : Int '.' definition= .+ ':'   // a MultipleMeaning has a number
                                                   // before the definition

    UniqueMeaning : definition= .+ ':'
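As a point of comparison, the pseudocode above can be approximated in plain Java with regular expressions. This is only a sketch under stated assumptions, not an ANTLR solution: the class `EntrySketch`, its nested `Entry` holder, and both patterns are hypothetical names made up for illustration. It relies on the first observation (the end of one meaning is the start marker of the next) by splitting on the `Int '.'` marker, and it would mis-split an example sentence that itself contains a number followed by a period.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntrySketch {
    /** Hypothetical holder for one parsed entry (not part of any ANTLR API). */
    public static class Entry {
        public String word;
        public String phon;
        // each element is {definition, example}
        public final List<String[]> meanings = new ArrayList<>();
    }

    // word = everything up to the first space; phon = everything in brackets;
    // group 3 = the remaining text holding one or more meanings
    private static final Pattern HEAD =
        Pattern.compile("^(\\S+) \\[([^\\]]*)\\]\\s*(.*)$", Pattern.DOTALL);

    // MultipleMeaning marker: Int '.', at the very start or after whitespace
    private static final Pattern NUMBER = Pattern.compile("(?:^|\\s)\\d+\\.\\s*");

    public static Entry parse(String input) {
        Matcher head = HEAD.matcher(input);
        if (!head.matches()) {
            throw new IllegalArgumentException("not a dictionary entry: " + input);
        }
        Entry entry = new Entry();
        entry.word = head.group(1);
        entry.phon = head.group(2);
        // Splitting on the marker yields the meanings; without any marker the
        // whole remainder is a single (unique) meaning.
        for (String piece : NUMBER.split(head.group(3))) {
            if (piece.trim().isEmpty()) continue; // empty piece before the first marker
            int colon = piece.indexOf(':');
            String definition = colon < 0 ? piece : piece.substring(0, colon);
            String example = colon < 0 ? "" : piece.substring(colon + 1);
            entry.meanings.add(new String[] { definition.trim(), example.trim() });
        }
        return entry;
    }

    public static void main(String[] args) {
        Entry e = parse("wordWithoutSpace [phonetic information] "
            + "1. first definition until colon: example sentence until second definition "
            + "2. second definition until colon: example sentence until EOF");
        System.out.println(e.word + " / " + e.phon);
        for (String[] m : e.meanings) {
            System.out.println("definition=" + m[0] + " | example=" + m[1]);
        }
    }
}
```

Both example inputs go through the same code path; the simple entry just produces a single meaning because the split finds no marker.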

I have tried a lexer with gated semantic predicates (ANTLR version 3.2).

TestLexer:


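    import org.antlr.runtime.ANTLRStringStream;
    import org.antlr.runtime.CharStream;
    import org.antlr.runtime.Token;

    public class TestLexer {
        public static void main(String[] args) {
            String str = "Word [phon]1.definition:";
            CharStream input = new ANTLRStringStream(str);
            // DudenLexer is presumably the lexer class ANTLR generates from the grammar
            DudenLexer lexer = new DudenLexer(input);
            Token token;
            // Compare token types rather than object identity, so the loop also
            // ends if the runtime returns a fresh EOF token instead of Token.EOF_TOKEN
            while ((token = lexer.nextToken()).getType() != Token.EOF) {
                System.out.println("Token: " + token);
            }
        }
    }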
The problem I run into:


I get the error message: line 1:0 rule Definition failed predicate: {cs==2}

I don't know whether this is the right approach at all.
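To make the state-dependent part concrete, here is a minimal hand-written scanner in plain Java that tracks the kind of state a predicate such as `{cs==2}` appears to test. That reading of `cs` as a "current state" variable is an assumption, and `StateScanner`, its `State` enum, and the token labels are hypothetical names for illustration only. The sketch handles just the simple, single-definition entry.

```java
import java.util.ArrayList;
import java.util.List;

public class StateScanner {
    // What a character means depends on which part of the entry we are in.
    enum State { WORD, PHON, DEFINITION, EXAMPLE }

    public static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        State state = State.WORD;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (state) {
                case WORD:        // the first space ends the word
                    if (c == ' ') { emit(tokens, "WORD", text); state = State.PHON; }
                    else text.append(c);
                    break;
                case PHON:        // the closing bracket ends the phonetic part
                    if (c == ']') { emit(tokens, "PHON", text); state = State.DEFINITION; }
                    else if (c != '[') text.append(c);
                    break;
                case DEFINITION:  // the colon ends the definition
                    if (c == ':') { emit(tokens, "DEFINITION", text); state = State.EXAMPLE; }
                    else text.append(c);
                    break;
                case EXAMPLE:     // everything up to the end of input
                    text.append(c);
                    break;
            }
        }
        // Flush whatever was collected in the final state, if anything.
        if (text.length() > 0) emit(tokens, state.name(), text);
        return tokens;
    }

    private static void emit(List<String> tokens, String type, StringBuilder text) {
        tokens.add(type + "(" + text.toString().trim() + ")");
        text.setLength(0);
    }

    public static void main(String[] args) {
        for (String t : scan("wordWithoutSpace [phonetic information] "
                + "definition as everything until colon: example sentence until EOF")) {
            System.out.println(t);
        }
    }
}
```

Note that a space ends the token only in the `WORD` state; inside the definition or the example sentence it is ordinary text, which is exactly the behaviour that is hard to express with context-free lexer rules.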

I have been stuck on this for about three days and would greatly appreciate any help or hints.

Thanks,
Tom
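I think you need to handle line breaks explicitly in your grammar for this to work; otherwise you will have a hard time dealing with things like

2. second definition until colon: this is the 2. line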
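The ambiguity can be reproduced outside ANTLR. The following Java sketch assumes, hypothetically, that each numbered definition starts on its own line; `LineAnchoredSplit` and both patterns are made-up names for illustration. It compares a marker that matches "number plus period" anywhere with one that matches only at the start of a line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LineAnchoredSplit {
    // Naive marker: a number followed by a period, anywhere in the text.
    static final Pattern ANYWHERE = Pattern.compile("\\b\\d+\\.\\s*");
    // Line-anchored marker: a number followed by a period at the start of a line only.
    static final Pattern LINE_START = Pattern.compile("^\\d+\\.\\s*", Pattern.MULTILINE);

    static List<String> split(Pattern marker, String meanings) {
        List<String> parts = new ArrayList<>();
        for (String part : marker.split(meanings)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) parts.add(trimmed); // drop the empty piece before "1."
        }
        return parts;
    }

    public static void main(String[] args) {
        String meanings =
            "1. first definition until colon: example sentence\n"
          + "2. second definition until colon: this is the 2. line\n";
        // 3 pieces: the "2." inside the example sentence is mistaken for a marker
        System.out.println(split(ANYWHERE, meanings).size());
        // 2 pieces: only the markers at line starts count
        System.out.println(split(LINE_START, meanings).size());
    }
}
```

In a grammar, the equivalent would be to keep the newline as a token of its own rather than skipping it as whitespace, so that a definition number is recognized only directly after a line break.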