Antlr: parsing text without explicit end tags
The problem with parsing dictionary entries (see the examples below) is that they have no explicit start and end tags. Instead:
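- the end tag of one element is already the start tag of the next element
- or: the start tag is not a grammar element but the current parsing state (so it depends on what you have already "seen" in the input stream)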
Example 1, simple input:

```
wordWithoutSpace [phonetic information]
definition as everything until colon: example sentence until EOF
```
Example 2, multi-definition entry:

```
wordWithoutSpace [phonetic information]
1. first definition until colon: example sentence until second definition
2. second definition until colon: example sentence until EOF
```
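For an entry with a single definition like Example 1, the rule "an element ends where the next one starts" maps directly onto negated character classes: the word is every non-space character, the phonetic part is everything up to `]`, and the definition is everything up to the first `:`. The following is a minimal sketch, not part of the original question, assuming exactly the layout of Example 1 (whitespace after the word, no colon inside the definition); `SimpleEntryRegex` and `parse` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for entries shaped like Example 1. Each group stops at
// the character that starts the next element.
final class SimpleEntryRegex {
    private static final Pattern ENTRY = Pattern.compile(
            "(\\S+)"                 // word: everything up to the first whitespace
            + "\\s+\\[([^\\]]*)\\]"  // phonetic information: everything inside [ ]
            + "\\s*([^:]*):"         // definition: everything up to the first colon
            + "\\s*(.*)",            // example sentence: the rest of the input
            Pattern.DOTALL);         // let '.' match newlines too

    private SimpleEntryRegex() {
    }

    /** Returns {word, phonetic, definition, example}, or null if the text does not match. */
    static String[] parse(String text) {
        Matcher m = ENTRY.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3).trim(), m.group(4).trim()};
    }
}
```

This breaks as soon as a definition contains a colon or an entry has several numbered definitions, as in Example 2, which is where a grammar or an explicit state machine becomes necessary.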
As I would put it in words or pseudocode:
```
dictionary-entry :
    word = .+ ' '        // catch everything as word until you see a space
    phon = '[' .+ ']'    // then follows phonetic, which is everything in brackets
    (MultipleMeaning | UniqueMeaning)

MultipleMeaning : Int '.' definition = .+ ':'   // a MultipleMeaning has a number
                                                // before the definition
UniqueMeaning   : definition = .+ ':'
```
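The pseudocode above can also be implemented without ANTLR as a small hand-written recursive-descent parser over a cursor, where each read stops *before* the delimiter that opens the next element (one character of lookahead). This is a sketch under stated assumptions, not the author's approach: `EntryParser`, `Meaning` and the method names are hypothetical, and the example sentence is assumed to run to the end of the input, as in Example 1.

```java
// Hypothetical hand-written parser that follows the pseudocode rule by rule.
final class EntryParser {
    /** One parsed meaning; number is null for a UniqueMeaning. */
    static final class Meaning {
        final Integer number;
        final String definition;
        final String example;

        Meaning(Integer number, String definition, String example) {
            this.number = number;
            this.definition = definition;
            this.example = example;
        }
    }

    private final String in;
    private int pos = 0;

    EntryParser(String in) {
        this.in = in;
    }

    // word = .+ ' '  -> everything before the first space; the space is consumed
    String word() {
        String w = readUntil(' ');
        expect(' ');
        return w;
    }

    // phon = '[' .+ ']'  -> the text between the brackets
    String phon() {
        expect('[');
        String p = readUntil(']');
        expect(']');
        return p;
    }

    // (MultipleMeaning | UniqueMeaning), chosen by one character of lookahead:
    // a leading digit means a numbered definition.
    Meaning meaning() {
        skipWhitespace();
        Integer number = null;
        if (pos < in.length() && Character.isDigit(in.charAt(pos))) {
            int start = pos;
            while (pos < in.length() && Character.isDigit(in.charAt(pos))) {
                pos++;
            }
            number = Integer.parseInt(in.substring(start, pos));
            expect('.');
        }
        String definition = readUntil(':').trim();
        expect(':');
        // Assumption: the example sentence runs to the end of the input (Example 1).
        String example = in.substring(pos).trim();
        pos = in.length();
        return new Meaning(number, definition, example);
    }

    // Returns the text up to, but not including, stop. The stop character is
    // left in place, so the end of this element is the start of the next one.
    private String readUntil(char stop) {
        int end = in.indexOf(stop, pos);
        if (end < 0) {
            throw new IllegalArgumentException("expected '" + stop + "' after position " + pos);
        }
        String s = in.substring(pos, end);
        pos = end;
        return s;
    }

    private void expect(char c) {
        if (pos >= in.length() || in.charAt(pos) != c) {
            throw new IllegalArgumentException("expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    private void skipWhitespace() {
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) {
            pos++;
        }
    }
}
```

Usage: `new EntryParser("Word [phon]1.definition:")`, then call `word()`, `phon()` and `meaning()` in that order. Because each rule leaves its terminating delimiter for the next rule, the choice between a numbered and an unnumbered meaning is made purely from what the parser has already consumed, which is the "parsing state" the question describes.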
I have tried it with a gated lexer, DudenLexer (ANTLR version 3.2), and run it with this TestLexer:
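```java
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.Token;

public class TestLexer {
    public static void main(String[] args) {
        // Sample entry: word, phonetic part, numbered definition
        String str = "Word [phon]1.definition:";
        CharStream input = new ANTLRStringStream(str);
        // DudenLexer is the lexer class ANTLR generates from the gated lexer grammar
        DudenLexer lexer = new DudenLexer(input);
        Token token;
        // Compare the token type instead of the Token.EOF_TOKEN instance:
        // not every ANTLR 3 runtime returns that shared instance at end of input.
        while ((token = lexer.nextToken()).getType() != Token.EOF) {
            System.out.println("Token: " + token);
        }
    }
}
```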
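The gated-lexer idea (a state variable such as `cs` that switches which token rules are eligible) can be prototyped in plain Java to check the intended token sequence before fighting the grammar. This is a sketch, not the question's DudenLexer: the class `ModalLexer`, the token names and the meaning of the `cs` values 0 to 4 are assumptions chosen to mirror the pseudocode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java prototype of a gated lexer: the field `cs` plays the
// role of the state variable tested by the gates, and in each state only one
// token rule is eligible to match.
final class ModalLexer {
    static final class Tok {
        final String type;
        final String text;

        Tok(String type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "='" + text + "'";
        }
    }

    private final String in;
    private int pos = 0;
    // Assumed states: 0 = WORD, 1 = PHON, 2 = optional NUMBER, 3 = DEFINITION, 4 = done
    private int cs = 0;

    ModalLexer(String in) {
        this.in = in;
    }

    List<Tok> tokenize() {
        List<Tok> out = new ArrayList<>();
        while (pos < in.length() && cs < 4) {
            if (cs == 0) {            // WORD: up to the first space, which is skipped
                int end = indexOrEnd(' ');
                out.add(new Tok("WORD", in.substring(pos, end)));
                pos = Math.min(end + 1, in.length());
                cs = 1;
            } else if (cs == 1) {     // PHON: up to and including ']'
                int end = Math.min(indexOrEnd(']') + 1, in.length());
                out.add(new Tok("PHON", in.substring(pos, end)));
                pos = end;
                cs = 2;
            } else if (cs == 2) {     // NUMBER '.': only if digits and a dot follow
                skipWhitespace();
                int start = pos;
                while (pos < in.length() && Character.isDigit(in.charAt(pos))) {
                    pos++;
                }
                if (pos > start && pos < in.length() && in.charAt(pos) == '.') {
                    out.add(new Tok("NUMBER", in.substring(start, pos)));
                    pos++;            // skip the '.'
                } else {
                    pos = start;      // no number: a UniqueMeaning, so rewind
                }
                cs = 3;
            } else {                  // DEFINITION: up to the colon, which is skipped
                int end = indexOrEnd(':');
                out.add(new Tok("DEFINITION", in.substring(pos, end).trim()));
                pos = Math.min(end + 1, in.length());
                cs = 4;
            }
        }
        return out;
    }

    private int indexOrEnd(char c) {
        int i = in.indexOf(c, pos);
        return i < 0 ? in.length() : i;
    }

    private void skipWhitespace() {
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) {
            pos++;
        }
    }
}
```

For the TestLexer input `Word [phon]1.definition:`, this prototype yields WORD, PHON, NUMBER and DEFINITION in that order. In an ANTLR 3 grammar the corresponding guard is the gated form `{cs==N}?=>`; a plain validating predicate `{cs==N}?` is not necessarily used when choosing which rule to enter, and if it evaluates to false once the rule is running, ANTLR reports a "failed predicate" error.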
Problems I have:
- I get the error message: line 1:0 rule DEFINITION failed predicate: {cs==2}
- I don't know whether this is the right way to do it
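Tom, I think you need to handle newlines explicitly in your grammar to do this; otherwise you will have a hard time with things like "2. second definition until colon: this is 2. line".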
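Following the answer's suggestion, one way to make newlines significant is to treat `N.` as the start of a new definition only at the beginning of a line, so a "2." inside an example sentence stays part of that sentence. A minimal sketch using a multiline regex, assuming each numbered definition starts on its own line as in Example 2; `DefinitionSplitter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits the body of a multi-definition entry into
// numbered definitions. A number only counts as a definition marker at the
// start of a line, so "this is 2. line" inside an example is left alone.
final class DefinitionSplitter {
    // (?m) makes ^ match after every line terminator, not just at input start.
    private static final Pattern MARKER = Pattern.compile("(?m)^\\s*(\\d+)\\.\\s*");

    private DefinitionSplitter() {
    }

    /** Returns one string per numbered definition, without its "N." marker. */
    static List<String> split(String body) {
        List<String> parts = new ArrayList<>();
        Matcher m = MARKER.matcher(body);
        int contentStart = -1; // start of the current definition's text; -1 before the first marker
        while (m.find()) {
            if (contentStart >= 0) {
                parts.add(body.substring(contentStart, m.start()).trim());
            }
            contentStart = m.end();
        }
        if (contentStart >= 0) {
            parts.add(body.substring(contentStart).trim());
        }
        return parts;
    }
}
```

Each resulting string can then be split once more at its first colon into definition and example sentence. The same line-start condition could be expressed in a grammar by making the newline a token and only accepting a definition number right after it.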