Antlr: parsing text without explicit end tags

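The problem with parsing dictionary entries (see the examples below) is that there are no explicit start and end tags. Instead:

  • the end tag of one element is already the start tag of the next element, or
  • the start tag is not a syntactic element but the current parse state (so it depends on what you have already "seen" in the input stream).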
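To make the two points concrete, here is a minimal hand-written scanner in plain Java (not ANTLR) for an entry shaped like Example 1 below. No field has a start tag of its own: the scanner's current state decides which field a character belongs to, and the terminator of one field (a space, `]`, the first colon) switches to the next state. The class `StateScanner` and its state names are hypothetical, and the sketch only handles a single-meaning entry.

```java
import java.util.ArrayList;
import java.util.List;

public class StateScanner {

    // Which field the scanner is currently inside; this state plays the role
    // of the missing start tag.
    enum State { WORD, BEFORE_PHON, PHON, DEFINITION, EXAMPLE }

    /** Splits a single-meaning entry into [word, phonetic, definition, example]. */
    static List<String> scan(String input) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        State state = State.WORD;
        for (char c : input.toCharArray()) {
            switch (state) {
                case WORD:
                    if (c == ' ') {                    // a space ends the headword...
                        fields.add(current.toString());
                        current.setLength(0);
                        state = State.BEFORE_PHON;     // ...and starts the next element
                    } else {
                        current.append(c);
                    }
                    break;
                case BEFORE_PHON:
                    if (c == '[') {                    // skip anything until '['
                        state = State.PHON;
                    }
                    break;
                case PHON:
                    if (c == ']') {                    // ']' ends the phonetic part
                        fields.add(current.toString());
                        current.setLength(0);
                        state = State.DEFINITION;
                    } else {
                        current.append(c);
                    }
                    break;
                case DEFINITION:
                    if (c == ':') {                    // the first colon ends the definition
                        fields.add(current.toString().trim());
                        current.setLength(0);
                        state = State.EXAMPLE;
                    } else {
                        current.append(c);
                    }
                    break;
                case EXAMPLE:
                    current.append(c);                 // the example runs until EOF
                    break;
            }
        }
        fields.add(current.toString().trim());         // EOF closes whatever field is open
        return fields;
    }

    public static void main(String[] args) {
        String entry = "wordWithoutSpace [phonetic information]\n"
                + "definition as everything until colon: example sentence until EOF";
        System.out.println(scan(entry));
    }
}
```

Running `main` prints `[wordWithoutSpace, phonetic information, definition as everything until colon, example sentence until EOF]`. The same idea, a state variable that decides what the next characters mean, is what gated predicates in an ANTLR 3 lexer try to express.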
Example 1, simple input:

wordWithoutSpace [phonetic information]
definition as everything until colon: example sentence until EOF
Example 2, entry with multiple definitions:

wordWithoutSpace [phonetic information]
1. first definition until colon: example sentence until second definition
2. second definition until colon: example sentence until EOF
Expressed in words or pseudocode:

dictionary-entry : 
     word = .+ ' ' // catch everything as word until you see a space
     phon = '[' .+ ']' // then follows phonetic, which is everything in brackets
     (MultipleMeaning | UniqueMeaning)

MultipleMeaning : Int '.' definition= .+ ':' // a MultipleMeaning has a number 
                                             // before the definition

UniqueMeaning : definition= .+ ':'
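One way to check what the pseudocode rules are supposed to match, independently of ANTLR, is to write them as a single `java.util.regex` pattern. This is only an illustrative sketch (the class `PseudocodeRules` and method `describe` are hypothetical, and a regex is not how an ANTLR lexer works): it reads the header, then one meaning, and tells MultipleMeaning from UniqueMeaning by the optional leading number. It uses the input string `"Word [phon]1.definition:"` from the TestLexer class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PseudocodeRules {

    // word    = everything up to the first space          -> group 1
    // phon    = everything inside '[' ... ']'             -> group 2
    // MultipleMeaning: Int '.' definition ':'             -> group 3 set
    // UniqueMeaning:   definition ':'                     -> group 3 null
    // definition = everything up to the first colon       -> group 4
    private static final Pattern ENTRY = Pattern.compile(
            "(\\S+) \\[([^\\]]*)\\]\\s*(?:(\\d+)\\.)?([^:]*):");

    static String describe(String entry) {
        Matcher m = ENTRY.matcher(entry);
        if (!m.lookingAt()) {                 // match from the start, ignore the rest
            return "no match";
        }
        String kind = m.group(3) != null ? "MultipleMeaning #" + m.group(3) : "UniqueMeaning";
        return "word=" + m.group(1) + " phon=" + m.group(2)
                + " " + kind + " definition=" + m.group(4).trim();
    }

    public static void main(String[] args) {
        System.out.println(describe("Word [phon]1.definition:"));
        System.out.println(describe("Word [phon] definition:"));
    }
}
```

The first call prints `word=Word phon=phon MultipleMeaning #1 definition=definition` and the second `word=Word phon=phon UniqueMeaning definition=definition`, so the rules as written do cover the test string.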
I have already tried a lexer with gated predicates (ANTLR version 3.2).

TestLexer:

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.Token;

public class TestLexer {

    public static void main(String[] args) {
        // Headword, phonetic part and one numbered definition
        String str = "Word [phon]1.definition:";
        CharStream input = new ANTLRStringStream(str);
        // DudenLexer is the lexer generated from the grammar
        DudenLexer lexer = new DudenLexer(input);
        Token token;
        // Compare the token type instead of relying on identity with Token.EOF_TOKEN
        while ((token = lexer.nextToken()).getType() != Token.EOF) {
            System.out.println("Token: " + token);
        }
    }
}
Problems I have:

  • I get the error message: line 1:0 rule definition failed predicate: {cs==2}
  • I don't know whether this is the right approach at all
I've been stuck on this for about three days and would be very grateful for any help and hints.

Thanks,
Tom

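I think you need to handle newlines explicitly in your grammar to do this; otherwise you'll have a hard time with input such as "2. second definition until colon: this is 2. line", where the "2." inside the example sentence looks just like the start of a new numbered definition.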
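A minimal sketch of that suggestion, assuming each definition sits on its own line as in Example 2: once the newline is a delimiter, a number only starts a new definition at the beginning of a line, so the "2." in "this is 2. line" stays part of the example sentence. This is plain Java with `java.util.regex`, not an ANTLR grammar, and the class `LineBasedEntries` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineBasedEntries {

    // One meaning per line: optional "<number>." (MultipleMeaning),
    // definition up to the first colon, then the example sentence.
    private static final Pattern MEANING =
            Pattern.compile("(?:(\\d+)\\.)?\\s*([^:]*):\\s*(.*)");

    /** Returns "number | definition | example" per meaning line ("-" = unnumbered). */
    static List<String> meanings(String entry) {
        List<String> result = new ArrayList<>();
        String[] lines = entry.split("\\R");           // the newline is an explicit delimiter
        for (int i = 1; i < lines.length; i++) {       // line 0 is "word [phonetic]"
            Matcher m = MEANING.matcher(lines[i]);
            if (m.matches()) {                         // a number counts only at line start
                String number = m.group(1) != null ? m.group(1) : "-";
                result.add(number + " | " + m.group(2).trim() + " | " + m.group(3));
            }                                          // lines without a colon are skipped
        }
        return result;
    }

    public static void main(String[] args) {
        String entry = "wordWithoutSpace [phonetic information]\n"
                + "1. first definition until colon: example sentence until second definition\n"
                + "2. second definition until colon: this is 2. line";
        meanings(entry).forEach(System.out::println);
    }
}
```

This prints `1 | first definition until colon | example sentence until second definition` and `2 | second definition until colon | this is 2. line`. An example sentence that wraps onto a new line beginning with a number would still be misread, so the assumption of one definition per line matters.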