Parsing: DSL code substitution using ANTLR


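I'm working with a DSL that allows users to define a "full text substitution" variable. When parsing the code, we need to look up the value of the variable and resume parsing from that text.

The substitution can be very simple (a single constant) or an entire statement or block of code. Here is a mock grammar which I hope illustrates my point: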

grammar a;

entry
  : (set_variable
  | print_line)*
  ;

set_variable
  : 'SET' ID '=' STRING_CONSTANT ';'
  ;

print_line
  : 'PRINT' ID ';'
  ;

STRING_CONSTANT: '\'' ('\'\'' | ~('\''))* '\'' ;

ID: [a-z][a-zA-Z0-9_]* ;

VARIABLE: '&' ID;

BLANK: [ \t\n\r]+ -> channel(HIDDEN) ;
The following statements, executed in sequence, should then be valid:

SET foo = 'Hello world!';
PRINT foo;            

SET bar = 'foo;';
PRINT &bar                    // should be interpreted as 'PRINT foo;'

SET baz = 'PRINT foo; PRINT'; // one complete statement and one incomplete statement
&baz foo;                     // should be interpreted as 'PRINT foo; PRINT foo;'
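Whenever a &variable token is found, we immediately switch to interpreting the value of that variable instead. As shown above, this means you can write code that is invalid on its own, full of half-statements that are only completed when the value happens to be just right. Variables can be redefined at any point in the text.

Strictly speaking, the current language definition does not disallow nesting & variables inside each other, but the current parsing does not handle this, and I would not be upset if it were not allowed.

I'm currently building an interpreter using a visitor, but I'm stuck on this one.

How can I build a lexer/parser/interpreter that allows me to do this? Thanks for any help!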

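The standard pattern for handling your requirement is to implement a symbol table. In its simplest form it is a key:value store. In your visitor, add variable declarations as they are encountered, and read out the values when variable references are encountered.

As described, your DSL does not define a scoping requirement for declared variables. If you do need scoped variables, use a stack of key:value stores, pushing and popping on scope entry and exit.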
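
A minimal sketch of such a scoped symbol table, assuming string-valued variables as in the question. The class name `ScopedSymbolTable` and its method names are hypothetical and not part of ANTLR; a visitor would call `define` when it sees a declaration and `lookup` when it sees a reference.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: a stack of key:value stores, innermost scope first. */
final class ScopedSymbolTable {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    ScopedSymbolTable() {
        scopes.push(new HashMap<>()); // the global scope is always present
    }

    /** Call on scope entry. */
    void enterScope() {
        scopes.push(new HashMap<>());
    }

    /** Call on scope exit; the global scope cannot be popped. */
    void exitScope() {
        if (scopes.size() == 1) {
            throw new IllegalStateException("cannot pop the global scope");
        }
        scopes.pop();
    }

    /** Records a declaration (for example SET foo = '...') in the current scope. */
    void define(String name, String value) {
        scopes.peek().put(name, value);
    }

    /** Resolves a reference, searching from the innermost scope outwards. */
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {
            String value = scope.get(name);
            if (value != null) {
                return value;
            }
        }
        return null; // undefined variable
    }
}
```

For a DSL without scopes, only the global store is ever used and this behaves like a plain map.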

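See this related post.

Separately, since your strings may contain commands, you can simply parse their contents as part of the initial parse. That is, extend your grammar with a rule that covers the full set of valid contents: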

set_variable
   : 'SET' ID '=' stringLiteral ';'
   ;

stringLiteral: 
   Quote Quote? ( 
     (    set_variable
        | print_line
        | VARIABLE
        | ID
     )
     | STRING_CONSTANT  // redefine without the quotes
   )
   Quote
   ;

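So I have found one solution to the problem. I think it could be better, since it potentially does a lot of array copying, but at least it works for now.

EDIT: I was wrong before; my earlier solution would consume ANY & it found, including those in valid locations such as inside string constants. This seems like a better solution:

First, I extended the InputStream so that it can rewrite the input stream when a & is encountered. Unfortunately this involves copying the array, which I may be able to resolve in the future: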

MacroInputStream.java

    package preprocessor;

    import java.util.HashMap;

    import org.antlr.v4.runtime.ANTLRInputStream;

    public class MacroInputStream extends ANTLRInputStream {

      private HashMap<String, String> map;

      public MacroInputStream(String s, HashMap<String, String> map) {
        super(s);
        this.map = map;
      }

      public void rewrite(int startIndex, int stopIndex, String replaceText) {
        int length = stopIndex-startIndex+1;
        char[] replData = replaceText.toCharArray();
        if (replData.length == length) {
          for (int i = 0; i < length; i++) data[startIndex+i] = replData[i];
        } else {
          char[] newData = new char[data.length+replData.length-length];
          System.arraycopy(data, 0, newData, 0, startIndex);
          System.arraycopy(replData, 0, newData, startIndex, replData.length);
          System.arraycopy(data, stopIndex+1, newData, startIndex+replData.length, data.length-(stopIndex+1));
          data = newData;
          n = data.length;
        }
      }
    }
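
The index arithmetic in `rewrite` can be checked in isolation, without ANTLR. The sketch below copies the three `System.arraycopy` calls into a hypothetical static helper named `splice` (an assumption made for illustration, not part of the answer's code) and applies it to the two examples from the question.

```java
final class SpliceDemo {

    /** Replaces data[startIndex..stopIndex] (inclusive) with replaceText and returns the new array. */
    static char[] splice(char[] data, int startIndex, int stopIndex, String replaceText) {
        int length = stopIndex - startIndex + 1;
        char[] replData = replaceText.toCharArray();
        char[] newData = new char[data.length + replData.length - length];
        // text before the variable reference
        System.arraycopy(data, 0, newData, 0, startIndex);
        // the replacement text
        System.arraycopy(replData, 0, newData, startIndex, replData.length);
        // text after the variable reference
        System.arraycopy(data, stopIndex + 1, newData, startIndex + replData.length,
                data.length - (stopIndex + 1));
        return newData;
    }

    public static void main(String[] args) {
        // "&bar" occupies indices 6..9 of the input
        System.out.println(new String(splice("PRINT &bar".toCharArray(), 6, 9, "foo;")));
        // "&baz" occupies indices 0..3 of the input
        System.out.println(new String(splice("&baz foo;".toCharArray(), 0, 3, "PRINT foo; PRINT")));
    }
}
```

Every substitution whose length differs from the token's allocates and copies the whole buffer, which is the array copying cost mentioned above.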

Secondly, I extended the Lexer so that when a VARIABLE token is encountered, the rewrite method above is called:

MacroGrammarLexer.java

package language;

import java.util.HashMap;

import org.antlr.v4.runtime.Token;

import preprocessor.MacroInputStream;

// Extends the generated lexer (same package), not itself
public class MacroGrammarLexer extends DSL_GrammarLexer {

  // Shared with the parser, which fills it in as SET statements are parsed
  private HashMap<String, String> map;

  public MacroGrammarLexer(MacroInputStream input, HashMap<String, String> map) {
    super(input);
    this.map = map;
  }

  private MacroInputStream getInput() {
    return (MacroInputStream) _input;
  }

  @Override
  public Token nextToken() {
    Token t = super.nextToken();
    if (t.getType() == VARIABLE) {
      System.out.println("Encountered token " + t.getText()+" ===> rewriting!!!");
      getInput().rewrite(t.getStartIndex(), t.getStopIndex(),
          map.get(t.getText().substring(1)));
      getInput().seek(t.getStartIndex()); // reset input stream to previous
      return super.nextToken();
    }
    return t;   
  }   

}
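
The control flow of this override (lex a token, and if it is a variable reference, splice in its value and seek back to where the token started) can be illustrated without ANTLR. The following is only a simulation under simplifying assumptions: `RescanDemo` and `tokenize` are hypothetical names, the token rules are a rough hand-written approximation of the mock grammar, and the macro map is fixed up front rather than filled in by the parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RescanDemo {

    /** Splits source into token texts, expanding &name references in place as they are met. */
    static List<String> tokenize(String source, Map<String, String> macros) {
        StringBuilder buf = new StringBuilder(source);
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < buf.length()) {
            char c = buf.charAt(pos);
            if (Character.isWhitespace(c)) { pos++; continue; }
            int start = pos;
            if (c == '\'') {
                // String constant: '' is an escaped quote, and & inside it is left alone.
                pos++;
                while (pos < buf.length()) {
                    if (buf.charAt(pos) == '\'') {
                        if (pos + 1 < buf.length() && buf.charAt(pos + 1) == '\'') { pos += 2; continue; }
                        pos++;
                        break;
                    }
                    pos++;
                }
            } else if (c == '&' || Character.isLetter(c)) {
                // Keyword, identifier, or variable reference.
                pos++;
                while (pos < buf.length()
                        && (Character.isLetterOrDigit(buf.charAt(pos)) || buf.charAt(pos) == '_')) {
                    pos++;
                }
            } else {
                pos++; // single-character token such as ';' or '='
            }
            String text = buf.substring(start, pos);
            if (text.length() > 1 && text.charAt(0) == '&') {
                String value = macros.get(text.substring(1));
                if (value == null) {
                    throw new IllegalArgumentException("undefined variable " + text);
                }
                buf.replace(start, pos, value); // rewrite the input
                pos = start;                    // seek back and lex the replacement text
                continue;                       // note: a self-referencing macro would loop forever
            }
            tokens.add(text);
        }
        return tokens;
    }
}
```

Because the substitution happens only after a complete token has been recognized, an & inside a string constant is never expanded, which is the problem the EDIT above describes with the earlier approach.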

Finally, I modified the generated parser to set the variables at the time of parsing:

DSL_GrammarParser.java

    ...
    ...
    HashMap<String, String> map;  // same map as before, passed as a new argument.
    ...
    ...

public final SetContext set() throws RecognitionException {
    SetContext _localctx = new SetContext(_ctx, getState());
    enterRule(_localctx, 130, RULE_set);
    try {
        enterOuterAlt(_localctx, 1);
        {
        String vname = null; String vval = null;              // set up variables
        setState(1215); match(SET);
        setState(1216); vname = variable_name().getText();    // set vname
        setState(1217); match(EQUALS);
        setState(1218); vval = string_constant().getText();   // set vval
        System.out.println("Found SET " + vname +" = " + vval+";");
        map.put(vname, vval);                                 // record the value for the lexer
        }
    }
    catch (RecognitionException re) {
        _localctx.exception = re;
        _errHandler.reportError(this, re);
        _errHandler.recover(this, re);
    }
    finally {
        exitRule();
    }
    return _localctx;
}
    ...
    ...

Unfortunately, this method is final, so this will make maintenance a bit more difficult, but it works for now.

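That's a nasty trick to have to account for in your grammar. Are there any restrictions on where and how many VARIABLE tokens can appear in a single entry? I mean, is...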