Java ANTLR: joining tokens in the output

Using ANTLR3, I want to parse strings like the following:

  • name IS NOT empty AND age NOT IN (14, 15)
For input like this, I'd like to get the following AST:

  n0 [label="QUERY"];
  n1 [label="AND"];
  n1 [label="AND"];
  n2 [label="IS NOT"];
  n2 [label="IS NOT"];
  n3 [label="name"];
  n4 [label="empty"];
  n5 [label="NOT IN"];
  n5 [label="NOT IN"];
  n6 [label="age"];
  n7 [label="14"];
  n8 [label="15"];

  n0 -> n1 // "QUERY" -> "AND"
  n1 -> n2 // "AND" -> "IS NOT"
  n2 -> n3 // "IS NOT" -> "name"
  n2 -> n4 // "IS NOT" -> "empty"
  n1 -> n5 // "AND" -> "NOT IN"
  n5 -> n6 // "NOT IN" -> "age"
  n5 -> n7 // "NOT IN" -> "14"
  n5 -> n8 // "NOT IN" -> "15"
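But my n2 and n5 nodes come out like this instead:

  n2 [label="IS"];
  n5 [label="NOT"];

In other words, only the first word shows up. How can I join the two tokens into a single token?

My grammar is: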

query
    :   expr EOF   ->   ^(QUERY expr)
    ;

expr
    :   logical_expr
    ;

logical_expr
    :   equality_expr (logical_op^ equality_expr)*
    ;

equality_expr
    :   ID equality_op+ atom    -> ^(equality_op ID atom)
    |   '(' expr ')'    ->  ^('(' expr)
    ;

atom
    :   ID
    |   id_list
    |   Int
    |   Number
    |   String
    |   '*'
    ;

id_list
    :   '(' ID (',' ID)+ ')'    ->  ID+
    |   '(' Number (',' Number)* ')' -> Number+
    |   '(' String (',' String)* ')' -> String+
    ;

equality_op
    :   'IN'
    |   'IS'
    |   'NOT'
    |   'in'
    |   'is'
    |   'not'
    ;

logical_op
    :   'AND'
    |   'OR'
    |   'and'
    |   'or'
    ;

Number
    :   Int ('.' Digit*)?
    ;

ID
    :   ('a'..'z' | 'A'..'Z' | '_' | '.' | '-' | '*' | '/' | ':' | Digit)* 
    ;

String
@after {
    setText(getText().substring(1, getText().length()-1).replaceAll("\\\\(.)", "$1"));
    }
    :  '"'  (~('"' | '\\')  | '\\' ('\\' | '"'))* '"' 
    |  '\'' (~('\'' | '\\') | '\\' ('\\' | '\''))* '\''
    ;

Comment
    :  '//' ~('\r' | '\n')* {skip();}
    |  '/*' .* '*/'         {skip();}
    ;

Space
    :  (' ' | '\t' | '\r' | '\n' | '\u000C') {skip();}
    ;

fragment Int
    :  '1'..'9' Digit*
    |  '0'
    ;

fragment Digit 
    :  '0'..'9'
    ;

indexes
    :  ('[' expr ']')+ -> ^(INDEXES expr+)
    ;

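The problem is that equality_op+ in the rewrite will only hold the first matched value. I see several workarounds: create dedicated rules (if this is only needed for NOT or NOT IN), create sub-rules, or concatenate the matched values, as sketched here: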

equality_expr
    :   ID (full_op+=equality_op) + atom    -> ^(full_op ID atom)
    |   '(' expr ')'    ->  ^('(' expr)
    ;
That approach comes from a different question, but the idea is the same.
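To make the "concatenate" idea concrete outside of ANTLR, here is a minimal plain-Java sketch. It does not use the ANTLR runtime; the class OperatorJoiner and the method joinOperators are hypothetical names introduced only for illustration. It assumes the texts of the matched operator tokens have already been collected into a list, and joins them into one label.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper, not part of ANTLR: illustrates joining operator texts.
class OperatorJoiner {

    // Joins the texts of consecutive operator tokens into one upper-case
    // label separated by single spaces, e.g. ["is", "not"] becomes "IS NOT".
    static String joinOperators(List<String> operatorTexts) {
        return operatorTexts.stream()
                .map(text -> text.toUpperCase(Locale.ROOT))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(joinOperators(List.of("is", "not")));
        System.out.println(joinOperators(List.of("NOT", "IN")));
        System.out.println(joinOperators(List.of("IN")));
    }
}
```

In a real grammar the joined text would still have to be attached to a single tree node, which is the part the grammar itself has to handle.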

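Do it like this instead (check the inline comments I added in the grammar below):

With that grammar, the two-word operators come out as single IS_NOT and NOT_IN nodes in the AST.
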
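Also, a lexer rule should always match at least 1 character (I've mentioned this to you before). Your lexer rule ID could match 0 characters, because its body ends in * rather than +.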
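As a rough analogy in plain Java (using java.util.regex rather than ANTLR, so this is only an approximation of lexer behaviour), the sketch below checks whether a pattern built from the same characters as ID accepts the empty string. The class name IdRuleCheck and the regex translation of the rule are assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical illustration: regex stand-in for the ID lexer rule.
class IdRuleCheck {

    // Character class mirroring the alternatives listed in the ID rule.
    static final String ID_CHARS = "[a-zA-Z_.\\-*/:0-9]";

    // Returns true if ID_CHARS followed by the given quantifier
    // ("*" or "+") matches the empty string.
    static boolean matchesEmpty(String quantifier) {
        return Pattern.matches(ID_CHARS + quantifier, "");
    }

    public static void main(String[] args) {
        System.out.println("with *: " + matchesEmpty("*")); // zero or more
        System.out.println("with +: " + matchesEmpty("+")); // one or more
    }
}
```

With the * quantifier the empty string is accepted, while the + quantifier requires at least one character, which is the change made to ID in the revised grammar.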

That's it! Thanks a lot (again)!
tokens {
  IS_NOT; // added
  NOT_IN; // added
  QUERY;
  INDEXES;
}

query
    :   expr EOF   ->   ^(QUERY expr)
    ;

expr
    :   logical_expr
    ;

logical_expr
    :   equality_expr (logical_op^ equality_expr)*
    ;

equality_expr
    :   ID equality_op atom    -> ^(equality_op ID atom) // changed equality_op+ to equality_op
    |   '(' expr ')'    ->  ^('(' expr)
    ;

atom
    :   ID
    |   id_list
    |   Int
    |   Number
    |   String
    |   '*'
    ;

id_list
    :   '(' ID (',' ID)+ ')'    ->  ID+
    |   '(' Number (',' Number)* ')' -> Number+
    |   '(' String (',' String)* ')' -> String+
    ;

equality_op
    :   IS NOT -> IS_NOT // added
    |   NOT IN -> NOT_IN // added
    |   IN
    |   IS
    |   NOT
    ;

logical_op
    :   AND
    |   OR
    ;

IS : 'IS' | 'is'; // added
NOT : 'NOT' | 'not'; // added
IN : 'IN' | 'in'; // added
AND : 'AND' | 'and'; // added
OR : 'OR' | 'or'; // added

Number
    :   Int ('.' Digit*)?
    ;

ID
    :   ('a'..'z' | 'A'..'Z' | '_' | '.' | '-' | '*' | '/' | ':' | Digit)+ 
    ;

String
@after {
    setText(getText().substring(1, getText().length()-1).replaceAll("\\\\(.)", "$1"));
    }
    :  '"'  (~('"' | '\\')  | '\\' ('\\' | '"'))* '"' 
    |  '\'' (~('\'' | '\\') | '\\' ('\\' | '\''))* '\''
    ;

Comment
    :  '//' ~('\r' | '\n')* {skip();}
    |  '/*' .* '*/'         {skip();}
    ;

Space
    :  (' ' | '\t' | '\r' | '\n' | '\u000C') {skip();}
    ;

fragment Int
    :  '1'..'9' Digit*
    |  '0'
    ;

fragment Digit 
    :  '0'..'9'
    ;

indexes
    :  ('[' expr ']')+ -> ^(INDEXES expr+)
    ;