JFlex: match nested comments as one token


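In Mathematica, a comment starts with (* and ends with *), and comments can be nested. My current approach to scanning comments with JFlex contains the following code: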

%xstate IN_COMMENT

"(*"  { yypushstate(IN_COMMENT); return MathematicaElementTypes.COMMENT;}

<IN_COMMENT> {
  "(*"        {yypushstate(IN_COMMENT); return MathematicaElementTypes.COMMENT;}
  [^\*\)\(]*  {return MathematicaElementTypes.COMMENT;}
  "*)"        {yypopstate(); return MathematicaElementTypes.COMMENT;}
  [\*\)\(]    {return MathematicaElementTypes.COMMENT;}
  .           {return MathematicaElementTypes.BAD_CHARACTER;}
}

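This gives me the chance to keep track of the nesting level of the comment I am dealing with.

Unfortunately, it results in several COMMENT tokens for a single comment, because I have to match nested comment starts and comment ends.

Question: Is it possible to use the JFlex API, with methods like yypushback or advance(), so that exactly one token is returned over the whole comment range, even if comments are nested?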

Java's traditional comment is defined in the example grammar as

TraditionalComment   = "/*" [^*] ~"*/" | "/*" "*"+ "/"

I think this expression should also work for Mathematica comments.
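Whether a non-nesting pattern of this kind is enough for Mathematica can be checked quickly. The sketch below is plain Java, not JFlex: java.util.regex has no counterpart to the ~ (upto) operator, so a reluctant .*? stands in for it, and the class name NonNestedPattern is made up for this illustration. Applied to a nested comment, the match ends at the first *), so the outer comment is cut short.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonNestedPattern {
    // "(*", then as little as possible, then the first "*)".
    // Assumption: this mimics the non-nested TraditionalComment idea,
    // transferred to the (* ... *) delimiters.
    static final Pattern COMMENT =
        Pattern.compile("\\(\\*.*?\\*\\)", Pattern.DOTALL);

    /** Returns the first comment-like match in the input, or null if there is none. */
    static String firstMatch(String input) {
        Matcher m = COMMENT.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Prints "(* a (* nested *)": the trailing " comment *)" is left over.
        System.out.println(firstMatch("(* a (* nested *) comment *)"));
    }
}
```

Since a regular expression cannot count nesting depth, some form of counter or state stack is needed in addition to the pattern.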

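It seems the bounty was unnecessary, because the solution is so simple that I just did not consider it. Let me explain. When scanning a simple nested comment

(* (*..*) *)

I have to keep track of how many opening comment tokens I have seen, so that finally, on the last real closing comment, I can return the whole comment as one token.

What I did not realize is that JFlex does not need to be told to advance to the next portion of the input when it matches something. On careful review I found that this is explained, somewhat hidden, in a section I had not paid attention to:

Because we have not yet returned a value to the parser, our scanner proceeds immediately.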

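Therefore, a rule in the flex file like

[^\(\*\)]+ { }

reads all characters except those that could start or end a comment, does nothing with them, and advances to the next token.

This means I can simply do the following. In the YYINITIAL state I have a rule that matches a comment start, but it does nothing except switch the lexer to the IN_COMMENT state. In particular, it does not return any token: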

{CommentStart}      { yypushstate(IN_COMMENT);}

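Now we are in the IN_COMMENT state, and there I do the same. I eat up all characters but never return a token. When I hit a new opening comment, I carefully push it onto the stack but do nothing else. Only when I hit the last closing comment do I know that I am leaving the IN_COMMENT state, and this is the only point where I finally return the token. Let's look at the rules: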

<IN_COMMENT> {
  {CommentStart}  { yypushstate(IN_COMMENT);}
  [^\(\*\)]+      { }
  {CommentEnd}    {  yypopstate();
                     if(yystate() != IN_COMMENT)
                       return MathematicaElementTypes.COMMENT_CONTENT;
                  }
  [\*\)\(]        { }
  .               { return MathematicaElementTypes.BAD_CHARACTER; }
}


That's it. Now, no matter how deeply your comments are nested, you always get a single token that contains the entire comment.
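The behaviour of these rules can be tried outside JFlex with a small plain-Java sketch. It is only a hand-written imitation of the state stack, not generated scanner code: the class StackCommentScanner and its helper bodies are assumptions made for this illustration (yypushstate and yypopstate have to be supplied by the scanner's own user code in any case, and the real definitions are not shown here).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class StackCommentScanner {
    static final int YYINITIAL = 0, IN_COMMENT = 1;

    private int state = YYINITIAL;
    private final Deque<Integer> stack = new ArrayDeque<>();

    // Hypothetical helpers, modelled on the calls used in the rules above.
    private void yypushstate(int newState) { stack.push(state); state = newState; }
    private void yypopstate() { state = stack.pop(); }

    /** Returns every top-level comment of the input as one string. */
    List<String> comments(String in) {
        List<String> tokens = new ArrayList<>();
        int tokenStart = 0;
        int i = 0;
        while (i < in.length()) {
            if (in.startsWith("(*", i)) {
                if (state == YYINITIAL) tokenStart = i; // outermost comment begins
                yypushstate(IN_COMMENT);                // no token is emitted here
                i += 2;
            } else if (state == IN_COMMENT && in.startsWith("*)", i)) {
                yypopstate();
                i += 2;
                // Only the closing delimiter that leaves IN_COMMENT emits a token.
                if (state != IN_COMMENT) tokens.add(in.substring(tokenStart, i));
            } else {
                i++; // any other character is consumed silently
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [(* x (* y *) z *), (* c *)]
        System.out.println(
            new StackCommentScanner().comments("a (* x (* y *) z *) b (* c *)"));
    }
}
```

The sketch ignores unterminated comments on purpose; it simply emits nothing for them.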

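Now I am embarrassed and sorry for such a simple question.

Final note: If you do something like this, remember that a token is only returned when the correct closing "character" is hit. So you should definitely add a rule that catches the end of file. The default behaviour is to just return the comment token, so you need another line (useful or not, I want to end gracefully):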

<<EOF>>  { yyclearstack(); yybegin(YYINITIAL);
           return MathematicaElementTypes.COMMENT; }

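When I first wrote my answer, I did not even notice that one of the existing answers was by the questioner himself. On the other hand, I rarely find a bounty in the rather small lex community on SO. So it seemed worthwhile to me to learn enough Java and JFlex to write a sample: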

/* JFlex scanner: to recognize nested comments in Mathematica style
 */

%%

%{
  /* counter for open (nested) comments */
  int open = 0;
%}

%state IN_COMMENT

%%

/* any state */

/* Java has no implicit int-to-boolean conversion, so compare explicitly. */
"(*" { if (open++ == 0) yybegin(IN_COMMENT); }

"*)" {
    if (open > 0) {
      if (--open == 0) {
        yybegin(YYINITIAL);
        return MathematicaElementTypes.COMMENT;
      }
    } else {
      /* e.g. return MathematicaElementTypes.BAD_CHARACTER; */
      /* or: throw new Error("'*)' without '(*'!"); */
    }
  }

<IN_COMMENT> {
  . |
  \n { }
}

<<EOF>> {
    if (open > 0) {
      /* Resetting is obsolete if the scanner is instanced new for
       * each invocation.
       */
      int unclosed = open; /* keep the count for the message */
      open = 0; yybegin(YYINITIAL);
      /* Notify about syntax error, e.g. */
      throw new Error("Premature end of file! ("
        + unclosed + " open comments not closed.)");
    }
    return MathematicaElementTypes.EOF; /* just a guess */
  }
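Since the specification above cannot run without the rest of the plugin, the counting logic can also be checked on its own. The following is a minimal plain-Java sketch of the same idea, assuming one depth counter and the two error cases handled above; the class CounterCommentScanner and its method scan are made-up names, and IllegalStateException is used here in place of Error.

```java
import java.util.ArrayList;
import java.util.List;

public class CounterCommentScanner {
    /** Collects top-level comments; throws if the delimiters are unbalanced. */
    static List<String> scan(String in) {
        List<String> comments = new ArrayList<>();
        int open = 0;   // number of currently open (nested) comments
        int start = 0;  // offset of the outermost "(*"
        int i = 0;
        while (i < in.length()) {
            if (in.startsWith("(*", i)) {
                if (open++ == 0) start = i;
                i += 2;
            } else if (in.startsWith("*)", i)) {
                if (open == 0) {
                    throw new IllegalStateException("'*)' without '(*' at offset " + i);
                }
                i += 2;
                // The outermost comment is complete when the counter drops to zero.
                if (--open == 0) comments.add(in.substring(start, i));
            } else {
                i++;
            }
        }
        if (open > 0) {
            throw new IllegalStateException(
                "Premature end of input! (" + open + " open comments not closed.)");
        }
        return comments;
    }

    public static void main(String[] args) {
        // Prints [(* a (* nested *) comment *)]
        System.out.println(scan("No comment. (* a (* nested *) comment *)"));
    }
}
```

A counter is enough here because there is only one kind of nested construct; a state stack becomes useful once several lexical states can nest inside each other.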

A flex/C++ version of the scanner (scan-nested-comments.l), which collects the comment text and prints it:

%{
#include <cstdio>
#include <string>

// counter for open (nested) comments
static int open = 0;
// buffer for collected comments
static std::string comment;
%}

/* make never interactive (prevent usage of certain C functions) */
%option never-interactive
/* force lexer to process 8 bit ASCIIs (unsigned characters) */
%option 8bit
/* prevent usage of yywrap */
%option noyywrap

%s IN_COMMENT

%%

"(*" {
  if (!open++) BEGIN(IN_COMMENT);
  comment += "(*";
}

"*)" {
  if (open) {
    comment += "*)";
    if (!--open) {
      BEGIN(INITIAL);
      printf("EMIT TOKEN COMMENT(lexem: '%s')\n", comment.c_str());
      comment.clear();
    }
  } else {
    printf("ERROR: '*)' without '(*'!\n");
  }
}

<IN_COMMENT>{
  . |
  "\n" { comment += *yytext; }
}

<<EOF>> {
  if (open) {
    printf("ERROR: Premature end of file!\n"
      "(%d open comments not closed.)\n", open);
    return 1;
  }
  return 0;
}

%%

int main(int argc, char **argv)
{
  if (argc > 1) {
    yyin = fopen(argv[1], "r");
    if (!yyin) {
      printf("Cannot open file '%s'!\n", argv[1]);
      return 1;
    }
  } else yyin = stdin;
  return yylex();
}

I compiled it with flex and g++ in cygwin on Windows 10 (64 bit):

$ flex -oscan-nested-comments.cc scan-nested-comments.l ; g++ -o scan-nested-comments scan-nested-comments.cc
scan-nested-comments.cc:398:0: warning: "yywrap" redefined

 ^
scan-nested-comments.cc:74:0: note: this is the location of the previous definition

 ^

$

The warning appears because of %option noyywrap. I guess it does no harm and can be ignored.

Now, I made some tests:

$ cat >good-text.txt <<EOF
> Test for nested comments.
> (* a comment *)
> (* a (* nested *) comment *)
> No comment.
> (* a
> (* nested
> (* multiline *)
>  *)
>  comment *)
> End of file.
> EOF

$ cat good-text.txt | ./scan-nested-comments
Test for nested comments.
EMIT TOKEN COMMENT(lexem: '(* a comment *)')

EMIT TOKEN COMMENT(lexem: '(* a (* nested *) comment *)')

No comment.
EMIT TOKEN COMMENT(lexem: '(* a
(* nested
(* multiline *)
 *)
 comment *)')

End of file.

$ cat >bad-text-1.txt <<EOF
> Test for wrong comment.
> (* a comment *)
> with wrong nesting *)
> End of file.
> EOF

$ cat bad-text-1.txt | ./scan-nested-comments
Test for wrong comment.
EMIT TOKEN COMMENT(lexem: '(* a comment *)')

with wrong nesting ERROR: '*)' without '(*'!

End of file.

$ cat >bad-text-2.txt <<EOF
> Test for wrong comment.
> (* a comment
> which is not closed.
> End of file.
> EOF

$ cat bad-text-2.txt | ./scan-nested-comments
Test for wrong comment.
ERROR: Premature end of file!
(1 open comments not closed.)

$

Java's block comments are non-nested, so I do not think this solves the original problem.

Thanks for your answer. After reading it, I realized that I left out an important point in my answer (and question). I use JFlex for an IntelliJ IDEA plugin, where the lexer is driven through a so-called JFlex adapter. Therefore I do not need to return the string, only the token itself. Apart from the fact that I use a stack to keep track of the nesting level I am in, our approaches are similar, and it seems I did not make any stupid mistakes. Anyway, thank you very much for doing this in C rather than in Java, because it really adds value for future visitors. Since my answer solves my immediate problem, I will mark it as accepted, but I am very happy that I can give you the bounty and that the reputation is not lost to /dev/null. +1 btw.

Thanks for the bounty. In the meantime I tried to port the sample to Java/JFlex. As mentioned before, it should be taken "with a grain of salt".