AntLR4: Writing a grammar for Dockerfile


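I want to write a grammar (a context-free grammar) that recognizes Dockerfiles. Long story short, a Dockerfile is a text file made up of commands. A command can span one or several lines and is identified by its name.

The simplest examples of Dockerfile commands:

FROM anImageNameThatCanContainPrettyMuchAnythingButWs

EXPOSE aNumber

There are also some more complex commands, such as:

ADD aPath anotherPath

COPY aPath anotherPath

ENV aKey=aValue

The most complex command is RUN. A RUN command can basically be any shell command, so it can contain pretty much anything. The only thing I am trying to achieve is to "split" a RUN command on &&.

What I have so far: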

grammar Dockerfile;

dockerfile: ((COMMENT | command))+ EOF;

COMMENT
    :   ( '#' ~[\r\n]* '\r'? '\n'
        | '/*' .*? '*/'
        ) -> skip
    ;

command: one_line | run;
one_line: (from | env | entrypoint | maintainer | workdir | add | copy | expose) (NEWLINE)*;

from: FROM ANYKEYS;
maintainer: MAINTAINER ANYKEYS ANYKEYS;

env: ENV ANYKEYS '=' ANYKEYS;

entrypoint: ENTRYPOINT ANYKEYS;

workdir: WORKDIR ANYKEYS;

add: ADD .*?;

copy: COPY src dest;
src: ANYKEYS | '.';
dest: ANYKEYS | '.';

expose: EXPOSE NUMBER;

run: RUN body NEWLINE;
body: shellCmd (SHELLAND shellCmd)* ;
shellCmd: ANYKEYS+;

SHARP: '#';
FROM: [fF][rR][oO][mM];
ENV: [eE][nN][vV];
RUN: [rR][uU][nN];
ENTRYPOINT: [eE][nN][tT][rR][yY][pP][oO][iI][nN][tT];
MAINTAINER: [mM][aA][iI][nN][tT][aA][iI][nN][eE][rR];
WORKDIR: [wW][oO][rR][kK][dD][iI][rR];
SHELLAND: '&&' | ('\\' NEWLINE '&&');
ADD: [aA][dD][dD];
COPY: [cC][oO][pP][yY];
EXPOSE: [eE][xX][pP][oO][sS][eE];

NUMBER: [0-9]+;
LETTER: [a-zA-Z];
ANYKEYS: (LETTER | NUMBER | ':' | '_' | '-' | '/' | '|' | '"' | '=' | '*' | '\\' | '\'' | '+' | ']' | '[' | '{' | '}' | ';' | '!' | '~' | '.' | '–' | '$' | '<' | '>' | '@' | ',')+;

NEWLINE: ('\n' | '\r')+;
WS : ((' ' | '\t')+) -> skip;
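
So what is wrong? First, the ANYKEYS rule is ambiguous, but I can't find a better way to write it. Next, RUN exit 9000 does not work: it produces the error extraneous input '9000' expecting {SHELLAND, ANYKEYS, NEWLINE}, which I don't understand, since the NUMBER inside the ANYKEYS rule should match.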

I'm a bit lost: I don't understand why this input is not matched, nor how to do this better.

Thanks for your help and advice!
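
The error above follows from how an ANTLR lexer chooses tokens: it takes the longest match, and when several rules match the same length, the rule defined first wins. The sketch below is not ANTLR; it is a small Python simulation of that selection policy, with a shortened ANYKEYS character set and a hypothetical tokenize helper, only meant to show why 9000 comes out as NUMBER rather than ANYKEYS.

```python
import re

# A few lexer rules in definition order, loosely mirroring the grammar above.
# The ANYKEYS character set is shortened here for illustration.
RULES = [
    ("RUN", r"[rR][uU][nN]"),
    ("NUMBER", r"[0-9]+"),
    ("LETTER", r"[a-zA-Z]"),
    ("ANYKEYS", r"[a-zA-Z0-9:_\-/.]+"),
    ("WS", r"[ \t]+"),
]


def tokenize(text):
    """Longest match wins; on a tie, the rule listed first wins."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            match = re.compile(pattern).match(text, pos)
            # Strictly greater: an equally long match from a later rule
            # never replaces the match of an earlier rule.
            if match and match.end() - pos > best_len:
                best_name, best_len = name, match.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # whitespace is skipped, as in the grammar
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("RUN exit 9000"))
# [('RUN', 'RUN'), ('ANYKEYS', 'exit'), ('NUMBER', '9000')]
```

Since shellCmd only accepts ANYKEYS tokens, a NUMBER token in that position is reported as extraneous input. The same policy turns a word such as run inside a RUN body into a RUN keyword token.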

I'm not very familiar with AntLR4, but your grammar has to account for the multiple forms that some instructions can take.

The following instructions have two forms: RUN, ADD, COPY, ENTRYPOINT and HEALTHCHECK.

The RUN instruction has two forms:

RUN <command>                          # (shell form, the command is run in a shell
                                       #  which by default is /bin/sh -c on Linux
                                       #  or cmd /S /C on Windows)

RUN ["executable", "param1", "param2"] # (exec form)

The ADD instruction has two forms:

ADD <src>... <dest>
ADD ["<src>",... "<dest>"] # (this form is required for paths containing whitespace)

The COPY instruction has two forms:

COPY <src>... <dest>
COPY ["<src>",... "<dest>"] # (this form is required for paths containing whitespace)

The ENTRYPOINT instruction has two forms:

ENTRYPOINT ["executable", "param1", "param2"] # (exec form, preferred)
ENTRYPOINT command param1 param2              # (shell form)

The HEALTHCHECK instruction has two forms:

HEALTHCHECK [OPTIONS] CMD command # (check container health by running a command inside the container)
HEALTHCHECK NONE                  # (disable any healthcheck inherited from the base image)

The following instruction has three forms: CMD.

CMD ["executable","param1","param2"] # (exec form, this is the preferred form)
CMD ["param1","param2"]              # (as default parameters to ENTRYPOINT)
CMD command param1 param2            # (shell form)
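
To make the distinction between the forms concrete, here is a minimal Python sketch that classifies the argument text of an instruction as exec form or shell form. It relies on the exec form being a JSON array of double-quoted strings. The function name classify_arguments is hypothetical, and falling back to shell form when the text is not a valid JSON array of strings is an assumption of this sketch.

```python
import json


def classify_arguments(arguments):
    """Return ("exec", [strings]) for a JSON array of strings, else ("shell", text)."""
    text = arguments.strip()
    if text.startswith("["):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            parsed = None
        # Only a list made exclusively of strings counts as exec form here.
        if isinstance(parsed, list) and all(isinstance(item, str) for item in parsed):
            return ("exec", parsed)
    # Anything else is treated as a shell command line (assumption of this sketch).
    return ("shell", text)


print(classify_arguments('["executable", "param1", "param2"]'))
print(classify_arguments("command param1 param2"))
```

A grammar would need the same two alternatives for each of these instructions: one rule for the bracketed array and one for free-form text up to the end of the line.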


For more information and examples of each instruction, see the Dockerfile reference.

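You've fallen into a classic trap. You should probably use lexer modes, in case the command happens to have an argument that matches one of your keywords.

Thanks for your answer! But moving ANYKEYS to the end of the grammar doesn't change anything. Am I wrong? So I don't really know how to solve this. Would using modes solve it?

If I understand lexer modes correctly, they can be used to introduce some kind of context into the grammar, avoiding ambiguous cases such as RUN being interpreted as a RUN command when it appears inside a RUN body.

Antlr4 will first try to match the longest possible token.

@lucastrezesniewski I read the documentation and examples about modes, but I don't understand how, in my case, to pop out of the run mode: there is no token that leaves the run mode.

After a newline, once you have read a command, you leave the run mode.

Hmm, yes, the default lexer may not be well suited to this kind of task. Are you sure ANTLR isn't overkill here? I guess a handwritten parser for this would be easy to write, since the syntax is simple.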
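
As a rough illustration of the handwritten route suggested in the last comment, here is a minimal Python sketch. The helper names parse_dockerfile and split_run are hypothetical, and the sketch makes simplifying assumptions: a trailing backslash continues the line, lines starting with # are comments, and && is split naively without regard to shell quoting.

```python
import re


def parse_dockerfile(text):
    """Return a list of (INSTRUCTION, arguments) pairs."""
    # Join continuation lines: a backslash at the end of a line
    # continues the instruction on the next line.
    joined = re.sub(r"\\[ \t]*\r?\n", " ", text)
    instructions = []
    for line in joined.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        name = parts[0].upper()  # instruction names are case-insensitive
        arguments = parts[1].strip() if len(parts) > 1 else ""
        instructions.append((name, arguments))
    return instructions


def split_run(arguments):
    """Split a shell-form RUN body on '&&' (naive: ignores quoting)."""
    return [part.strip() for part in arguments.split("&&") if part.strip()]


# Sample input, made up for this example.
sample = (
    "FROM ubuntu:20.04\n"
    "# install tools\n"
    "RUN apt-get update \\\n"
    "    && apt-get install -y curl\n"
    "RUN exit 9000\n"
)

for name, arguments in parse_dockerfile(sample):
    if name == "RUN":
        print(name, split_run(arguments))
    else:
        print(name, arguments)
```

Because the instruction name is read only at the start of each logical line, a word such as run or a number such as 9000 inside a RUN body is never mistaken for a keyword, which is the same context that lexer modes would provide in ANTLR.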