Regex grammar parser: end-of-string error

regex parsing

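I am writing a grammar to compile .abc files. These are text files in which each line of text is one musical voice (an instrument playing some notes). In my grammar, I take advantage of the line-by-line structure of the text and parse one line at a time. The simplified grammar looks like this: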

// Body

// spaces and tabs have explicit meaning in the body, don't automatically ignore them

abc_body ::= abc_line+;
abc_line ::= element+ end_of_line (lyric end_of_line)?  | middle_of_body_field | comment;
element ::= note_element | rest_element | tuplet_element | barline | nth_repeat | space_or_tab; 

// notes
note_element ::= note | chord;

note ::= pitch note_length?;
pitch ::= accidental? basenote octave?;
octave ::= "'"+ | ","+;
note_length ::= (digit+)? ("/" (digit+)?)?;
note_length_strict ::= digit+ "/" digit+;

// "^" is sharp, "_" is flat, and "=" is natural
accidental ::= "^" | "^^" | "_" | "__" | "=";

basenote ::= "C" | "D" | "E" | "F" | "G" | "A" | "B" | "c" | "d" | "e" | "f" | "g" | "a" | "b";

// rests
rest_element ::= "z" note_length?;

// tuplets
tuplet_element ::= tuplet_spec note_element+;
tuplet_spec ::= "(" digit ;

// chords
chord ::= "[" note+ "]";

barline ::= "|" | "||" | "[|" | "|]" | ":|" | "|:";
nth_repeat ::= "[1" | "[2";

// A voice field might reappear in the middle of a piece
// to indicate the change of a voice
middle_of_body_field ::= field_voice;

lyric ::= "w:" lyrical_element*;
lyrical_element ::= " "+ | "-" | "_" | "*" | "~" | backslash_hyphen | "|" | lyric_text;
// lyric_text should be defined appropriately
lyric_text ::= [.]*;

backslash_hyphen ::= "\\" "-";
//backslash immediately followed by hyphen

// General

comment ::= space_or_tab* "%" comment_text newline;
//comment_text should be defined appropriately
comment_text ::= [.]*;

end_of_line ::= newline | comment;

digit ::= [0-9];
newline ::= "\n" | "\r" "\n"?;
space_or_tab ::= " " | "\t";

text ::= .*;

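But I have a problem with this approach. For any valid .abc file, I get an error on the last line of the file. The parser tries to match end_of_line but runs into the end of the string. That means a newline is required after the last line. Any suggestions or a way around this?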
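The failure can be reproduced outside the parser generator with a minimal sketch. It assumes Python's `re` module as a stand-in for the grammar engine, and the patterns `NEWLINE`, `LINE`, and `BODY` plus the helper `parses` are hypothetical simplifications, not part of the grammar above. They only mirror the rule that every abc_line must end in end_of_line.

```python
import re

# Mirrors  newline ::= "\n" | "\r" "\n"?
NEWLINE = r"(?:\n|\r\n?)"
# Hypothetical stand-in for  abc_line ::= element+ end_of_line:
# some non-newline characters, then a mandatory newline.
LINE = rf"[^\r\n]+{NEWLINE}"
# Mirrors  abc_body ::= abc_line+
BODY = re.compile(rf"(?:{LINE})+")


def parses(text: str) -> bool:
    """Return True if the whole text matches the line-terminated body."""
    return BODY.fullmatch(text) is not None


# The same two lines, with and without a newline after the last one.
with_newline = parses("C D E F|\nG A B c|\n")
without_newline = parses("C D E F|\nG A B c|")
```

Under these assumptions, only the input that ends in a newline is accepted, which matches the error described: the last line reaches the end of the string while end_of_line is still expected.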

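One approach is to refactor the grammar (that is, without changing the language) so that the final newline is separated out of abc_line, to the point where you can write

abc_line ::= abc_line_content newline

Then change:

abc_body ::= abc_line+

to:

abc_body ::= abc_line_content (newline abc_line_content)*

(If necessary, that is, if some files do have a newline at the end of the last line, add newline? at the end of the rule.)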
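The refactoring above can be sketched with Python's `re` module as a stand-in for the grammar engine. This is only an illustration under stated assumptions: `LINE_CONTENT` is a hypothetical placeholder for abc_line_content (any run of non-newline characters), and `parses` and `split_lines` are helper names invented here.

```python
import re

# Mirrors  newline ::= "\n" | "\r" "\n"?
NEWLINE = r"(?:\n|\r\n?)"
# Hypothetical stand-in for abc_line_content: one or more non-newline chars.
LINE_CONTENT = r"[^\r\n]+"
# Mirrors  abc_body ::= abc_line_content (newline abc_line_content)* newline?
BODY = re.compile(rf"{LINE_CONTENT}(?:{NEWLINE}{LINE_CONTENT})*{NEWLINE}?")


def parses(text: str) -> bool:
    """Return True if the whole text matches the refactored body rule."""
    return BODY.fullmatch(text) is not None


def split_lines(text: str) -> list[str]:
    """Return the line contents of a valid body, without the newlines."""
    if not parses(text):
        raise ValueError("not a valid body")
    # A valid body has at most one trailing newline; drop it, then split.
    return re.split(NEWLINE, text.rstrip("\r\n"))
```

With newlines treated as separators rather than terminators, the body is accepted both with and without a newline after the last line, while an empty line inside the body is still rejected.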

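Text files always end with a newline (at least, that is what the standard requires), so in theory this should not be a problem. In practice, a file's last line sometimes lacks the trailing newline, but because text editors do not make it easy to create such files, they are quite rare. Does this problem actually show up when you parse a single-line test string from a string literal?

@rici So this grammar is oversimplified. There is a header parser that gives no problems, but the body parser does. If I give it a body consisting of just one empty line, I still get the same error.

Your grammar does not accept empty lines, and as I understand the abc format, blank lines are tune separators, so they cannot appear in the body. In any case, I think you need to provide a clearer description of the problem, along with a real (possibly simplified) grammar.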