How to check for a sequence of characters that matches no regular expression in Python

python regex compiler-construction

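I am currently trying to implement a lexical scanner that will become part of a compiler. The program uses regular expressions to match against the input program file. If a sequence of non-white-space characters matches a regular expression, the matched part of the input is converted into a token, which is sent to the parser along with the other tokens. I have the code working so that the correct tokens are output, but I need the scanner to raise an exception (by calling the method no_token()) if it finds a sequence of non-white-space characters that matches none of the given regular expressions. This is my first post here, so if you have any suggestions on how I can improve my posts, please let me know, and if you need more information about the problem or the code, please ask.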

def get_token(self):
    '''Returns the next token and the part of input_string it matched.
       The returned token is None if there is no next token.
       The characters up to the end of the token are consumed.
       Raise an exception by calling no_token() if the input contains
       extra non-white-space characters that do not match any token.'''
    self.skip_white_space()
    # find the longest prefix of input_string that matches a token
    token, longest = None, ''
    for (t, r) in Token.token_regexp:
        match = re.match(r, self.input_string[self.current_char_index:])
        if match is None:
            self.no_token()
        elif match and match.end() > len(longest):
            token, longest = t, match.group()
    self.current_char_index += len(longest)
    return (token, longest)

As you can see, I tried using

if match is None:
    self.no_token()

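but this raises the exception right at the start and exits the program without returning any tokens, whereas if I comment it out the code works fine. Obviously I need this part to raise an exception when non-white-space characters match none of the regular expressions, or it will cause problems at later stages of development.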

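The method skip_white_space() is called at the start of get_token, the regular expressions are stored in token_regexp, and self.input_string[self.current_char_index:] gives the input starting at the current character.
For the program as a .txt file: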
z := 2;
if z < 3 then
  z := 1
end

Without the no_token() call the output is correct, but when I try to add the no_token() call I get:
lexical error: no token found at the start of z := 2;
if z < 3 then
  z := 1
end

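The no_token() method reports the offending characters when there is a sequence of characters that matches none of the regular expressions I implemented in the scanner, but that is not the case for this input. All the character sequences here are valid.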
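
A likely reason the check misfires can be sketched with a hypothetical two-entry token table (the real Token.token_regexp is not shown, so the names and patterns below are assumptions). re.match returns None for each individual pattern that does not match at the start of the text, so a check placed inside the loop fires on the first pattern that fails, even though a later pattern would match:

```python
import re

# Hypothetical token table, for illustration only.
token_regexp = [
    ("NUM", r"[0-9]+"),
    ("ID", r"[a-z]+"),
]

text = "z := 2;"

# Try every pattern at the start of the text and keep each result.
results = [(name, re.match(pattern, text)) for name, pattern in token_regexp]

for name, match in results:
    print(name, match.group() if match else None)

# NUM fails on "z", so `if match is None: self.no_token()` inside the loop
# would raise before ID is ever tried. The useful question is whether
# *any* pattern matched, which can only be answered after the loop.
any_match = any(match is not None for _, match in results)
print(any_match)
```

This suggests moving the no_token() call out of the loop and making it depend on whether any pattern matched at all.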
I got it all sorted. Cheers.
def get_token(self):
    '''Returns the next token and the part of input_string it matched.
       The returned token is None if there is no next token.
       The characters up to the end of the token are consumed.
       Raise an exception by calling no_token() if the input contains
       extra non-white-space characters that do not match any token.'''
    self.skip_white_space()
    # find the longest prefix of input_string that matches a token
    token, longest = None, ''
    for (t, r) in Token.token_regexp:
        match = re.match(r, self.input_string[self.current_char_index:])
        if match and match.end() > len(longest):
            token, longest = t, match.group()

    self.current_char_index += len(longest)
    if token is None and self.current_char_index < len(self.input_string):
        self.no_token()
    return (token, longest)

That is the final working code. Cheers.
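
For anyone who wants to try the idea in isolation, here is a minimal self-contained sketch of the same approach. The token table, the LexicalError class, the bodies of skip_white_space() and no_token(), and the tokens() helper are all assumptions made for illustration, since the original class is not shown. It also passes a start position to the compiled pattern's match() instead of slicing the input, which is a substitution on my part and not what the code above does.

```python
import re


class LexicalError(Exception):
    """Raised when no token pattern matches the remaining input."""


class Scanner:
    # Hypothetical token table; the original Token.token_regexp is not shown.
    token_regexp = [
        ("ASSIGN", re.compile(r":=")),
        ("SEMI", re.compile(r";")),
        ("LESS", re.compile(r"<")),
        ("NUM", re.compile(r"[0-9]+")),
        ("KEYWORD", re.compile(r"(?:if|then|end)\b")),
        ("ID", re.compile(r"[a-z]+")),
    ]
    white_space = re.compile(r"\s*")

    def __init__(self, input_string):
        self.input_string = input_string
        self.current_char_index = 0

    def skip_white_space(self):
        # \s* always matches (possibly the empty string), so match is never None.
        match = self.white_space.match(self.input_string, self.current_char_index)
        self.current_char_index = match.end()

    def no_token(self):
        rest = self.input_string[self.current_char_index:]
        raise LexicalError("no token found at the start of " + rest)

    def get_token(self):
        self.skip_white_space()
        # Find the longest prefix of the remaining input that matches a token.
        token, longest = None, ""
        for name, pattern in self.token_regexp:
            match = pattern.match(self.input_string, self.current_char_index)
            if match and len(match.group()) > len(longest):
                token, longest = name, match.group()
        # Only after every pattern has been tried do we know nothing matched.
        if token is None and self.current_char_index < len(self.input_string):
            self.no_token()
        self.current_char_index += len(longest)
        return token, longest


def tokens(text):
    """Collect (token, lexeme) pairs until the scanner reports end of input."""
    scanner = Scanner(text)
    result = []
    while True:
        token, lexeme = scanner.get_token()
        if token is None:
            return result
        result.append((token, lexeme))


print(tokens("z := 2;"))
```

With this table, an input such as "z := @" raises LexicalError when the scanner reaches "@", while end of input still returns a token of None without raising.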
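
To answer your question, you can use [...] if you really need to. A minimal, complete, verifiable example is really more useful for getting an answer than a long explanation and lots of irrelevant code.
Thanks for the quick reply. I will read that and see if it helps me. When you say a minimal, complete, verifiable example, do you mean an example of the input and the expected output? I'm not sure how the code is irrelevant, since this is the only call to no_token() in the whole program and it is where the error happens.
An example of the input and the expected output is the minimum, yes.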