Python 3: creating a scanner for a compiler and getting errors when testing


parsing loops python-3.x


I am trying to create a scanner for a compiler that reads a simple language. I created a test file called program, which contains:

z := 2; if z < 3 then z := 1 end

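To run the program, I use the terminal and run this command:
python3 scanner.py program tokens
I want the output to go into the text file tokens, but nothing appears when I do this. The program runs but does nothing. I tried putting < > around program, but then I get ValueError: need more than 1 value to unpack.
My code is as follows: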

import re
import sys

class Scanner:
    '''The interface comprises the methods lookahead and consume.
    Other methods should not be called from outside of this class.'''

    def __init__(self, input_file):
        '''Reads the whole input_file to input_string, which remains constant.
        current_char_index counts how many characters of input_string have
        been consumed.
        current_token holds the most recently found token and the
        corresponding part of input_string.'''
        # source code of the program to be compiled
        self.input_string = input_file.read()
        # index where the unprocessed part of input_string starts
        self.current_char_index = 0
        # a pair (most recently read token, matched substring of input_string)
        self.current_token = self.get_token()

    def skip_white_space(self):
        '''Consumes all characters in input_string up to the next
        non-white-space character.'''
        if (self.current_char_index >= len(self.input_string) - 1):
            return
        while self.input_string[self.current_char_index].isspace():
            self.current_char_index += 1

    def get_token(self):
        '''Returns the next token and the part of input_string it matched.
        The returned token is None if there is no next token.
        The characters up to the end of the token are consumed.'''
        self.skip_white_space()
        # find the longest prefix of input_string that matches a token
        token, longest = None, ''
        for (t, r) in Token.token_regexp:
            match = re.match(r, self.input_string[self.current_char_index:])
            if match and match.end() > len(longest):
                token, longest = t, match.group()
        # consume the token by moving the index to the end of the matched part
        self.current_char_index += len(longest)
        return (token, longest)

    def lookahead(self):
        '''Returns the next token without consuming it.
        Returns None if there is no next token.'''
        return self.current_token[0]

    def consume(self, *tokens):
        '''Returns the next token and consumes it, if it is in tokens.
        Raises an exception otherwise.
        If the token is a number or an identifier, its value is returned
        instead of the token.'''
        current = self.current_token
        if (len(self.input_string[self.current_char_index:]) == 0):
            self.current_token = (None, '')  # catches the end-of-file errors so lookahead returns none.
        else:
            self.current_token = self.get_token()  # otherwise we consume the token
        if current[0] in tokens:  # tokens could be a single token, or it could be group of tokens.
            if current[0] is Token.ID or current[0] is Token.NUM:  # if token is ID or NUM
                return current[1]  # return the value of the ID or NUM
            else:  # otherwise
                return current[0]  # return the token
        else:  # if current_token is not in tokens
            raise Exception('non-token detected')  # raise non-token error

class Token:
    # The following enumerates all tokens.
    DO = 'DO'
    ELSE = 'ELSE'
    READ = 'READ'
    WRITE = 'WRITE'
    END = 'END'
    IF = 'IF'
    THEN = 'THEN'
    WHILE = 'WHILE'
    SEM = 'SEM'
    BEC = 'BEC'
    LESS = 'LESS'
    EQ = 'EQ'
    GRTR = 'GRTR'
    LEQ = 'LEQ'
    NEQ = 'NEQ'
    GEQ = 'GEQ'
    ADD = 'ADD'
    SUB = 'SUB'
    MUL = 'MUL'
    DIV = 'DIV'
    LPAR = 'LPAR'
    RPAR = 'RPAR'
    NUM = 'NUM'
    ID = 'ID'

    # The following list gives the regular expression to match a token.
    # The order in the list matters for mimicking Flex behaviour.
    # Longer matches are preferred over shorter ones.
    # For same-length matches, the first in the list is preferred.
    token_regexp = [
        (DO, 'do'),
        (ELSE, 'else'),
        (READ, 'read'),
        (WRITE, 'write'),
        (END, 'end'),
        (IF, 'if'),
        (THEN, 'then'),
        (WHILE, 'while'),
        (SEM, ';'),
        (BEC, ':='),
        (LESS, '<'),
        (EQ, '='),
        (NEQ, '!='),
        (GRTR, '>'),
        (LEQ, '<='),
        (GEQ, '>='),
        (ADD, '[+]'),  # + is special in regular expressions
        (SUB, '-'),
        (MUL, '[*]'),
        (DIV, '[/]'),
        (LPAR, '[(]'),  # ( is special in regular expressions
        (RPAR, '[)]'),  # ) is special in regular expressions
        (ID, '[a-z]+'),
        (NUM, '[0-9]+'),
    ]

def indent(s, level):
    return ' '*level + s + '\n'

# Initialise scanner.
scanner = Scanner(sys.stdin)

# Show all tokens in the input.
token = scanner.lookahead()
test = ''
while token != None:
    if token in [Token.NUM, Token.ID]:
        token, value = scanner.consume(token)
        print(token, value)
    else:
        print(scanner.consume(token))
    token = scanner.lookahead()

Sorry if this is poorly explained. Any help figuring out what is going wrong would be great. Thank you.
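One likely reason the run "does nothing": the script builds its scanner from sys.stdin and never looks at sys.argv, so when it is started as python3 scanner.py program tokens it waits for input typed at the terminal. The sketch below shows one way file names could be taken from the command line instead. The helper open_streams and its argument layout are assumptions made for illustration and are not part of the original script.

```python
import sys

def open_streams(argv):
    """Return (input_stream, output_stream) for a hypothetical command line.

    argv[1], if present, names the source file to scan; argv[2], if present,
    names the file that receives the tokens. Missing arguments fall back to
    standard input and standard output.
    """
    source = open(argv[1]) if len(argv) > 1 else sys.stdin
    target = open(argv[2], 'w') if len(argv) > 2 else sys.stdout
    return source, target

# Hypothetical use inside scanner.py:
#     source, target = open_streams(sys.argv)
#     scanner = Scanner(source)
#     print(scanner.consume(token), file=target)
```

With the script left unchanged, shell redirection achieves the same thing: python3 scanner.py < program > tokens feeds the file program to standard input and writes standard output to the file tokens.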
Solution 1

I figured out why it was not printing to the file tokens. I needed to change my test code to:

while token != None:
    print(scanner.consume(token))
    token = scanner.lookahead()
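The ValueError from the original test loop can be reproduced in isolation. For an ID or NUM, consume returns only the matched text (a plain string), so unpacking it into token, value tries to split the string into characters. In the sketch below, consume_like is a hypothetical stand-in for Scanner.consume, not the real method.

```python
def consume_like(token, text):
    """Hypothetical stand-in for Scanner.consume: for ID and NUM it returns
    only the matched text, a plain string, not a (token, value) pair."""
    return text if token in ('ID', 'NUM') else token

result = consume_like('ID', 'z')       # the one-character string 'z'

try:
    token, value = result              # 'z' supplies one item, two are needed
except ValueError as error:
    message = str(error)
    print('ValueError:', message)

# A two-character identifier unpacks without an error, but into its
# individual characters, which is just as wrong.
first, second = consume_like('ID', 'ab')
print(first, second)
```

The exact wording of the message depends on the Python version: older 3.x releases say "need more than 1 value to unpack", newer ones say "not enough values to unpack (expected 2, got 1)".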
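The only remaining problem is that I cannot tell when a token is an ID or a NUM: it prints just the identifier or the number without saying which kind it is. Right now it prints this: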
z
BEC
2
SEM
IF
z
LESS
3
THEN
z
BEC
1
END
I need it to print this:
IDz
BEC
NUM2
SEM
IF
IDz
LESS
NUM3
THEN
IDz
BEC
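# '\s' only means whitespace inside a regex; isspace() tests the character itself
while (self.current_char_index < len(self.input_string)
       and self.input_string[self.current_char_index].isspace()):
    self.current_char_index += 1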
if current[0] in tokens:
    if current[0] == Token.ID:  # compare with ==; 'in' would be a substring test
        return 'ID' + current[1]
    elif current[0] == Token.NUM:
        return 'NUM' + current[1]
    else:
        return current[0]
else:
    raise Exception('Error in compiling non-token (not a part of token list)')
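The labelling idea above can be tried end to end with a small standalone sketch. It is not the original Scanner class: TOKEN_REGEXP is a trimmed-down, hypothetical token table covering only the sample program, and tokenize and label are illustrative helper names. It keeps the same longest-match rule, with the first entry winning ties.

```python
import re

# Hypothetical, trimmed-down token table: (token name, regular expression).
TOKEN_REGEXP = [
    ('END', 'end'), ('IF', 'if'), ('THEN', 'then'),
    ('SEM', ';'), ('BEC', ':='), ('LESS', '<'),
    ('ID', '[a-z]+'), ('NUM', '[0-9]+'),
]

def tokenize(text):
    """Yield (token, matched_text) pairs; longest match wins, first rule wins ties."""
    index = 0
    while True:
        # skip white space without reading past the end of the input
        while index < len(text) and text[index].isspace():
            index += 1
        if index == len(text):
            return
        token, longest = None, ''
        for name, regexp in TOKEN_REGEXP:
            match = re.match(regexp, text[index:])
            if match and match.end() > len(longest):
                token, longest = name, match.group()
        if token is None:
            raise ValueError('no token matches at index %d' % index)
        index += len(longest)
        yield token, longest

def label(token, text):
    """Prefix identifiers and numbers with their token name."""
    return token + text if token in ('ID', 'NUM') else token

labels = [label(t, s) for t, s in tokenize('z := 2; if z < 3 then z := 1 end')]
print('\n'.join(labels))
```

Keeping the token name and the matched text together as a pair until the moment of printing means the caller never has to guess which kind of value it received.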