Python: How do I check whether a token matches a regular expression pattern?

python regex

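I am writing a C/C++ lexer in Python.

It still needs a lot of work, but for now I am stuck. I want to check for variable names using this regex pattern:

(?:\w+\s+)([a-zA-Z_][a-zA-Z0-9]+)/g

The pattern works fine when I test it elsewhere, though.

My code is:

In short, this is the code that checks whether a string matches the pattern: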

regex = re.compile(r'(?:\w+\s+)([a-zA-Z_][a-zA-Z0-9]+)\b')
if re.search(regex, token) == True: #if token matches the pattern
            print(token + ' : Variable Name')

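The full code:

import re

# Dictionaries:

# 1. Operators
operators = {'=': 'Assignment',
             '+': 'Addition',
             '-': 'Subtraction',
             '/': 'Division',
             '*': 'Multiplication',
             '++': 'Increment',
             '--': 'Decrement',
             '>': 'Greater Than'}
optr_keys = operators.keys()

# 2. Keywords
keywords = {'int': 'Integer Datatype Indicator',
            'float': 'Floating Point Datatype Indicator',
            'char': 'Character Datatype Indicator',
            'long': 'Long Int Datatype Indicator',
            'double': 'Double Datatype Indicator',
            'short': 'Short Integer Datatype Indicator',
            'unsigned': 'Unsigned Integer Datatype Indicator',
            'void': 'Void Datatype Indicator',
            'struct': 'Structure Datatype Indicator',
            'return': 'Return',
            'if': 'Condition If Keyword',
            'else': 'Condition Else Keyword',
            'while': 'While Loop Indicator',
            'do': 'Do While Loop Indicator',
            'break': 'Break Keyword',
            'continue': 'Continue Keyword',
            'switch': 'Switch Keyword',
            'case': 'Case Keyword',
            'sizeof': 'Variable Size Indicator',
            'typedef': 'Function Type Indicator',
            'static': 'Static Type Keyword',
            'goto': 'Goto Line Keyword',
            '#include': 'Header Include Indicator'}
keyword_keys = keywords.keys()

# 3. Delimiters
delimiters = {';': 'Line Terminator Semicolon',
              ' ': 'Single Empty Space'}
delimiter_keys = delimiters.keys()

# 4. Comment indicators
comments = {r'//': 'Single Line Comment',
            r'/*': 'Multiline Comment Start',
            r'*/': 'Multiline Comment End',
            '/**/': 'Empty Multiline Comment'}
comment_keys = comments.keys()

# 5. Builtin header files
header_files = {'<stdio.h>': 'Standard Input Output Header',
                '<string.h>': 'String Manipulation Library'}
header_keys = header_files.keys()

# 6. Blocks
blocks = {'{': 'Blocked Statement Body Open',
          '}': 'Blocked Statement Body Closed'}
blocks_keys = blocks.keys()

# 7. Predefined functions
builtin_functions = {'printf': 'Prints To Console',
                     'cout': 'Standard Output Function',
                     'cin': 'Standard Input Function'}
builtin_functions_keys = builtin_functions.keys()

# 8. Numbers
numbers = {'0': '0',
           '1': '1',
           '2': '2',
           '3': '3',
           '4': '4',
           '5': '5',
           '6': '6',
           '7': 'Se7en',
           '8': '8',
           '9': '9'}
numbers_keys = numbers.keys()

count = 0

cfile = '/some/path/to/sample/file.c'

f = open(cfile, 'r').read()
lines = f.split('\n')
regex = re.compile(r'(?:\w+\s+)([a-zA-Z_][a-zA-Z0-9]+)\b')
for line in lines:
    count = count + 1
    print('\n\n###Line Number', str(count) + '\n')
    tokens = line.split(' ')
    print('Tokens Are ', tokens)
    for token in tokens:
        if '\n' in token:
            position = token.find('\n')
            token = token[:position]
        if token in optr_keys:
            print(token, ' : Operator => ', operators[token])
        elif token in keyword_keys:
            print(token, ' : Keyword => ', keywords[token])
        elif token in comment_keys:
            print(token + ' : Comment => ', comments[token])
        elif '.h' in token:
            print(token + ' : Header File => ', header_files[token])
        elif token in blocks_keys:
            print(token + ' : Block Indicator => ', blocks[token])
        elif token in builtin_functions_keys:
            print(token + ' : Builtin Function => ', builtin_functions[token])
        elif token in numbers:
            print(token + ' : Number => ', numbers_keys[token])
        else:
            if bool(re.search(regex, token)) == True:  # if token matches the pattern
                print(token + ' : Variable Name')

Sample output: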

###Line Number 1

Tokens Are  ['#include', '<stdio.h>', '//', 'This', 'is', 'a', 'header', 'file']
#include  : Keyword =>  Header Include Indicator
<stdio.h> : Header File =>  Standard Input Output Header
// : Comment =>  Single Line Comment


###Line Number 2

Tokens Are  ['int', 'main()']
int  : Keyword =>  Integer Datatype Indicator


###Line Number 3

Tokens Are  ['{']
{ : Block Indicator =>  Blocked Statement Body Open


###Line Number 4

Tokens Are  ['', '', '', '', 'int', 'a;']
int  : Keyword =>  Integer Datatype Indicator


###Line Number 5

Tokens Are  ['', '', '', '', 'a', '=', '10;']
=  : Operator =>  Assignment


###Line Number 6

Tokens Are  ['', '', '', '', 'printf("The', 'value', 'of', 'a', 'is', '%d', '",a);']


###Line Number 7

Tokens Are  ['', '', '', '', 'return', '0;']
return  : Keyword =>  Return


###Line Number 8

Tokens Are  ['}']
} : Block Indicator =>  Blocked Statement Body Closed


###Line Number 9

Tokens Are  ['']

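I expect the output to include the variable names, but the code simply ignores them because the match fails. I think something is wrong with the way I try to match the token/string.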

The problem was in my regex itself.

The following pattern works:

[a-zA-Z_][a-zA-Z0-9_]*

Credit:
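To illustrate why the question's pattern never fires on individual tokens, here is a minimal sketch. It assumes tokens are produced by splitting on spaces, as in the question, so a token never contains whitespace and the `(?:\w+\s+)` prefix can never match. The corrected pattern is written with a trailing `*`, as suggested in the comments. The helper name `looks_like_variable` and the token list are made up for this illustration.

```python
import re

# Pattern from the question: requires "<word><whitespace>" before the name,
# which a whitespace-free token can never supply.
original = re.compile(r'(?:\w+\s+)([a-zA-Z_][a-zA-Z0-9]+)\b')

# Corrected pattern: one letter or underscore, then zero or more
# letters, digits, or underscores.
identifier = re.compile(r'[a-zA-Z_][a-zA-Z0-9_]*')


def looks_like_variable(token):
    """Hypothetical helper: True if the token contains an identifier."""
    return identifier.search(token) is not None


# A few tokens in the style of the sample output, plus one with an underscore.
tokens = ['a', 'a;', 'value', 'my_var', '10;', '=']

for token in tokens:
    old = original.search(token) is not None
    new = looks_like_variable(token)
    print(repr(token), 'original:', old, 'corrected:', new)
```

Note that `search` also accepts tokens such as `a;` or `main()`, because it only needs an identifier somewhere in the string. Whether that is acceptable depends on how the lexer splits punctuation from names.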

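Comments:

Your regexp [a-zA-Z_][a-zA-Z0-9]+ does not seem to allow underscores except as the first character of the name. Single-letter variable names are also possible, so the '+' should be a '*' to indicate zero or more occurrences. I suggest [a-zA-Z_][a-zA-Z0-9_]* instead.

Also, will re.search ever == True? search returns a match object.

@9769953 right. Removed re.IGNORECASE

@MarkMeyer edited. bool(re.search) leads to the same result.

You can just use if re.search(regex, token):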
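The point about `== True` can be checked directly. The sketch below shows that `re.search` returns either a match object or `None`, so comparing the result with `True` is always false, while using the result as a condition works. The function name `token_matches` is hypothetical and the pattern is the corrected one from the answer.

```python
import re

regex = re.compile(r'[a-zA-Z_][a-zA-Z0-9_]*')

result = re.search(regex, 'count')

# A match object is truthy, but it is not equal to True.
print(result == True)   # comparison with True fails even on a successful match
print(bool(result))     # truthiness reflects whether a match was found


def token_matches(token):
    """Hypothetical helper: rely on truthiness instead of '== True'."""
    if re.search(regex, token):
        return True
    return False


print(token_matches('count'))
print(token_matches('10'))
```

Wrapping the call in `bool(...)` before comparing, as in the full listing, does give a correct boolean, which is consistent with the remark that it led to the same result: in that version the remaining failure comes from the pattern, not from the comparison.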