Python: parsing a regular expression as a word (pyparsing)

I am building a syntax parser that performs simple actions on objects identified using dotted notation, like this:

DISABLE ALL;
ENABLE A.1 B.1.1 C

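But in DISABLE ALL, the keyword ALL is matched as three instances of the Regex(r'[a-zA-Z]') that I use to match arguments, giving 'A', 'L', 'L'.

How can I make a Regex behave like a Word? I have not been able to match A.1.1 using Word.

Please see the example below: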

import pyparsing as pp

def toggle_item_action(s, loc, tokens):
    'enable / disable a sequence of items'
    action = True if tokens[0].lower() == "enable" else False
    for token in tokens[1:]:
        print("it[%s].active = %s" % (token, action))

def toggle_all_items_action(s, loc, tokens):
    'enable / disable ALL items'
    action = True if tokens[0].lower() == "enable" else False
    print("it.enable_all(%s)" % action)

expr_separator = pp.Suppress(';')

#match A
area = pp.Regex(r'[a-zA-Z]')
#match A.1
category = pp.Regex(r'[a-zA-Z]\.\d{1,2}')
#match A.1.1
criteria = pp.Regex(r'[a-zA-Z]\.\d{1,2}\.\d{1,2}')
#match any of the above
item = area ^ category ^ criteria
#keyword to perform action on ALL items
all_ = pp.CaselessLiteral("all")

#actions
enable = pp.CaselessKeyword('enable')
disable = pp.CaselessKeyword('disable')
toggle = enable | disable

#toggle item expression
toggle_item = (toggle + item + pp.ZeroOrMore(item)).setParseAction(toggle_item_action)

#toggle ALL items expression
toggle_all_items = (toggle + all_).setParseAction(toggle_all_items_action)

#swapping order to `toggle_all_items ^ toggle_item` works
#but seems too weak to me and error prone for future maintenance
expr = toggle_item ^ toggle_all_items
#expr = toggle_all_items ^ toggle_item

more = expr + pp.ZeroOrMore(expr_separator + expr)

more.parseString("""
    ENABLE A.1 B.1.1;
    DISABLE ALL
    """, parseAll=True)

Is this the problem?

#match any of the above
item = area ^ category ^ criteria
#keyword to perform action on ALL items
all_ = pp.CaselessLiteral("all")

Should be:

#keyword to perform action on ALL items
all_ = pp.CaselessLiteral("all")
#match any of the above
item = area ^ category ^ criteria ^ all_

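EDIT - if you're interested
Your regexes are so similar that I thought I'd see what it would look like to combine them into one. Here is a snippet that parses your three dotted notations with a single Regex, then uses a parse action to work out which type you got: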

import pyparsing as pp

dotted_notation = pp.Regex(r'[a-zA-Z](\.\d{1,2}(\.\d{1,2})?)?')
def name_notation_type(tokens):
    name = {
        0 : "area",
        1 : "category",
        2 : "criteria"}[tokens[0].count('.')]
    # assign results name to results
    tokens[name] = tokens[0]
dotted_notation.setParseAction(name_notation_type)

# test each individually
tests = "A A.1 A.2.2".split()
for t in tests:
    print(t)
    val = dotted_notation.parseString(t)
    print(val.dump())
    print(val[0], 'is a', val.getName())
    print()

# test all at once
tests = "A A.1 A.2.2"
val = pp.OneOrMore(dotted_notation).parseString(tests)
print(val.dump())

Prints:

A
['A']
- area: A
A is a area

A.1
['A.1']
- category: A.1
A.1 is a category

A.2.2
['A.2.2']
- criteria: A.2.2
A.2.2 is a criteria

['A', 'A.1', 'A.2.2']
- area: A
- category: A.1
- criteria: A.2.2

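EDIT2 - I see the original problem
What is tripping you up is pyparsing's implicit whitespace skipping. Pyparsing skips over whitespace between defined tokens, but the converse is not true: pyparsing does not require whitespace between separate parser expressions. So in your version without all_, "ALL" looks like 3 areas: "A", "L" and "L". This is true not just of Regex but of just about any pyparsing class. See whether the pyparsing WordEnd class helps to enforce this.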
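
The effect described above can be reproduced without pyparsing, since pp.Regex hands its pattern to Python's re module. The sketch below is only an approximation of what pp.OneOrMore(pp.Regex(...)) does: the helper name one_or_more is hypothetical, and the fix shown is the regex word-boundary anchor \b, which is a different technique from the WordEnd class mentioned above.

```python
import re

def one_or_more(pattern, text):
    """Match `pattern` again and again from the current position.

    Whitespace is skipped before each attempt but is never required,
    which is the behaviour that splits ALL into A, L, L.
    Assumes `pattern` cannot match the empty string.
    """
    regex = re.compile(pattern)
    pos, found = 0, []
    while True:
        # skip optional whitespace, as pyparsing does before each token
        while pos < len(text) and text[pos].isspace():
            pos += 1
        m = regex.match(text, pos)
        if not m:
            return found
        found.append(m.group())
        pos = m.end()

# The pattern from the question: each letter is accepted on its own.
print(one_or_more(r'[a-zA-Z]', "ALL"))      # ['A', 'L', 'L']

# With a trailing \b a letter is accepted only if no word character follows.
print(one_or_more(r'[a-zA-Z]\b', "ALL"))    # []
print(one_or_more(r'[a-zA-Z]\b', "A B C"))  # ['A', 'B', 'C']
```

Note that \b also sees a boundary between "A" and "." in "A.1", so a pattern like this still relies on the longer category and criteria alternatives being tried as well.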
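EDIT3 - then maybe something like this:

toggle_item = (toggle + pp.OneOrMore(item)).setParseAction(toggle_item_action)
toggle_all = (toggle + all_).setParseAction(toggle_all_action)
toggle_directive = toggle_all | toggle_item

Given the way your commands are formatted, you have to make the parser check first whether ALL is being toggled before it looks for individual areas and so on. If you need to support something that might read "ENABLE A.1 ALL", then put a negative lookahead in front of item: item = ~all_ + (area ^ etc.).
(Note also that I replaced item + pp.ZeroOrMore(item) with pp.OneOrMore(item).)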
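
Putting the pieces of this suggestion together, a minimal sketch might look like the following. It reuses the names from the question; the pp.Group wrappers, the parse helper and the omission of parse actions are assumptions made here so that the result is easy to inspect, not part of the original answer.

```python
import pyparsing as pp

# Definitions taken from the question
area = pp.Regex(r'[a-zA-Z]')
category = pp.Regex(r'[a-zA-Z]\.\d{1,2}')
criteria = pp.Regex(r'[a-zA-Z]\.\d{1,2}\.\d{1,2}')
all_ = pp.CaselessLiteral("all")
toggle = pp.CaselessKeyword('enable') | pp.CaselessKeyword('disable')

# Negative lookahead: never start an item where the keyword ALL begins
item = ~all_ + (area ^ category ^ criteria)

# Group is an addition for readability: one sub-list per directive
toggle_all = pp.Group(toggle + all_)
toggle_item = pp.Group(toggle + pp.OneOrMore(item))

# '|' tries the alternatives in order, so ALL is checked before items
toggle_directive = toggle_all | toggle_item
program = toggle_directive + pp.ZeroOrMore(pp.Suppress(';') + toggle_directive)

def parse(text):
    """Hypothetical helper: parse the whole text and return plain lists."""
    return program.parseString(text, parseAll=True).asList()

print(parse("ENABLE A.1 B.1.1 C; DISABLE ALL"))
# [['enable', 'A.1', 'B.1.1', 'C'], ['disable', 'all']]

# Mixing items with ALL is rejected rather than read as A, L, L
try:
    parse("ENABLE A.1 ALL")
    mixed_rejected = False
except pp.ParseException:
    mixed_rejected = True
print(mixed_rejected)  # True
```

The caseless classes return the string they were defined with, which is why the output shows 'enable' and 'all' in lower case even though the input is upper case.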
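What happened to the other comments/answers? I think they were helpful to the discussion.

@Paul: there is a discussion on meta complaining about high-reputation users deleting bad answers to avoid downvotes ^^

First of all, thanks Paul for the excellent analysis. I have to keep things separate because some actions are allowed on criteria but not on area or category, and so on, so I want to tell them apart. ALL can't be used together with item because it triggers a different parse action. If you run my code above you get the wrong result, but if you just uncomment line 42 (presumably the commented-out expr = toggle_all_items ^ toggle_item line) you can see the right result (because ALL is matched before A L L). Is this the only way to achieve that?

Thanks a lot. I now understand more about operand order (I was stuck on logical-OR behaviour), and I have also learned about the - operator, which I had overlooked until now.

Ehm... "I was stuck", not "I was stucked".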