Python pyparsing: how to SkipTo the end of an indented block?


I am trying to parse a structure like this with pyparsing:

identifier: some description text here which will wrap
    on to the next line. the follow-on text should be
    indented. it may contain identifier: and any text
    at all is allowed
next_identifier: more description, short this time
last_identifier: blah blah

I need something like this:

import pyparsing as pp
colon = pp.Suppress(':')
term = pp.Word(pp.alphanums + "_")
description = pp.SkipTo(next_identifier)
definition = term + colon + description
grammar = pp.OneOrMore(definition)

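But I am struggling to define next_identifier for the SkipTo clause, since identifiers may appear freely in the description text.

It seems I need to include the indentation in the grammar, so that I can SkipTo the next non-indented line.

I tried:

description = pp.Combine(
    pp.SkipTo(pp.LineEnd()) +
    pp.indentedBlock(
        pp.ZeroOrMore(
            pp.SkipTo(pp.LineEnd())
        ),
        indent_stack
    )
)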

But I get an error:

ParseException: not a subentry (at char 55), (line:2, col:1)

Char 55 is at the very beginning of the run-on line:

...will wrap\n    on to the next line...
              ^

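This seems a bit odd, because that char position is clearly followed by the whitespace that makes it an indented sub-entry.

My traceback in ipdb looks like this: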

   5311     def checkSubIndent(s,l,t):
   5312         curCol = col(l,s)
   5313         if curCol > indentStack[-1]:
   5314             indentStack.append( curCol )
   5315         else:
-> 5316             raise ParseException(s,l,"not a subentry")
   5317

ipdb> indentStack
[1]
ipdb> curCol
1
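The debugger values above can be reproduced without pyparsing. The following is a minimal plain-Python sketch: `column_at` is a hypothetical helper that counts columns the same 1-based way the traceback reports them, not a pyparsing function. It shows that char 55 sits in column 1 because the leading blanks of the second line have not been skipped yet. A likely reason, stated here as an assumption rather than something confirmed in this thread, is that `pp.Combine` requires adjacent tokens and so turns off whitespace skipping for the expressions it wraps.

```python
def column_at(loc, text):
    """Return the 1-based column of index `loc` within its line of `text`."""
    return loc - text.rfind("\n", 0, loc)

sample = (
    "identifier: some description text here which will wrap\n"
    "    on to the next line. the follow-on text should be\n"
)

loc = 55                       # position reported in the ParseException
print(repr(sample[loc - 1]))   # the newline that ends the first line
print(column_at(loc, sample))  # column seen by the indentation check

# Skip the leading blanks by hand, as whitespace skipping normally would.
while sample[loc] == " ":
    loc += 1
print(column_at(loc, sample))  # column of the first non-blank character
```

With `indentStack` equal to `[1]`, a column of 1 fails the `curCol > indentStack[-1]` test, whereas the column reached after skipping the blanks would pass it.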

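I should add that the whole structure I am matching above may itself be indented (by an unknown amount), so a solution like this:

description = pp.Combine(
    pp.SkipTo(pp.LineEnd()) + pp.LineEnd() +
    pp.ZeroOrMore(
        pp.White(' ') + pp.SkipTo(pp.LineEnd()) + pp.LineEnd()
    )
)

...does not work in my case, because it would consume the subsequent definitions.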

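When you use indentedBlock, the argument you pass in is the expression for each line in the block, so it should not be indentedBlock(ZeroOrMore(line_expression), stack), just indentedBlock(line_expression, stack). Pyparsing includes a builtin expression for "everything from here to the end of the line", named restOfLine, so we will just use that as the expression for each line in the indented block: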

import pyparsing as pp

NL = pp.LineEnd().suppress()

label = pp.ungroup(pp.Word(pp.alphas, pp.alphanums+'_') + pp.Suppress(":"))

indent_stack = [1]
# see corrected version below
#description = pp.Group((pp.Empty() 
#                    + pp.restOfLine + NL
#                    + pp.ungroup(pp.indentedBlock(pp.restOfLine, indent_stack))))

description = pp.Group(pp.restOfLine + NL
                       + pp.Optional(pp.ungroup(~pp.StringEnd() 
                                                + pp.indentedBlock(pp.restOfLine, 
                                                                   indent_stack))))

labeled_text = pp.Group(label("label") + pp.Empty() + description("description"))
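A short aside on the `pp.Empty()` in `labeled_text`. The answer does not explain it, so the following is my reading rather than the author's stated intent: `restOfLine` keeps any leading whitespace, while `Empty()` matches nothing but still skips whitespace before it "matches", so placing it in front trims the blanks after the colon. A minimal sketch, where the names `line`, `raw` and `trimmed` are made up for illustration:

```python
import pyparsing as pp

line = "identifier:   some description text\nnext_identifier: more"

label = pp.Word(pp.alphas, pp.alphanums + "_") + pp.Suppress(":")

# Without Empty(), restOfLine starts right after the colon and keeps
# the blanks that follow it.
raw = (label + pp.restOfLine).parseString(line)

# Empty() consumes no text itself, but the whitespace skipping that runs
# before it moves the parse position to the first non-blank character.
trimmed = (label + pp.Empty() + pp.restOfLine).parseString(line)

print(raw.asList())
print(trimmed.asList())
```

Both expressions stop at the end of the first line, since `restOfLine` does not consume the newline.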

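We use ungroup to remove the extra level of nesting created by indentedBlock, but we also need to remove the per-line nesting that is created internally in indentedBlock. We do this with a parse action: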

def combine_parts(tokens):
    # recombine description parts into a single list
    tt = tokens[0]
    new_desc = [tt.description[0]]
    new_desc.extend(t[0] for t in tt.description[1:])

    # reassign the rebuilt description into the parsed token structure
    tt['description'] = new_desc
    tt[1][:] = new_desc

labeled_text.addParseAction(combine_parts)
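To make the flattening step concrete, here is the same transformation on plain Python lists, with no pyparsing involved. The nested shape below is an assumption inferred from how `combine_parts` indexes the tokens (first line as a bare string, each indented line wrapped in its own one-item list), and `flatten_description` is a hypothetical stand-in for the body of the parse action:

```python
# Assumed shape of the description tokens before the parse action runs.
nested_description = [
    "some description text here which will wrap",
    ["on to the next line. the follow-on text should be"],
    ["indented. it may contain identifier: and any text"],
    ["at all is allowed"],
]

def flatten_description(parts):
    """Keep the first line as is and unwrap every following one-item list."""
    flat = [parts[0]]
    flat.extend(part[0] for part in parts[1:])
    return flat

flat_description = flatten_description(nested_description)
print(flat_description)
```

The real parse action additionally writes the flat list back into both the named result and the positional token, which is what the two assignments at the end of `combine_parts` do.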

At this point, we are pretty much done. Here is your sample text parsed and dumped:

parsed_data = (pp.OneOrMore(labeled_text)).parseString(sample)    
print(parsed_data[0].dump())

['identifier', ['some description text here which will wrap', 'on to the next line. the follow-on text should be', 'indented. it may contain identifier: and any text', 'at all is allowed']]
- description: ['some description text here which will wrap', 'on to the next line. the follow-on text should be', 'indented. it may contain identifier: and any text', 'at all is allowed']
- label: 'identifier'

Or use this code to pull out the label and description fields:

for item in parsed_data:
    print(item.label)
    print('..' + '\n..'.join(item.description))
    print()

identifier
..some description text here which will wrap
..on to the next line. the follow-on text should be
..indented. it may contain identifier: and any text
..at all is allowed

next_identifier
..more description, short this time

last_identifier
..blah blah
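For cross-checking output like the above, a plain-Python reference can be handy. This is a swapped-in technique, not pyparsing and not part of the answer: `split_labeled_text` is a hypothetical helper that treats the indentation of the first non-blank line as the label level, so it also copes with the whole structure being indented by an unknown amount. It assumes space indentation and does not distinguish nested levels.

```python
def split_labeled_text(text):
    """Return (label, [description lines]) pairs using plain string handling."""
    entries = []
    base_indent = None
    for raw_line in text.splitlines():
        if not raw_line.strip():
            continue  # ignore blank lines
        indent = len(raw_line) - len(raw_line.lstrip(" "))
        if base_indent is None:
            base_indent = indent  # indentation of the first label line
        if indent <= base_indent:
            # A label line: split at the first colon only.
            label, _, first = raw_line.strip().partition(":")
            entries.append((label, [first.strip()]))
        else:
            # A continuation line: colons inside it are ordinary text.
            entries[-1][1].append(raw_line.strip())
    return entries

sample = (
    "identifier: some description text here which will wrap\n"
    "    on to the next line. the follow-on text should be\n"
    "    indented. it may contain identifier: and any text\n"
    "    at all is allowed\n"
    "next_identifier: more description, short this time\n"
    "last_identifier: blah blah\n"
)

for label, description in split_labeled_text(sample):
    print(label)
    print(".." + "\n..".join(description))
    print()
```

Comparing these pairs against `item.label` and `item.description` from the parser gives a quick sanity check when experimenting with the grammar.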

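Are the lines indented with tabs, or with a fixed number of spaces? Do you have to handle multiple levels of indentation, or only one?

I am basically parsing docstrings, so the depth of indentation is unknown and there will be multiple levels. I made a test suite here with some examples that need to be handled. I realised that part of my problem above is that I need to consume the whitespace following the LineEnd() before entering the indentedBlock, but I have been hitting a wall all day trying to satisfy all the cases. It seems like it should be simple!

By the way, I really like this question, since it deals with a very flexible indentation format, similar to the one used in many markdown formats. I think it also highlights a minor bug/feature shortcoming in indentedBlock: it does not handle StringEnd or empty blocks well.

Thanks so much for your answer! It seems to extract only the first identifier; I am going to try making the indentedBlock Optional. I am also confused about why you define description as a Group and then use a parse action to flatten the parts, so I tried changing the Group to Combine, and that results in ParseException: not a subentry (at char 55), (line:2, col:1). I don't understand why.

You are right, the indentedBlock needs to be made Optional. But when I did just that, it got stuck in an infinite loop at the end of the string, so I also had to add the ~StringEnd() negative lookahead. To see why I flatten the data in the parse action, comment out the addParseAction line and rerun (only the dump() code will work; the '..'.join code will raise an exception). You should see that there are two extra levels of per-line nesting in the indentedBlock. Lastly, Group and Combine are not interchangeable; see the docs for more info.

Thanks very much for your help! I realised that all my own attempts were getting stuck because I was trying to wrap everything for the description in an outer Combine layer (naively implementing the parse action)... If I understand correctly, this does not work because Combine silently ignores sub-groups? Something like that? I could get the first line of a multi-line description, but could not understand why my Optional(indentedBlock) was not matching.

This question and answer may help clarify things for you: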