Python: parsing paragraphs with pyparsing

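I have run into a small problem with pyparsing that I can't seem to solve. I want to write a rule that parses a multi-line paragraph. The end goal is a recursive grammar that can parse something like this: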

Heading: awesome
    This is a paragraph and then
    a line break is inserted
    then we have more text

    but this is also a different line
    with more lines attached

    Other: cool
        This is another indented block
        possibly with more paragraphs

        This is another way to keep this up
        and write more things

    But then we can keep writing at the old level
    and get this
I want to turn this into something HTML-like (of course, once I have a parse tree I can convert it into whatever format I like).


But this doesn't seem to work for me. Any ideas would be great :)

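I managed to solve this, so here is the answer for anyone who runs into the same problem in the future. You can define a paragraph as shown below. It is certainly not ideal, and it does not exactly match the grammar I described. The relevant code is: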

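from pyparsing import CharsNotIn, OneOrMore, Suppress, lineEnd

# A line is a run of non-newline characters followed by a suppressed line end.
line = OneOrMore(CharsNotIn('\n')) + Suppress(lineEnd)
# Negative lookahead: succeeds wherever a line cannot be matched; consumes nothing.
emptyline = ~line
paragraph = OneOrMore(line) + emptyline
paragraph.setParseAction(join_lines)  # join_lines is defined below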
where join_lines is defined as:

def join_lines(tokens):
    stripped = [t.strip() for t in tokens]
    joined = " ".join(stripped)
    return joined
If this fits your needs, it should point you in the right direction :) I hope it helps.
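Putting the two snippets above together, a minimal runnable sketch might look like the following. The sample text and the names `sample` and `result` are made up for illustration, and the camelCase method names assume a pyparsing version that still provides them.

```python
from pyparsing import CharsNotIn, OneOrMore, Suppress, lineEnd


def join_lines(tokens):
    # Strip the indentation from each matched line and join them with spaces.
    stripped = [t.strip() for t in tokens]
    return " ".join(stripped)


# Same rules as in the answer above.
line = OneOrMore(CharsNotIn("\n")) + Suppress(lineEnd)
emptyline = ~line
paragraph = OneOrMore(line) + emptyline
paragraph.setParseAction(join_lines)

# Hypothetical input: one indented paragraph, a blank line, then more text.
sample = (
    "    This is a paragraph and then\n"
    "    a line break is inserted\n"
    "\n"
    "    but this is a different paragraph\n"
)

# parseString only has to match a prefix, so parsing stops at the blank line.
result = paragraph.parseString(sample)
print(result[0])
```

Only the first paragraph is consumed here. Parsing a whole document would need an outer rule that repeats `paragraph` and skips the blank lines in between, which this sketch does not attempt.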

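A better empty line

The definition of the empty line given above is certainly not ideal and can be improved considerably. The best approach I have found is: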

from pyparsing import LineEnd, LineStart, Suppress, ZeroOrMore

empty_line = Suppress(LineStart() + ZeroOrMore(" ") + LineEnd())
empty_line.setWhitespaceChars("")
This allows empty lines to be padded with spaces without breaking the match.
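To illustrate why space-padded blank lines matter, here is the same idea sketched with the standard-library `re` module instead of pyparsing. This is not the rule above, only an analogue of it; `BLANK_LINES`, `split_paragraphs`, and the sample text are hypothetical names introduced for the example.

```python
import re

# One or more consecutive lines that are empty or contain only spaces.
BLANK_LINES = re.compile(r"\n(?:[ ]*\n)+")


def split_paragraphs(text):
    """Split text on blank (or space-only) lines and join each paragraph's lines."""
    chunks = BLANK_LINES.split(text.strip("\n"))
    paragraphs = []
    for chunk in chunks:
        # Drop indentation and any stray empty lines inside the chunk.
        lines = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        if lines:
            paragraphs.append(" ".join(lines))
    return paragraphs


# The separator line below contains three spaces rather than being truly empty.
sample = "first line\nsecond line\n   \nthird line\n"
paragraphs = split_paragraphs(sample)
print(paragraphs)
```

If the separator pattern only accepted a completely empty line, the padded line in `sample` would not split the text and all three lines would end up in one paragraph.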
