How can I split a file into multiple parts based on a template in Python?

python python-3.x regex text

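I am parsing text files that have a known structure. Before building a Rube Goldberg machine of a solution, I would like to check whether there is a standard way to handle this kind of problem.

The file structure is: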

whatever text, empty lines more text

long text empty lines
whatever
← one empty line
line 1 of final block of text
line 2 of final block of text
line 3 of final block of text
← more lines, the number is not defined
← new line and end of file

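So it is free text up to an empty line, followed by a block of consecutive lines with no empty lines between them, and the file ends with a newline.

I would like to split this file into two main parts: the free-text part and the block part. The two will then be analyzed independently.

My first hope was that there would be some kind of "file pattern matching" module in which I would describe the file in a way similar to the above and retrieve my two parts. I only found templating (which goes the other way: describing the contents of a file in order to create it).

The next solution that came to mind is a regular expression. The part I am struggling with is describing "a block of text that contains only single line breaks". How would that be expressed?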
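One possible way to express "a block of text that contains only single line breaks" is a run of non-empty lines anchored at the end of the string. The following is only a sketch, assuming the whole file fits in memory and uses \n line endings; the name FINAL_BLOCK and the sample text are made up for illustration.

```python
import re

# Free text (may contain blank lines), then one blank line, then a run of
# non-empty lines that reaches the end of the string.
FINAL_BLOCK = re.compile(
    r"\A(?P<free>.*)\n\n(?P<block>(?:[^\n]+\n)*[^\n]+)\n?\Z",
    re.DOTALL,
)

# Hypothetical sample: two free-text paragraphs, then a three-line block.
sample = "intro\n\nmore free text\nstill free\n\nrow 1\nrow 2\nrow 3\n"

match = FINAL_BLOCK.match(sample)
free_text = match.group("free")   # everything before the last blank line
block = match.group("block")      # the final run of non-empty lines
```

Because [^\n]+ requires at least one character per line, the block group cannot contain an empty line, which is what restricts it to single line breaks.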

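More generally, is there a simpler solution to this problem? (A pointer would be enough; quite possibly I have simply never come across the approach.)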

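My gut feeling is that the file should be analyzed from the bottom up. If there is no more obvious solution, that is probably what I will end up developing.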
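The bottom-up idea could look roughly like the sketch below. It assumes the text is already in memory and treats whitespace-only lines as empty; the function name split_from_bottom is hypothetical.

```python
def split_from_bottom(text):
    """Return (free_text, block_lines) by scanning the lines from the end."""
    lines = text.splitlines()

    # Skip blank lines at the very end of the file, if there are any.
    end = len(lines)
    while end > 0 and not lines[end - 1].strip():
        end -= 1

    # Walk upwards until the blank line that sits above the final block.
    # Assumption: a line holding only whitespace counts as blank.
    start = end
    while start > 0 and lines[start - 1].strip():
        start -= 1

    # lines[start - 1] is the separating blank line, so leave it out.
    free_text = "\n".join(lines[:max(start - 1, 0)])
    return free_text, lines[start:end]


free_text, block_lines = split_from_bottom(
    "intro\n\nmore free text\nstill free\n\nrow 1\nrow 2\nrow 3\n"
)
```

If the text contains no blank line at all, this sketch returns an empty free-text part and every line as the block.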

This works for me:

>>> a = '''whatever text, empty lines more text
... 
... long text empty lines
... whatever
... ← one empty line
... line 1 of final block of text
... line 2 of final block of text
... line 3 of final block of text
... ← more lines, the number is not defined
... ← new line and end of file
... '''
>>> a.rsplit('\n\n', 1)
['whatever text, empty lines more text', 'long text empty lines\nwhatever\n\xe2\x86\x90 one empty line\nline 1 of final block of text\nline 2 of final block of text\nline 3 of final block of text\n\xe2\x86\x90 more lines, the number is not defined\n\xe2\x86\x90 new line and end of file\n']
>>>
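In the session above the ← markers are typed as literal text, so the string contains only one blank line. Here is a sketch of the same rsplit call on a string whose free text has real blank lines, assuming \n line endings and no blank lines inside the final block; the helper name split_parts and the sample are illustrative only.

```python
def split_parts(text):
    """Split text into (free_text, block) at the last blank line."""
    # Drop trailing newlines so the end of the file is not mistaken for a
    # separator, then split once, starting from the right.
    parts = text.rstrip("\n").rsplit("\n\n", 1)
    if len(parts) == 1:
        # No blank line at all: treat everything as the final block.
        return "", parts[0]
    return parts[0], parts[1]


# Hypothetical sample: the free text itself contains a blank line.
sample = "intro\n\nmore free text\nstill free\n\nrow 1\nrow 2\nrow 3\n"
free_text, block = split_parts(sample)
```

Splitting from the right with a limit of 1 leaves the blank lines inside the free text untouched, so only the last blank line acts as the separator.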

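Frankly, I don't understand the question. You write "then a block of one line text with no empty lines". What is "one line text"? And how do you recognize that a file fragment is "a block"?

@Błotosmętek: a block of one-line ("single-line" would be better) text is a set of lines that are not separated by one or more empty lines. In other words, only a single line separator is possible between them.

Ah, that is a good idea: splitting on \n\n and calling .pop(-1) gives me the block and leaves the free text to analyze. Thank you very much.

@WoJ I made a small change: .rsplit(…, 1) will ensure that you get exactly 2 chunks, so you don't have to use [-1], just [0] and [1].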
