Python: extracting the lines between repeated headers in a file


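I am trying to modify a txt file containing ~43k lines. After the command *Nset appears in the file, I need to extract and save all the lines that follow it, stopping at the next * command in the file. Each command is followed by a different number of lines and characters. For example, here is a sample section of the file: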

*Nset

1, 2, 3, 4, 5, 6, 7,

12, 13, 14, 15, 16,

17, 52, 75, 86, 92,

90, 91, 92 93, 94, 95....

*NEXT COMMAND

 blah blah blah

*Nset

 numbers

*Nset

 numbers

*Command

 irrelevant text

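My current code works when the numbers I need are not between two *Nset commands. When one *Nset directly follows the numbers of another, the code skips both that command and the lines after it, and I don't know why. When the next command is not *Nset, it finds the following *Nset and extracts the data fine.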

import re

# read in the input deck
deck_name = 'master.txt'
deck = open(deck_name,'r')

#initialize variables
nset_data = []
matched_nset_lines = []
nset_count = 0

for line in deck:
     # loop to extract all nset names and node numbers
     important_line = re.search(r'\*Nset,.*',line)
     if important_line :
         line_value = important_line.group() #name for nset
         matched_nset_lines.insert(nset_count,line_value) #name for nset
         temp = []

        # read lines from the found match up until the next *command
         for line_x in deck :
             if not re.match(r'\*',line_x):
                 temp.append(line_x)
             else : 
                 break

         nset_data.append(temp)

     nset_count = nset_count + 1

I am using Python 3.5. Thanks for your help.
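On the "I don't know why" part: the inner `for line_x in deck` and the outer `for line in deck` pull from the same file iterator. When the inner loop reaches the next `*` line it breaks, but that line has already been consumed, so the outer loop resumes one line later and never sees the header. The sketch below reproduces this on a tiny in-memory sample; `io.StringIO` and the sample text are stand-ins for `master.txt`, not the real deck.

```python
import io

# Hypothetical miniature of the input deck, for illustration only.
sample = "*Nset\n1, 2\n*Nset\n3, 4\n*Command\ntext\n"

deck = io.StringIO(sample)
seen_by_outer = []  # every line the outer loop actually receives
blocks = []         # data collected per detected *Nset

for line in deck:
    seen_by_outer.append(line.strip())
    if line.startswith("*Nset"):
        temp = []
        # The inner loop advances the same iterator as the outer loop.
        for line_x in deck:
            if line_x.startswith("*"):
                # This header is consumed here; the outer loop never gets it.
                break
            temp.append(line_x.strip())
        blocks.append(temp)

print(seen_by_outer)  # ['*Nset', '3, 4', '*Command', 'text']
print(blocks)         # [['1, 2']]
```

The second `*Nset` is swallowed by the `break`, so its data line `3, 4` reaches the outer loop as an ordinary line and is ignored. A single loop with a flag avoids the problem because every line passes through the same test.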

If you only want to extract the lines between *Nset commands, the following should work:

In [5]: with open("master.txt") as f:
   ...:     data = []
   ...:     gather = False
   ...:     for line in f:
   ...:         line = line.strip()
   ...:         if line.startswith("*Nset"):
   ...:             gather = True
   ...:         elif line.startswith("*"):
   ...:             gather = False
   ...:         elif line and gather:
   ...:             data.append(line)
   ...:

In [6]: data
Out[6]:
['1, 2, 3, 4, 5, 6, 7,',
 '12, 13, 14, 15, 16,',
 '17, 52, 75, 86, 92,',
 '90, 91, 92 93, 94, 95....',
 'numbers',
 'numbers']

And if you need more information, the above is easy to extend:

In [7]: with open("master.txt") as f:
   ...:     nset_lines = []
   ...:     nset_count = 0
   ...:     data = []
   ...:     gather = False
   ...:     for i, line in enumerate(f):
   ...:         line = line.strip()
   ...:         if line.startswith("*Nset"):
   ...:             gather = True
   ...:             nset_lines.append(i)
   ...:             nset_count += 1
   ...:         elif line.startswith("*"):
   ...:             gather = False
   ...:         elif line and gather:
   ...:             data.append(line)
   ...:

In [8]: nset_lines
Out[8]: [0, 14, 18]

In [9]: nset_count
Out[9]: 3

In [10]: data
Out[10]:
['1, 2, 3, 4, 5, 6, 7,',
 '12, 13, 14, 15, 16,',
 '17, 52, 75, 86, 92,',
 '90, 91, 92 93, 94, 95....',
 'numbers',
 'numbers']
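The snippets above flatten every block into one list. If each *Nset block should stay separate, as `nset_data` and `matched_nset_lines` in the question suggest, one possible variation keeps the header text together with its own list of lines. This is only a sketch: the function name `collect_nset_blocks` and the in-memory sample are made up for illustration.

```python
def collect_nset_blocks(lines):
    """Return a list of (header, data_lines) pairs, one per *Nset block."""
    blocks = []
    current = None  # data list of the *Nset block being read, if any
    for raw in lines:
        line = raw.strip()
        if line.startswith("*Nset"):
            # Start a new block and remember its header line.
            current = []
            blocks.append((line, current))
        elif line.startswith("*"):
            # Any other command ends the current block.
            current = None
        elif line and current is not None:
            current.append(line)
    return blocks


# Hypothetical sample modelled on the excerpt in the question.
sample = [
    "*Nset\n", "1, 2, 3,\n", "\n", "*NEXT COMMAND\n", " blah\n",
    "*Nset\n", " numbers\n", "*Nset\n", " numbers\n",
    "*Command\n", " irrelevant text\n",
]
blocks = collect_nset_blocks(sample)
for header, data in blocks:
    print(header, data)
```

Because the function accepts any iterable of lines, it can be called on an open file as well, for example `with open("master.txt") as f: blocks = collect_nset_blocks(f)`.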

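Does a command always appear at the start of a line, beginning with "*"?

@juanpa.arrivillaga Yes. There are various commands, but each one is immediately preceded by "*". The next line then contains the numbers. Does that matter?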