Python: Reading lines after finding a string

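I have a file containing blocks of lines that I want to separate. Each block carries an identifier in its header: "Block X" is the header line of the X-th block of lines. Like this: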

Block X
#L E  C  A  F  X  M  N 
11.2145 15 27 29.444444 7.6025229 1539742 29.419783
11.21451 13 28 24.607143 6.8247935 1596787 24.586264
...
Block Y
#L E  C  A  F  X  M  N 
11.2145 15 27 29.444444 7.6025229 1539742 29.419783
11.21451 13 28 24.607143 6.8247935 1596787 24.586264
...

I can find the header line of a block using enumerate, like this:

with open(filename, 'r') as indata:
    for num, line in enumerate(indata):
        if 'Block X' in line:
            startblock = num
            print(startblock)

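This gives the line number of the first line of block X.
My problem, however, is identifying the last line of the block. To do that, I could find the next occurrence of a header line (i.e. the next block) and subtract a few lines.

My question: how do I find the line number of the next occurrence of a condition, i.e. after some other condition has already been met?

I tried enumerate again, this time specifying the start value, like this: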

with open(filename, 'r') as indata:
    for num, line in enumerate(indata, startblock):
        if 'Block Y ' in line:
            endscan = num
            break
print(endscan)

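This does not work, because it still reads the file from line 0 rather than from line number startblock. Starting the enumerate counter at a different number only shifts the resulting count (endscan in this case) by startblock.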
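The behaviour described above can be reproduced in a small sketch: the second argument of enumerate only changes the number the counter starts from, and no element is skipped. The list `lines` is made-up sample data, and using `itertools.islice` to skip ahead is one possible option rather than the only one.

```python
from itertools import islice

# Hypothetical sample data standing in for the lines of the file.
lines = ["Block X", "1 2 3", "Block Y", "4 5 6"]

# The start argument only relabels the counter; iteration still begins
# at the first element of the iterable.
shifted = list(enumerate(lines, 2))

# To really skip the first two lines, slice the iterable itself and keep
# the counter in step with the original line numbers.
skipped = list(enumerate(islice(lines, 2, None), 2))

print(shifted[0])
print(skipped[0])
```

`islice` accepts any iterable, so the same call should work on an open file object in place of the list.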

Please help! How do I tell Python to ignore the lines before startblock?

You can use the file object's .tell() and .seek() methods to move around. For example:

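with open(filename, 'rb') as infile:  # binary mode: tell() offsets are plain byte counts
    start = None                       # offset of the current block's header line
    while True:
        pos = infile.tell()            # offset of the line about to be read
        line = infile.readline()       # readline() keeps tell() usable; "for line in infile" does not
        if start is not None and (not line or line.startswith(b'Block')):
            # we reached the next header (or the end of the file):
            # print all the bytes in the block
            infile.seek(start)
            print(infile.read(pos - start).decode())
            # now go back to where we were so we keep reading correctly
            infile.seek(pos + len(line))
        if not line:
            break
        if line.startswith(b'Block'):
            # a new block begins here, mark the start
            start = pos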

If you want to group the lines using Block as the delimiter for each section, you can use itertools.groupby:

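from itertools import groupby

with open('test.txt') as f:
    # consecutive lines are grouped by whether they start with "Block "
    grp = groupby(f, key=lambda x: x.startswith("Block "))
    for k, v in grp:
        if k:
            # header line(s) plus the following group of non-header lines
            print(list(v) + list(next(grp, ("", ""))[1]))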

Output:

['Block X\n', '#L E  C  A  F  X  M  N \n', '11.2145 15 27 29.444444 7.6025229 1539742 29.419783\n', '11.21451 13 28 24.607143 6.8247935 1596787 24.586264\n']
['Block Y\n', '#L E  C  A  F  X  M  N \n', '11.2145 15 27 29.444444 7.6025229 1539742 29.419783\n', '11.21451 13 28 24.607143 6.8247935 1596787 24.586264']

If Block can appear elsewhere, but you only want it when it is followed by a space and a single character:

import re
from itertools import groupby

with open('test.txt') as f:
    # raw string so that \w reaches the regex engine unchanged
    r = re.compile(r"^Block \w$")
    grp = groupby(f, key=lambda x: bool(r.search(x)))
    for k, v in grp:
        if k:
            print(list(v) + list(next(grp, ("", ""))[1]))

If the distance between header lines is the same throughout the file, simply use that distance to step the index variable accordingly.

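with open('file_name', 'r') as file1:
    lines = file1.readlines()
numlines = len(lines)
line_num1 = line_num2 = None
for i, line in enumerate(lines):
    # strip() removes the trailing newline before comparing
    if line.strip() == 'specific header 1':
        line_num1 = i
    if line.strip() == 'specific header 2':
        line_num2 = i
diff = line_num2 - line_num1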

Now that we know the difference between the line numbers, we can use it in a loop to collect the data:

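import numpy as np

k = 0
# dtype=object so each cell can hold a line of text; untouched cells stay 0
array = np.zeros([numlines, diff], dtype=object)
for i in range(numlines):
    if k % diff == 0:  # true on header lines, assuming the first header is on line 0
        for j in range(diff):
            if i + j < numlines:  # do not run past the end of the file
                array[i][j] = lines[i + j]
    k += 1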

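% is the modulo operator. The expression is 0 only when k is a multiple of the line-number difference between two header lines, which happens only when the line is a header line. Once such a line is pinned down, we enter the second for loop, which fills the array, so we end up with a matrix of numlines rows and diff columns. The non-zero rows contain the data between header lines.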

I haven't tested this; I just wrote it off the top of my head. Hope it helps.

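Keep all the lines in a list until you find a block header. Once you find the header, take what you need from the stored lines and clear the list.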
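A minimal sketch of this idea, assuming headers are recognised by the prefix "Block " as in the sample data. The names `split_blocks` and `header_prefix` are hypothetical and introduced only for illustration.

```python
def split_blocks(lines, header_prefix="Block "):
    """Yield each block (header line included) as a list of lines."""
    stored = []
    for line in lines:
        # A new header closes the block collected so far.
        if line.startswith(header_prefix) and stored:
            yield stored
            stored = []  # clear the list for the next block
        stored.append(line)
    # The last block has no following header, so emit it at the end.
    if stored:
        yield stored


# Sample data copied from the question; a file object would work the same way.
sample = [
    "Block X\n",
    "#L E  C  A  F  X  M  N \n",
    "11.2145 15 27 29.444444 7.6025229 1539742 29.419783\n",
    "Block Y\n",
    "#L E  C  A  F  X  M  N \n",
    "11.21451 13 28 24.607143 6.8247935 1596787 24.586264\n",
]
blocks = list(split_blocks(sample))
for block in blocks:
    print(block[0].strip(), len(block))
```

Note that any lines appearing before the first header would be yielded as a block of their own. Only one block is held in memory at a time, so this should also suit large files.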