Slicing parts of a text file in Python using similar start and end strings


I want to slice the following text:

Per MPI rank memory allocation (min/avg/max) = 7.017 | 13.33 | 32.25 Mbytes
Step Temp TotEng KinEng PotEng Fmax Press 
       0          500   -130649.32     1490.405   -132139.72    460.81189   -300.90016 
     100    341.08362   -130532.82    1016.7055   -131549.53    386.15965     -2581.88 
Loop time of 36971.3 on 4 procs for 200 steps with 1001 atoms
Per MPI rank memory allocation (min/avg/max) = 7.018 | 13.34 | 32.31 Mbytes
Step Temp TotEng KinEng PotEng Fmax Press 
  300000    488.81974   -134278.39    1457.0788   -135735.47    365.77279   -499.63638 
  300100     497.0212   -134247.19    1481.5258   -135728.72    239.86708    550.74065 
Loop time of 36971.3 on 4 procs for 200 steps with 1001 atoms

The result should look like this:

Step Temp TotEng KinEng PotEng Fmax Press 
       0          500   -130649.32     1490.405   -132139.72    460.81189   -300.90016 
     100    341.08362   -130532.82    1016.7055   -131549.53    386.15965     -2581.88
  300000    488.81974   -134278.39    1457.0788   -135735.47    365.77279   -499.63638 
  300100     497.0212   -134247.19    1481.5258   -135728.72    239.86708    550.74065

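That is, I want to slice from each 'Step' string up to the next 'Loop' string. Both strings occur multiple times in the text, in the format shown above. I tried slicing it like this: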

start_str = 'Step'
end_str = 'Loop'
f = open("log.lammps","r").read()
lines = f [ f.find(start_str) : f.find(end_str) ]
print(lines)

But it only prints the first block (from 'Step' to 'Loop') and then stops.
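
For context: str.find returns only the index of the first occurrence, so a single slice can cover only one block. A minimal sketch of the same find-based idea, repeated by passing the optional start argument of str.find, might look like the following. The helper name slice_blocks and the shortened sample text are made up for illustration, and plain substring matching would also match 'Step' or 'Loop' in the middle of a line.

```python
def slice_blocks(text, start_str="Step", end_str="Loop"):
    """Return every substring running from start_str up to the next end_str."""
    blocks = []
    pos = 0
    while True:
        start = text.find(start_str, pos)   # next start marker at or after pos
        if start == -1:
            break
        end = text.find(end_str, start)     # first end marker after that start
        if end == -1:
            break
        blocks.append(text[start:end])
        pos = end + len(end_str)            # resume the search after this block
    return blocks


# Shortened, hypothetical stand-in for the contents of log.lammps
sample = (
    "Per MPI rank memory allocation\n"
    "Step Temp\n"
    "0 500\n"
    "Loop time of 1.0\n"
    "Per MPI rank memory allocation\n"
    "Step Temp\n"
    "300000 488.8\n"
    "Loop time of 2.0\n"
)

blocks = slice_blocks(sample)
for block in blocks:
    print(block, end="")
```

Each returned block still contains its own 'Step' header line, so the header would have to be dropped from all blocks but the first to get the desired output.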

You can use the re module for this task:

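import re

with open('log.lammps', 'r') as f_in:
    s = f_in.read()

all_data = []
# Capture everything between a line starting with "Step" and the next line
# starting with "Loop". re.M makes ^ match at line starts, re.S lets . span newlines.
for part in re.findall(r'^Step.*?\n(.*?)\n^Loop', s, flags=re.S|re.M):
    for line in part.splitlines():
        all_data.append(line.split())

# Print the header once, then every row left-aligned in 12-character columns
print(('{:<12}'*7).format('Step', 'Temp', 'TotEng', 'KinEng', 'PotEng', 'Fmax', 'Press'))
for row in all_data:
    print(('{:<12}'*7).format(*row))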

Or put all the data into a DataFrame:

import pandas as pd

df = pd.DataFrame(all_data, columns=['Step', 'Temp', 'TotEng', 'KinEng', 'PotEng', 'Fmax', 'Press'])
print(df)

Output of the two snippets (formatted print first, then the DataFrame):

Step        Temp        TotEng      KinEng      PotEng      Fmax        Press       
0           500         -130649.32  1490.405    -132139.72  460.81189   -300.90016  
100         341.08362   -130532.82  1016.7055   -131549.53  386.15965   -2581.88    
300000      488.81974   -134278.39  1457.0788   -135735.47  365.77279   -499.63638  
300100      497.0212    -134247.19  1481.5258   -135728.72  239.86708   550.74065

     Step       Temp      TotEng     KinEng      PotEng       Fmax       Press
0       0        500  -130649.32   1490.405  -132139.72  460.81189  -300.90016
1     100  341.08362  -130532.82  1016.7055  -131549.53  386.15965    -2581.88
2  300000  488.81974  -134278.39  1457.0788  -135735.47  365.77279  -499.63638
3  300100   497.0212  -134247.19  1481.5258  -135728.72  239.86708   550.74065
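
Note that split() yields strings, so every value in all_data (and in a DataFrame built directly from it) is text rather than a number. A small sketch of converting the rows before doing arithmetic on them, assuming the first column is always an integer step and the remaining columns parse as floats. The helper to_numbers is hypothetical, and the two rows are copied from the output above.

```python
def to_numbers(row):
    """Convert one split() row: first field to int, the rest to float."""
    return [int(row[0])] + [float(value) for value in row[1:]]


# Two rows as they would appear in all_data (lists of strings)
rows = [
    "0 500 -130649.32 1490.405 -132139.72 460.81189 -300.90016".split(),
    "100 341.08362 -130532.82 1016.7055 -131549.53 386.15965 -2581.88".split(),
]

numeric_rows = [to_numbers(row) for row in rows]

# Arithmetic now works, e.g. the step difference between consecutive rows
print(numeric_rows[1][0] - numeric_rows[0][0])
```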

All the data lines seem to start with some kind of whitespace. You can use this to identify them:

with open("log.lammps", "r") as f:
    lines = f.read().splitlines()

# print header
print(lines[1])

# print all lines starting with a space. (It could also be a "\t" (tab), you have to try)
data_lines = [line for line in lines if line.startswith(" ")]

for line in data_lines:
    print(line)
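
Relying on leading whitespace may break if a data line ever starts without it, or if other lines in the log happen to be indented. A sketch of a line-by-line alternative that keeps only the lines between a 'Step' header and the next 'Loop' line is shown below. The function name extract_thermo_rows is hypothetical, and the inline sample (taken from the question) stands in for the contents of log.lammps.

```python
def extract_thermo_rows(lines):
    """Collect the split fields of every line between 'Step' and 'Loop' lines."""
    rows = []
    inside = False
    for line in lines:
        if line.startswith("Step"):
            inside = True            # header line: data begins on the next line
        elif line.startswith("Loop"):
            inside = False           # end of the current block
        elif inside and line.strip():
            rows.append(line.split())
    return rows


sample = """Per MPI rank memory allocation (min/avg/max) = 7.017 | 13.33 | 32.25 Mbytes
Step Temp TotEng KinEng PotEng Fmax Press
       0          500   -130649.32     1490.405   -132139.72    460.81189   -300.90016
     100    341.08362   -130532.82    1016.7055   -131549.53    386.15965     -2581.88
Loop time of 36971.3 on 4 procs for 200 steps with 1001 atoms
Per MPI rank memory allocation (min/avg/max) = 7.018 | 13.34 | 32.31 Mbytes
Step Temp TotEng KinEng PotEng Fmax Press
  300000    488.81974   -134278.39    1457.0788   -135735.47    365.77279   -499.63638
  300100     497.0212   -134247.19    1481.5258   -135728.72    239.86708    550.74065
Loop time of 36971.3 on 4 procs for 200 steps with 1001 atoms
"""

rows = extract_thermo_rows(sample.splitlines())
for row in rows:
    print(row)
```

For a real log, the same function could be fed the file object directly, since iterating over an open file yields its lines one at a time.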
