Slicing part of a text file in Python using recurring start and end strings
I want to slice the following text:
Per MPI rank memory allocation (min/avg/max) = 7.017 | 13.33 | 32.25 Mbytes
Step Temp TotEng KinEng PotEng Fmax Press
0 500 -130649.32 1490.405 -132139.72 460.81189 -300.90016
100 341.08362 -130532.82 1016.7055 -131549.53 386.15965 -2581.88
Loop time of 36971.3 on 4 procs for 200 steps with 1001 atoms
Per MPI rank memory allocation (min/avg/max) = 7.018 | 13.34 | 32.31 Mbytes
Step Temp TotEng KinEng PotEng Fmax Press
300000 488.81974 -134278.39 1457.0788 -135735.47 365.77279 -499.63638
300100 497.0212 -134247.19 1481.5258 -135728.72 239.86708 550.74065
Loop time of 36971.3 on 4 procs for 200 steps with 1001 atoms
It should look like this:
Step Temp TotEng KinEng PotEng Fmax Press
0 500 -130649.32 1490.405 -132139.72 460.81189 -300.90016
100 341.08362 -130532.82 1016.7055 -131549.53 386.15965 -2581.88
300000 488.81974 -134278.39 1457.0788 -135735.47 365.77279 -499.63638
300100 497.0212 -134247.19 1481.5258 -135728.72 239.86708 550.74065
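That is, I want to slice from each "Step" header up to the following "Loop" string; both occur many times in the text in the format shown above. I tried slicing it like this: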
start_str = 'Step'
end_str = 'Loop'
f = open("log.lammps","r").read()
lines = f [ f.find(start_str) : f.find(end_str) ]
print(lines)
But it only prints the first block (from Step to Loop) and then stops.

You can use the re module for this task:
import re

with open('log.lammps', 'r') as f_in:
    s = f_in.read()

all_data = []
# capture everything between each "Step" header line and the next line starting with "Loop"
for part in re.findall(r'^Step.*?\n(.*?)\n^Loop', s, flags=re.S|re.M):
    for line in part.splitlines():
        all_data.append(line.split())

# print the header once, then every data row in fixed-width columns
print(('{:<12}'*7).format('Step', 'Temp', 'TotEng', 'KinEng', 'PotEng', 'Fmax', 'Press'))
for row in all_data:
    print(('{:<12}'*7).format(*row))
Or put all the data into a DataFrame:
import pandas as pd
df = pd.DataFrame(all_data, columns=['Step', 'Temp', 'TotEng', 'KinEng', 'PotEng', 'Fmax', 'Press'])
print(df)
This prints:
Step Temp TotEng KinEng PotEng Fmax Press
0 500 -130649.32 1490.405 -132139.72 460.81189 -300.90016
100 341.08362 -130532.82 1016.7055 -131549.53 386.15965 -2581.88
300000 488.81974 -134278.39 1457.0788 -135735.47 365.77279 -499.63638
300100 497.0212 -134247.19 1481.5258 -135728.72 239.86708 550.74065
Step Temp TotEng KinEng PotEng Fmax Press
0 0 500 -130649.32 1490.405 -132139.72 460.81189 -300.90016
1 100 341.08362 -130532.82 1016.7055 -131549.53 386.15965 -2581.88
2 300000 488.81974 -134278.39 1457.0788 -135735.47 365.77279 -499.63638
3 300100 497.0212 -134247.19 1481.5258 -135728.72 239.86708 550.74065
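As for why the original attempt stops after the first block: str.find returns only the index of the first match, so the slice covers a single Step-to-Loop span. A minimal sketch of repeating the search with the optional start argument of str.find might look like the following. The function name slice_blocks and the inline sample text are illustrative assumptions, not part of the original post, and the sketch assumes that "Step" and "Loop" do not occur anywhere else in the log.

```python
def slice_blocks(text, start_str="Step", end_str="Loop"):
    """Return every chunk that runs from start_str up to the next end_str."""
    blocks = []
    pos = 0
    while True:
        # search from where the previous block ended, not from the beginning
        start = text.find(start_str, pos)
        if start == -1:
            break
        end = text.find(end_str, start)
        if end == -1:
            break
        blocks.append(text[start:end])
        pos = end + len(end_str)
    return blocks


# sample text copied from the question (hypothetical stand-in for log.lammps)
sample = (
    "Per MPI rank memory allocation (min/avg/max) = 7.017 | 13.33 | 32.25 Mbytes\n"
    "Step Temp TotEng KinEng PotEng Fmax Press\n"
    "0 500 -130649.32 1490.405 -132139.72 460.81189 -300.90016\n"
    "100 341.08362 -130532.82 1016.7055 -131549.53 386.15965 -2581.88\n"
    "Loop time of 36971.3 on 4 procs for 200 steps with 1001 atoms\n"
    "Per MPI rank memory allocation (min/avg/max) = 7.018 | 13.34 | 32.31 Mbytes\n"
    "Step Temp TotEng KinEng PotEng Fmax Press\n"
    "300000 488.81974 -134278.39 1457.0788 -135735.47 365.77279 -499.63638\n"
    "300100 497.0212 -134247.19 1481.5258 -135728.72 239.86708 550.74065\n"
    "Loop time of 36971.3 on 4 procs for 200 steps with 1001 atoms\n"
)

blocks = slice_blocks(sample)
for block in blocks:
    print(block, end="")
```

Each block still begins with its own header line, so the header would have to be dropped from all but the first block to get the exact output asked for.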
All the data lines seem to start with some kind of whitespace. You can use that to identify them:
with open("log.lammps", "r") as f:
    lines = f.read().splitlines()

# print header
print(lines[1])

# print all lines starting with a space. (It could also be a "\t" (tab), you have to try)
data_lines = [line for line in lines if line.startswith(" ")]
for line in data_lines:
    print(line)
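If the leading whitespace turns out not to be reliable, a line-by-line sketch that switches a flag on at each "Step" header and off at each "Loop" line may be an alternative. The function name extract_rows and the sample lines below are assumptions made for illustration; the sketch assumes that header lines begin with "Step" and terminator lines begin with "Loop" once surrounding whitespace is stripped.

```python
def extract_rows(lines):
    """Collect the split data rows found between 'Step' and 'Loop' lines."""
    rows = []
    inside = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("Step"):
            inside = True       # header line: start collecting after it
            continue
        if stripped.startswith("Loop"):
            inside = False      # terminator line: stop collecting
            continue
        if inside and stripped:
            rows.append(stripped.split())
    return rows


# sample lines copied from the question (hypothetical stand-in for log.lammps)
sample_lines = [
    "Per MPI rank memory allocation (min/avg/max) = 7.017 | 13.33 | 32.25 Mbytes",
    "Step Temp TotEng KinEng PotEng Fmax Press",
    " 0 500 -130649.32 1490.405 -132139.72 460.81189 -300.90016",
    " 100 341.08362 -130532.82 1016.7055 -131549.53 386.15965 -2581.88",
    "Loop time of 36971.3 on 4 procs for 200 steps with 1001 atoms",
    "Per MPI rank memory allocation (min/avg/max) = 7.018 | 13.34 | 32.31 Mbytes",
    "Step Temp TotEng KinEng PotEng Fmax Press",
    " 300000 488.81974 -134278.39 1457.0788 -135735.47 365.77279 -499.63638",
    " 300100 497.0212 -134247.19 1481.5258 -135728.72 239.86708 550.74065",
    "Loop time of 36971.3 on 4 procs for 200 steps with 1001 atoms",
]

rows = extract_rows(sample_lines)
for row in rows:
    print(" ".join(row))
```

Because the function only iterates over its argument, an open file object can be passed in directly instead of a list, which avoids reading the whole log into memory.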