How can I remove corrupt events from a data file using Python?


I have a data file containing event entries of the following form:

<event>
 4   0  0.5048900E-01  0.1915537E+03  0.7546771E-02  0.1157067E+00
       21   -1    0    0  503  502  0.00000000000E+00  0.00000000000E+00  0.20916118194E+03  0.20916118194E+03  0.00000000000E+00 0.  1.
       21   -1    0    0  501  503  0.00000000000E+00  0.00000000000E+00 -0.19069665391E+03  0.19069665391E+03  0.00000000000E+00 0.  1.
        6    1    1    2  501    0  0.64272189331E+02  0.51311781060E+02 -0.47339360468E+02  0.19731656861E+03  0.17300000000E+03 0. -1.
       -6    1    1    2    0  502 -0.64272189331E+02 -0.51311781060E+02  0.65803888495E+02  0.20254126725E+03  0.17300000000E+03 0. -1.
</event>
<event>
 4   0  0.5048900E-01  0.1923878E+03  0.7546771E-02  0.1156325E+00
       21   -1    0    0  503  502  0.00000000000E+00  0.00000000000E+00  0.24573125562E+02  0.24573125562E+02  0.00000000000E+00 0.  1.
       21   -1    0    0  501  503  0.00000000000E+00  0.00000000000E+00 -0.15553273337E+04  0.15553273337E+04  0.00000000000E+00 0. -1.
        6    1    1    2  501    0  0.98476452980E+01  0.83588711195E+02 -0.62504106700E+03  0.65397965120E+03  0.17300000000E+03 0.  1.
       -6    1    1    2    0  502 -0.98476452980E+01 -0.83588711195E+02 -0.90571314110E+03  0.92592080802E+03  0.17300000000E+03 0. -1.
</event>
<event>
 4   0  0.5048900E-01  0.1782060E+03  0.7546771E-02  0.1169551E+00
       21   -1    0    0  501  502  0.00000000000E+00  0.00000000000E+00  0.17068413103E+02  0.17068413103E+02  0.00000000000E+00 0.  1.
       21   -1    0    0  502  503  0.00000000000E+00  0.00000000000E+00 -0.19878188087E+04  0.19878188087E+04  0.00000000000E+00 0.  1.
        6    1    1    2  501    0  0.40928013982E+02 -0.12380831554E+02 -0.73177042255E+03  0.75315691502E+03  0.17300000000E+03 0.  1.
       -6    1    1    2    0  503 -0.40928013982E+02  0.12380831554E+02 -0.12389799731E+04  0.12517303068E+04  0.17300000000E+03 0.  1.
</event>
<event>
 4   0  0.5048900E-01  0.1748201E+03  0.7546771E-02  0.1172912E+00
       21   -1    0    0  501  502  0.00000000000E+00  0.00000000000E+00  0.50201908406E+02  0.50201908406E+02  0.00000000000E+00 0. -1.
       21   -1    0    0  502  503  0.00000000000E+00  0.00000000000E+00 -0.81442244278E+03  0.81442244278E+03  0.00000000000E+00 0. -1.
        6    1    1    2  501    0 -0.76531495601E+01 -0.23968586903E+02 -0.16487721432E+03  0.24030513864E+03  0.17300000000E+03 0. -1.
       -6    1    1    2    0  503  0.76531495601E+01  0.23968586903E+02 -0.59934332005E+03  0.62431921254E+03  0.17300000000E+03 0. -1.
</event>
<event>
 4   0  0.5048900E-01  0.2161793E+03  0.7546771E-02  0.1136764E+00
       21   -1    0    0  501  502  0.00000000000E+00  0.00000000000E+00  0.44614769518E+03  0.44614769518E+03  0.00000000000E+00 0. -1.
       21   -1    0    0  502  503  0.00000000000E+00  0.00000000000E+00 -0.11252245546E+03  0.11252245546E+03  0.00000000000E+00 0.  1.
        6    1    1    2  501    0  0.12142710736E+03 -0.45386865351E+02  0.24023253309E+03  0.32317979501E+03  0.17300000000E+03 0. -1.
       -6    1    1    2    0  503 -0.12142710736E+03  0.45386865351E+02  0.93392706626E+02  0.23549035564E+03  0.17300000000E+03 0.  1.
</event>

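I am generating data like this, and the generation process will not be error-free: some event entries will come out malformed or corrupted. How can I use Python to detect and remove the event entries that are not of the form shown above?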


What is the specification of your event format?

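I have guessed at some of the requirements for your input data and come up with a regular expression that is not very complicated, but is rather messy: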

import re

rx = re.compile(r'<event>$'
                r'(?P<body>\s*\d\s+\d'
                r'(\s+(\+|-)?\d+\.\d+(e|E)(\+|-)\d+){4}$'
                r'((\s+[-\d]+){6}(\s+(\+|-)?\d+\.\d+(e|E)(\+|-)\d+){5}'
                r'(\s+[-\d.]+){2}$)+)', re.M)

for match in rx.finditer(your_input_data):
    print(match.group('body'))

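See the interactive explanation of the regular expression. You will most likely need to do a lot of fine-tuning, but this may serve as a start.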
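The regular expression above only prints the bodies it matches; it does not remove anything. As a rough sketch of the filtering step, the events could be checked line by line and only the well-formed ones kept. The names `is_valid_event` and `keep_valid_events` are made up for this illustration, and the field patterns are guessed from the sample events rather than taken from a format specification. In particular, treating the first integer of the header line as the number of particle lines is an assumption that merely fits the samples shown.

```python
import re

# Field patterns guessed from the sample events (assumptions, not a specification).
SCI = r'[+-]?\d+\.\d+[eE][+-]\d+'   # e.g. 0.5048900E-01
INT = r'-?\d+'                      # e.g. 21, -6
SHORT = r'-?\d+\.\d*'               # e.g. 0. or -1.

# Header line: two integers followed by four floats in scientific notation.
HEADER = re.compile(rf'\s*{INT}\s+{INT}(?:\s+{SCI}){{4}}\s*')
# Particle line: six integers, five scientific floats, two short floats.
PARTICLE = re.compile(
    rf'\s*{INT}(?:\s+{INT}){{5}}(?:\s+{SCI}){{5}}(?:\s+{SHORT}){{2}}\s*'
)


def is_valid_event(lines):
    """Return True if the lines between <event> and </event> look well formed."""
    if not lines or not HEADER.fullmatch(lines[0]):
        return False
    # Assumption: the first header field is the number of particle lines.
    n_particles = int(lines[0].split()[0])
    particles = lines[1:]
    return (len(particles) == n_particles
            and all(PARTICLE.fullmatch(p) for p in particles))


def keep_valid_events(text):
    """Return the well-formed <event>...</event> blocks found in text."""
    good, current = [], None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == '<event>':
            current = []          # start a new event, dropping any unterminated one
        elif stripped == '</event>':
            if current is not None and is_valid_event(current):
                good.append('\n'.join(['<event>', *current, '</event>']))
            current = None
        elif current is not None:
            current.append(line)  # lines outside any event are ignored
    return good
```

To use it, read the whole file into a string, call `keep_valid_events` on it, and write the returned blocks joined with newlines to a new file. Scanning line by line instead of matching whole blocks with a single pattern means that an event missing its closing tag does not swallow the intact event that follows it. Note that this sketch discards everything outside the event blocks, so any file header or footer would need to be handled separately.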

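Thank you very much for your guidance on using regular expressions in Python. Your solution is very clear, and the online regex tester looks very useful (although I think I need to learn more about regular expressions to understand it better).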