How to remove corrupted data in Python

So I have a file that collects statistics from an upload script. This is what I get:

    0K .......... .......... .......... .......... ..........   4% 72.83 KiB/s
  470K .......... .......... .......... .......... ..........   9% 84.67 KiB/s
 1088K .......... .......... .......... .......... ..........  15% 91.78 KiB/s
 1708K .......... .......... .......... .......... ..........  20% 90.17 KiB/s
  1250K .......... .......... ....   5% 85.29 KiB/s
  6150K .........  10% 64.32 KiB/s
  8350K .......... .......... ........... .......... .....  15% 55.12 KiB/s
...==> STOR test10.zip ... ....  20% 59.38 KiB/s
    0K ................. .......... ... ............. ............. ............  25% 66.21 KiB/s
 2845K ...... .......... ........... .............. . ................... ...   4% 32.62 KiB/s
  464K ................ ... ..................  29% 59.62 KiB/s
 3371K .. ................ ....... ...... ................ ....... ........... ......   8% 38.75 KiB/s
  963K ............ ................ ....  34% 51.58 KiB/s
2253K .......... .......... .......... .......... ..........  24% 99.92 KiB/s
 2787K .......... .......... .......... .......... ..........  29% 92.12 KiB/s
 3291K .......... .......... .......... .......... ..........  33% 84.42 KiB/s
 3821K .......... .......... .......... .......... ..........  38% 75.88 KiB/s
 4342K .......... .......... .......... .......... ..........  43% 73.12 KiB/s
Some of the data is corrupted because of internet problems. What I want is to remove the corrupted lines, so that the result looks like this:

    0K .......... .......... .......... .......... ..........   4% 72.83 KiB/s
  470K .......... .......... .......... .......... ..........   9% 84.67 KiB/s
 1088K .......... .......... .......... .......... ..........  15% 91.78 KiB/s
 1708K .......... .......... .......... .......... ..........  20% 90.17 KiB/s
 2253K .......... .......... .......... .......... ..........  24% 99.92 KiB/s
 2787K .......... .......... .......... .......... ..........  29% 92.12 KiB/s
 3291K .......... .......... .......... .......... ..........  33% 84.42 KiB/s
 3821K .......... .......... .......... .......... ..........  38% 75.88 KiB/s
 4342K .......... .......... .......... .......... ..........  43% 73.12 KiB/s

Note: the data will not always look exactly like the example above.

One way to do it:

$ cat foo.py
data = '''
    0K .......... .......... .......... .......... ..........   4% 72.83 KiB/s
  470K .......... .......... .......... .......... ..........   9% 84.67 KiB/s
 1088K .......... .......... .......... .......... ..........  15% 91.78 KiB/s
 1708K .......... .......... .......... .......... ..........  20% 90.17 KiB/s
  1250K .......... .......... ....   5% 85.29 KiB/s
  6150K .........  10% 64.32 KiB/s
  8350K .......... .......... ........... .......... .....  15% 55.12 KiB/s
...==> STOR test10.zip ... ....  20% 59.38 KiB/s
    0K ................. .......... ... ............. ............. ............  25% 66.21 KiB/s
 2845K ...... .......... ........... .............. . ................... ...   4% 32.62 KiB/s
  464K ................ ... ..................  29% 59.62 KiB/s
 3371K .. ................ ....... ...... ................ ....... ........... ......   8% 38.75 KiB/s
  963K ............ ................ ....  34% 51.58 KiB/s
2253K .......... .......... .......... .......... ..........  24% 99.92 KiB/s
 2787K .......... .......... .......... .......... ..........  29% 92.12 KiB/s
 3291K .......... .......... .......... .......... ..........  33% 84.42 KiB/s
 3821K .......... .......... .......... .......... ..........  38% 75.88 KiB/s
 4342K .......... .......... .......... .......... ..........  43% 73.12 KiB/s
'''

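import re

# A valid progress line has a size such as "470K", exactly five groups of ten
# dots (each followed by whitespace), a percentage, and a rate in KiB/s.
pattern = r".*\d+K\s+(\.{10}\s){5}\s*\d+%\s+\d+\.\d+\s+KiB\/s.*"

for line in data.split('\n'):
    # Print only lines that match; anything else is treated as corrupted.
    if re.match(pattern, line) is not None:
        print(line)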
$
$ python foo.py
    0K .......... .......... .......... .......... ..........   4% 72.83 KiB/s
  470K .......... .......... .......... .......... ..........   9% 84.67 KiB/s
 1088K .......... .......... .......... .......... ..........  15% 91.78 KiB/s
 1708K .......... .......... .......... .......... ..........  20% 90.17 KiB/s
2253K .......... .......... .......... .......... ..........  24% 99.92 KiB/s
 2787K .......... .......... .......... .......... ..........  29% 92.12 KiB/s
 3291K .......... .......... .......... .......... ..........  33% 84.42 KiB/s
 3821K .......... .......... .......... .......... ..........  38% 75.88 KiB/s
 4342K .......... .......... .......... .......... ..........  43% 73.12 KiB/s
$ 
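If the stats arrive as one block of text, the same filter can also be written as a single search over the whole string instead of a per-line loop. This is a minimal sketch, assuming `\n` line endings; the names `VALID_LINE` and `valid_lines` are hypothetical. It anchors the pattern with `^` and `$` under `re.MULTILINE` and uses `[ \t]` instead of `\s` so a match cannot run across a newline. Unlike the leading `.*` in the pattern above, it is stricter: a line with junk in front of the size field is rejected rather than kept.

```python
import re

# Same fields as the answer's pattern, anchored to whole lines with ^ and $.
# [ \t] is used instead of \s so that a match can never span a newline.
VALID_LINE = re.compile(
    r"^[ \t]*\d+K[ \t]+(?:\.{10}[ \t]){5}[ \t]*\d+%[ \t]+\d+\.\d+[ \t]+KiB/s[ \t]*$",
    re.MULTILINE,
)


def valid_lines(text):
    """Return every line of text that looks like an uncorrupted progress line."""
    # The dot group is non-capturing, so findall returns whole matched lines.
    return VALID_LINE.findall(text)
```

With the `data` string from `foo.py`, `"\n".join(valid_lines(data))` gives the cleaned text in one step.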

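Every valid data item follows a specific pattern. Define that pattern as a regular expression, read the received data, and match each line against the pattern. Discard the lines that do not match and keep the ones that do. Finally, join the kept lines to get the filtered, uncorrupted data.

@AndrewGuy Hi, I extracted that data in Python using a regular expression. I have been trying to find a way to solve this problem, but I'm finding it hard.

@Sharad Hi, I see what you mean; I'm just not sure how to code it. Please tell me if I have this wrong: do I need to store the pattern in a variable?

@CCISIT If you used a regular expression to get the initial data, posting that regex might help. It may only take a simple change to the pattern to exclude the unwanted data. However, if you are just asking people to write a complete solution for you, you will get less help.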
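The steps listed above (define the pattern, match each line, drop what does not match, keep what does, join the result) can be sketched as a small function that works on a file rather than an inline string. This is a minimal sketch, assuming the stats are stored one entry per line in a UTF-8 text file; `clean_stats` and the file names in the usage line are hypothetical. It reuses the answer's pattern (minus the redundant `\/` escape) and uses `io.open` so the same code runs on Python 2.7 and Python 3.

```python
import io
import re

# The answer's pattern: a size such as "470K", exactly five groups of ten dots,
# a percentage, and a rate in KiB/s.
PATTERN = re.compile(r".*\d+K\s+(\.{10}\s){5}\s*\d+%\s+\d+\.\d+\s+KiB/s.*")


def clean_stats(in_path, out_path):
    """Copy the lines of in_path that match PATTERN into out_path.

    Lines that do not match are treated as corrupted and dropped.
    Returns the number of lines kept.
    """
    kept = []
    with io.open(in_path, encoding="utf-8") as src:
        for line in src:
            line = line.rstrip(u"\r\n")
            if PATTERN.match(line):
                kept.append(line)
    # Join the kept lines back together, one per line.
    with io.open(out_path, "w", encoding="utf-8") as dst:
        for line in kept:
            dst.write(line + u"\n")
    return len(kept)
```

For example, `clean_stats("upload_stats.log", "upload_stats_clean.log")` would write only the well-formed progress lines to the second file and return how many were kept.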