How to remove corrupted data in Python
So I have a file that captures the statistics of an upload script. This is what I get:
0K .......... .......... .......... .......... .......... 4% 72.83 KiB/s
470K .......... .......... .......... .......... .......... 9% 84.67 KiB/s
1088K .......... .......... .......... .......... .......... 15% 91.78 KiB/s
1708K .......... .......... .......... .......... .......... 20% 90.17 KiB/s
1250K .......... .......... .... 5% 85.29 KiB/s
6150K ......... 10% 64.32 KiB/s
8350K .......... .......... ........... .......... ..... 15% 55.12 KiB/s
...==> STOR test10.zip ... .... 20% 59.38 KiB/s
0K ................. .......... ... ............. ............. ............ 25% 66.21 KiB/s
2845K ...... .......... ........... .............. . ................... ... 4% 32.62 KiB/s
464K ................ ... .................. 29% 59.62 KiB/s
3371K .. ................ ....... ...... ................ ....... ........... ...... 8% 38.75 KiB/s
963K ............ ................ .... 34% 51.58 KiB/s
2253K .......... .......... .......... .......... .......... 24% 99.92 KiB/s
2787K .......... .......... .......... .......... .......... 29% 92.12 KiB/s
3291K .......... .......... .......... .......... .......... 33% 84.42 KiB/s
3821K .......... .......... .......... .......... .......... 38% 75.88 KiB/s
4342K .......... .......... .......... .......... .......... 43% 73.12 KiB/s
Some of the data is corrupted because of internet problems.
So what I want is to remove the corrupted data, so that the result looks like this:
0K .......... .......... .......... .......... .......... 4% 72.83 KiB/s
470K .......... .......... .......... .......... .......... 9% 84.67 KiB/s
1088K .......... .......... .......... .......... .......... 15% 91.78 KiB/s
1708K .......... .......... .......... .......... .......... 20% 90.17 KiB/s
2253K .......... .......... .......... .......... .......... 24% 99.92 KiB/s
2787K .......... .......... .......... .......... .......... 29% 92.12 KiB/s
3291K .......... .......... .......... .......... .......... 33% 84.42 KiB/s
3821K .......... .......... .......... .......... .......... 38% 75.88 KiB/s
4342K .......... .......... .......... .......... .......... 43% 73.12 KiB/s
Note: the data will not always be identical to the example above.

One way to do it:
$ cat foo.py
data = '''
0K .......... .......... .......... .......... .......... 4% 72.83 KiB/s
470K .......... .......... .......... .......... .......... 9% 84.67 KiB/s
1088K .......... .......... .......... .......... .......... 15% 91.78 KiB/s
1708K .......... .......... .......... .......... .......... 20% 90.17 KiB/s
1250K .......... .......... .... 5% 85.29 KiB/s
6150K ......... 10% 64.32 KiB/s
8350K .......... .......... ........... .......... ..... 15% 55.12 KiB/s
...==> STOR test10.zip ... .... 20% 59.38 KiB/s
0K ................. .......... ... ............. ............. ............ 25% 66.21 KiB/s
2845K ...... .......... ........... .............. . ................... ... 4% 32.62 KiB/s
464K ................ ... .................. 29% 59.62 KiB/s
3371K .. ................ ....... ...... ................ ....... ........... ...... 8% 38.75 KiB/s
963K ............ ................ .... 34% 51.58 KiB/s
2253K .......... .......... .......... .......... .......... 24% 99.92 KiB/s
2787K .......... .......... .......... .......... .......... 29% 92.12 KiB/s
3291K .......... .......... .......... .......... .......... 33% 84.42 KiB/s
3821K .......... .......... .......... .......... .......... 38% 75.88 KiB/s
4342K .......... .......... .......... .......... .......... 43% 73.12 KiB/s
'''
import re
pattern = r".*\d+K\s+(\.{10}\s){5}\s*\d+%\s+\d+\.\d+\s+KiB\/s.*"
for line in data.split('\n'):
    if re.match(pattern, line) is not None:
        print(line)
$
$ python foo.py
0K .......... .......... .......... .......... .......... 4% 72.83 KiB/s
470K .......... .......... .......... .......... .......... 9% 84.67 KiB/s
1088K .......... .......... .......... .......... .......... 15% 91.78 KiB/s
1708K .......... .......... .......... .......... .......... 20% 90.17 KiB/s
2253K .......... .......... .......... .......... .......... 24% 99.92 KiB/s
2787K .......... .......... .......... .......... .......... 29% 92.12 KiB/s
3291K .......... .......... .......... .......... .......... 33% 84.42 KiB/s
3821K .......... .......... .......... .......... .......... 38% 75.88 KiB/s
4342K .......... .......... .......... .......... .......... 43% 73.12 KiB/s
$
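
To make it easier to see what the pattern in foo.py is checking, here is a rough breakdown of the same idea written with re.VERBOSE. The name PROGRESS_LINE and the ^/$ anchors are my own additions for illustration, not part of the answer above, so treat this as a sketch rather than a drop-in replacement.

```python
import re

# Sketch only: same idea as the pattern in foo.py, split up so each part is commented.
# PROGRESS_LINE and the ^ / $ anchors are assumptions added for this example.
PROGRESS_LINE = re.compile(r"""
    ^\s*
    \d+K\s+              # byte offset such as 470K
    (?:\.{10}\s+){5}     # exactly five groups of ten dots
    \d+%\s+              # percentage complete
    \d+\.\d+\s+KiB/s     # transfer speed
    \s*$
""", re.VERBOSE)

good = "470K .......... .......... .......... .......... .......... 9% 84.67 KiB/s"
bad = "6150K ......... 10% 64.32 KiB/s"

print(bool(PROGRESS_LINE.match(good)))  # True
print(bool(PROGRESS_LINE.match(bad)))   # False
```

A line is kept only when all five dot groups are complete, which is why the truncated and interleaved lines drop out.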
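Every valid data item follows a specific pattern. Define that pattern as a regular expression. Read the received data and match each line against the pattern. Discard whatever does not match and keep whatever does. Finally, join the kept lines to get the filtered, uncorrupted data.
@AndrewGuy Hi, I extracted that data in Python using a regular expression. I have been trying to work out a solution, but I am finding it hard.
@Sharad Hi, I see what you mean. I am just not clear on how to code it. So tell me if my thinking is wrong: I need to store the pattern in a variable?
@CCISIT If you are using a regex to get the initial data, posting that regex might help. It might take only a simple change to the pattern to exclude the unwanted data. However, if you are asking people to just write a complete solution for you, you will get less help.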
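
The steps outlined in the first comment (store the pattern in a variable, match each line, keep the matches, join them back together) might look roughly like this. The pattern is the one from the answer; the names VALID_LINE and filter_valid_lines are hypothetical, and the sample lines are copied from the question.

```python
import re

# Pattern from the answer: a valid line has five complete groups of ten dots.
# VALID_LINE and filter_valid_lines are illustrative names, not from the original post.
VALID_LINE = re.compile(r".*\d+K\s+(\.{10}\s){5}\s*\d+%\s+\d+\.\d+\s+KiB/s.*")

def filter_valid_lines(text):
    """Return only the lines of text that match VALID_LINE, joined by newlines."""
    kept = [line for line in text.splitlines() if VALID_LINE.match(line)]
    return "\n".join(kept)

sample = "\n".join([
    "0K .......... .......... .......... .......... .......... 4% 72.83 KiB/s",
    "1250K .......... .......... .... 5% 85.29 KiB/s",
    "...==> STOR test10.zip ... .... 20% 59.38 KiB/s",
    "2253K .......... .......... .......... .......... .......... 24% 99.92 KiB/s",
])

print(filter_valid_lines(sample))
```

To run it against the statistics file instead of an inline string, read the file's contents and pass them to filter_valid_lines. If the real data varies more than the example does, the pattern would need loosening to match.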