Matching and adding regex values in Python
I have a file containing a traceroute to a server. I want to extract the last router touched and compute its average delay. I tried the following, but it only shows a single delay value. How can I add up the delays and get their average?

The file contains the trace:
traceroute to 34.233.68.171 (34.233.68.171), 30 hops max, 60 byte packets
1 192.168.1.1 (192.168.1.1) 1.458 ms 1.319 ms 1.236 ms
2 173.230.125.21 (173.230.125.21) 9.911 ms 9.308 ms 9.702 ms
3 99.82.176.202 (99.82.176.202) 9.616 ms 10.239 ms 10.095 ms
4 54.239.104.28 (54.239.104.28) 31.762 ms 31.663 ms 54.239.104.88 (54.239.104.88) 32.679 ms
5 54.239.104.23 (54.239.104.23) 28.090 ms 54.239.104.99 (54.239.104.99) 26.883 ms 54.239.104.63 (54.239.104.63) 30.373 ms
6 * * *
7 54.239.43.176 (54.239.43.176) 22.007 ms 54.240.229.173 (54.240.229.173) 27.092 ms 54.239.42.188 (54.239.42.188) 34.865 ms
8 * * *
9 * * *
10 * * *
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *
18 52.93.28.172 (52.93.28.172) 22.837 ms 52.93.28.194 (52.93.28.194) 31.958 ms 52.93.28.154 (52.93.28.154) 27.522 ms
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
36 * * *
My code:
import re
rgexpress = re.compile(r'\s\s\d\d?.\d\d\d+\s+ms')
with open("new2") as f:
    for line in f:
        result = rgexpress.search(line)
        print(result)
My output:
None
<re.Match object; span=(28, 38), match=' 1.458 ms'>
<re.Match object; span=(35, 45), match=' 9.911 ms'>
<re.Match object; span=(33, 43), match=' 9.616 ms'>
<re.Match object; span=(33, 44), match=' 31.762 ms'>
<re.Match object; span=(33, 44), match=' 28.090 ms'>
None
<re.Match object; span=(33, 44), match=' 22.007 ms'>
None
None
None
None
None
None
None
None
None
None
<re.Match object; span=(31, 42), match=' 22.837 ms'>
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
Expected result:
'''
(22.837+31.958+27.522)/3 = 27.439
'''
Average = 27.439
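To see why the code above prints a single value per line, here is a minimal sketch (not from the original post) run on hop 18 of the trace: re.search stops at the first match on a line, while re.findall returns every delay. The pattern (\d+\.\d+) ms is an assumption chosen for illustration, and the line keeps the collapsed single spacing shown in the post.

```python
import re
from statistics import mean

# Hop 18 from the trace above (spacing collapsed to single spaces, as in the post).
line = "18 52.93.28.172 (52.93.28.172) 22.837 ms 52.93.28.194 (52.93.28.194) 31.958 ms 52.93.28.154 (52.93.28.154) 27.522 ms"

# Illustrative pattern: capture the number that precedes " ms".
delay_re = re.compile(r'(\d+\.\d+) ms')

# re.search stops at the first match on the line...
first = delay_re.search(line).group(1)
# ...while re.findall returns every captured delay.
delays = [float(d) for d in delay_re.findall(line)]

print(first)                   # 22.837
print(delays)                  # [22.837, 31.958, 27.522]
print(round(mean(delays), 3))  # 27.439
```

The last line reproduces the expected (22.837+31.958+27.522)/3 = 27.439.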
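import re
# Capture only the number, so float() doesn't choke on the surrounding spaces and "ms"
rgexpress = re.compile(r'\s(\d+\.\d+)\s+ms')
total = 0  # named total rather than sum, to avoid shadowing the built-in sum()
count = 0
with open("new2") as f:
    for line in f:
        # search() returns only the first delay on each line,
        # so this averages the first delay of every hop that replied
        result = rgexpress.search(line)
        if result:
            total += float(result.group(1))
            count += 1
if count:
    print(total / count)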
I hope the edited version is better. Use findall to find all the matching delays on a line, convert them to float and compute their mean:
import re
# some sample data
data = [
    "7 54.239.43.176 (54.239.43.176) 22.007 ms 54.240.229.173 (54.240.229.173) 27.092 ms 54.239.42.188 (54.239.42.188) 34.865 ms",
    "8 * * *"
]
re_delay = re.compile(r'\d+\.\d{3}(?= ms)')
for line in data:
    delays = [float(delay) for delay in re_delay.findall(line)]
    if delays:
        mean = sum(delays)/len(delays)
        print(mean)
# 27.988
Note that you should use \d+ for the integer part of the delay, as I did here, rather than \d\d?; otherwise any delay of 100 ms or more won't match.
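A quick check of that point, on a made-up hop with delays of 100 ms or more (the line and its values are hypothetical, not from the trace above). Once the leading \s\s of the question's pattern is dropped, \d\d? doesn't simply fail: it silently matches the tail of the number. Both patterns here also escape the dot as \., whereas the bare . in the question's regex matches any character.

```python
import re

# Hypothetical hop with delays of 100 ms or more (made-up values).
line = "9 10.0.0.1 (10.0.0.1) 123.456 ms 98.765 ms 101.002 ms"

narrow = re.compile(r'\d\d?\.\d{3}(?= ms)')  # integer part limited to 1-2 digits
wide = re.compile(r'\d+\.\d{3}(?= ms)')      # integer part of any length

print(narrow.findall(line))  # ['23.456', '98.765', '01.002'] (leading digits lost)
print(wide.findall(line))    # ['123.456', '98.765', '101.002']
```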
Edit, to answer the question in the comments: you can also build a list of the means:
re_delay = re.compile(r'\d+\.\d{3}(?= ms)')
out = []
for line in data:
    delays = [float(delay) for delay in re_delay.findall(line)]
    if delays:
        mean = sum(delays)/len(delays)
        out.append(mean)
Then use
print(out[-1])
to get the mean delay of the last hop that replied (hop 18 in your trace).
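Putting the pieces together for the original goal (the last router touched plus its mean delay), here is a sketch assuming Linux-style traceroute output formatted like the trace in the question. last_router_stats is a hypothetical helper name; it also collects the router addresses printed in parentheses and takes the hop number from the first field, so nothing is hardcoded to hop 18.

```python
import re
from statistics import mean

# Same delay pattern as in the answer: a number like 22.837 followed by " ms".
DELAY_RE = re.compile(r'\d+\.\d{3}(?= ms)')
# Router address as printed in parentheses, e.g. "(52.93.28.172)".
ROUTER_RE = re.compile(r'\((\d{1,3}(?:\.\d{1,3}){3})\)')


def last_router_stats(lines):
    """Return (hop, routers, mean_delay) for the last hop that replied, or None.

    Hypothetical helper; assumes traceroute output formatted as in the question.
    """
    last = None
    for line in lines:
        fields = line.split()
        # Skip the "traceroute to ..." header and blank lines.
        if not fields or not fields[0].isdigit():
            continue
        delays = [float(d) for d in DELAY_RE.findall(line)]
        if delays:  # hops that printed only "* * *" have no delays
            routers = ROUTER_RE.findall(line)
            last = (int(fields[0]), routers, mean(delays))
    return last


# A few lines copied from the trace in the question.
sample = [
    "traceroute to 34.233.68.171 (34.233.68.171), 30 hops max, 60 byte packets",
    "7 54.239.43.176 (54.239.43.176) 22.007 ms 54.240.229.173 (54.240.229.173) 27.092 ms 54.239.42.188 (54.239.42.188) 34.865 ms",
    "17 * * *",
    "18 52.93.28.172 (52.93.28.172) 22.837 ms 52.93.28.194 (52.93.28.194) 31.958 ms 52.93.28.154 (52.93.28.154) 27.522 ms",
    "19 * * *",
]

hop, routers, avg = last_router_stats(sample)
print(hop, routers, f"{avg:.3f}")
# 18 ['52.93.28.172', '52.93.28.194', '52.93.28.154'] 27.439
```

To run it on the file from the question, pass the open file object instead of the sample list, e.g. with open("new2") as f: print(last_router_stats(f)). Iterating a file yields one line at a time, which is all the helper needs.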
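Why not use a library like csv (or maybe pandas?) for this? Can you describe the format of the data? Which column contains the "delay value"?
I agree with Alexander: in pandas you can do all kinds of calculations.
There's no CSV here. The data is the output of the traceroute command, exactly as shown in the question.
Thanks a lot. What if I have to extract only line 18 (aka the last router touched) without using literal characters?
You can put the results in a list and access the last one, see the edit.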