Open a file, read its contents, use regex to turn the contents into a list, then print the list with Python
I am using "import re" and "import sys". In the terminal, when I type "1.py a.txt", I want it to read "a.txt", which has the following contents:
17:18:42.525964 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 1:1449, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 1448
17:18:42.526623 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 1449:2897, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 1448
17:18:42.526900 IP 192.168.0.15.34436 > 66.185.85.146.80: Flags [.], ack 2897, win 1444, options [nop,nop,TS val 43002448 ecr 3551057710], length 0
17:18:42.527694 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 2897:14481, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 11584
17:18:42.527716 IP 192.168.0.15.34436 > 66.185.85.146.80: Flags [.], ack 14481, win 1444, options [nop,nop,TS val 43002448 ecr 3551057710], length 0
17:18:42.528794 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 14481:23169, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 8688
17:18:42.528813 IP 192.168.0.15.34436 > 66.185.85.146.80: Flags [.], ack 23169, win 1444, options [nop,nop,TS val 43002448 ecr 3551057710], length 0
17:18:42.545191 IP 192.168.0.15.60030 > 52.2.63.29.80: Flags [.], seq 4113773418:4113774866, ack 850072640, win 270, options [nop,nop,TS val 43002452 ecr 9849626], length 1448
Then I want to use regex to remove everything except the IP addresses and the length (total), and print it as:
source: 66.185.85.146 dest: 192.168.0.15 total:1448
source: 66.185.85.146 dest: 192.168.0.15 total:1448
source: 192.168.0.15 dest: 66.185.85.146 total:0
But if there are duplicates, it should read as follows, with the totals of the duplicates added together:
source: 66.185.85.146 dest: 192.168.0.15 total:2896
source: 192.168.0.15 dest: 66.185.85.146 total:0
Also, if I type "-s" in the terminal like this:
"1.py -s a.txt"
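or
it should sort: with a bare -s it sorts and prints the contents, and with -s ip it sorts by IP.
Currently this is what I have for each piece, and I would like to know how to use them together: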
#!/usr/bin/python3
import re
import sys
file = sys.argv[1]
a = open(file, "r")
for line in a:
    line = line.rstrip()
    c = re.findall(r'^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$',line) #Yes I know its not the best regex for this, but I am testing it out for now
    d = re.findall(r'\b(\d+)$\b',line)
    if len(c) > 0 and len(d) > 0:
        print("source:", c[0],"\t","dest:",c[1],"\t", "total:",d[0])
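This is what I have so far. I don't know how to use "-s" or how to sort, nor how to remove duplicates and add up their totals when removing them.
To read -s, you will probably want a library that parses arguments, such as the standard argparse module. It lets you specify the arguments your script requires along with their descriptions, then parses those arguments and validates their format.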
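As a rough sketch of that idea, the parser below takes the file name plus a -s flag whose value is optional. Using nargs="?" with const is one way to get "bare -s" versus "-s ip"; the function name build_parser and the values "total" and "ip" are assumptions made for illustration, not something the question specifies.

```python
import argparse

def build_parser():
    """Build a parser for: FILE [-s [total|ip]] (names are illustrative)."""
    parser = argparse.ArgumentParser(description="Summarise packet lengths per IP pair")
    parser.add_argument("file", help="capture file to read, e.g. a.txt")
    # nargs="?" makes the value optional: a bare -s falls back to const.
    parser.add_argument("-s", "--sort", nargs="?", const="total",
                        choices=["total", "ip"],
                        help="sort the output; give 'ip' to sort by address")
    return parser

parser = build_parser()
# Explicit argument lists are used here instead of sys.argv so the demo is repeatable.
print(parser.parse_args(["a.txt"]).sort)              # no -s given
print(parser.parse_args(["a.txt", "-s"]).sort)        # bare -s
print(parser.parse_args(["a.txt", "-s", "ip"]).sort)  # -s with a value
```

One caveat with an optional value like this: in "1.py -s a.txt" argparse would take a.txt as the value of -s, so in this sketch the file name has to come first.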
To sort a list, you can use the sorted(my_list) function.
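For instance, assuming the parsed results end up as (source, dest, total) tuples, sorted() with a key function covers both orderings the question asks about. The rows below are sample values based on the question's data, and ip_key is a hypothetical helper that compares addresses numerically rather than as text.

```python
rows = [
    ("66.185.85.146", "192.168.0.15", 2896),
    ("192.168.0.15", "66.185.85.146", 0),
    ("192.168.0.15", "52.2.63.29", 1448),
]

def ip_key(ip):
    """Turn '192.168.0.15' into (192, 168, 0, 15) so octets compare as numbers."""
    return tuple(int(part) for part in ip.split("."))

# Largest total first.
by_total = sorted(rows, key=lambda row: row[2], reverse=True)
# Ascending by source address; rows with equal keys keep their original order.
by_source_ip = sorted(rows, key=lambda row: ip_key(row[0]))

for source, dest, total in by_source_ip:
    print("source:", source, "dest:", dest, "total:", total)
```

Sorting the raw strings would put "192..." before "66...", which is why the sketch converts each octet to an integer first.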
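Finally, to make sure there are no duplicates, you can use a set. This loses the list ordering, but since you will sort it afterwards anyway, that should not be a problem.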
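A small sketch of the set approach, using (source, dest) pairs taken from the question's sample lines. Note that a set only drops the repeated pairs; it does not add up their lengths, so on its own it covers the "no duplicates" part but not the totals.

```python
pairs = [
    ("66.185.85.146", "192.168.0.15"),
    ("66.185.85.146", "192.168.0.15"),  # duplicate of the first pair
    ("192.168.0.15", "66.185.85.146"),
]

# set() removes the duplicate; sorted() gives the result a stable order again
# (plain string comparison here, so "192..." sorts before "66...").
unique_pairs = sorted(set(pairs))

for source, dest in unique_pairs:
    print("source:", source, "dest:", dest)
```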
Alternatively, there is a collection made specifically for adding up grouped values and sorting them, Counter:
import re
import sys
from collections import Counter

results = Counter()
with open(sys.argv[1]) as a:
    for line in a:
        line = line.rstrip()
        # Each IPv4 address that is followed by a port, e.g. 66.185.85.146.80
        c = re.findall(r'((?:[0-9]{1,3}\.){3}[0-9]{1,3})\.[0-9]+', line)
        # The packet length at the end of the line
        d = re.findall(r'\b(\d+)$', line)
        # Two addresses are needed, otherwise c[1] raises IndexError
        if len(c) > 1 and len(d) > 0:
            source, destination, length = c[0], c[1], d[0]
            results[(source, destination)] += int(length)
# Print the items sorted by total, largest first.
for (source, destination), length in results.most_common():
    print("source:", source, "\t", "dest:", destination, "\t", "total:", length)
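To see the accumulation on its own, here is a minimal sketch that feeds Counter the (source, dest, length) values from the first three sample lines of the question. The packets list is typed in by hand for illustration; in the real script those values would come from the regex step.

```python
from collections import Counter

# (source, dest, length) triples taken from the first three sample lines.
packets = [
    ("66.185.85.146", "192.168.0.15", 1448),
    ("66.185.85.146", "192.168.0.15", 1448),
    ("192.168.0.15", "66.185.85.146", 0),
]

totals = Counter()
for source, dest, length in packets:
    # Missing keys start at 0, so repeated pairs simply add up.
    totals[(source, dest)] += length

# most_common() returns the pairs ordered from largest total to smallest.
for (source, dest), total in totals.most_common():
    print("source:", source, "dest:", dest, "total:", total)
```

This gives 2896 for the repeated pair and 0 for the reverse direction, which matches the output the question asks for.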
For the -s argument, what you need is ArgumentParser, for example:
import argparse
...
def main():
    parser = argparse.ArgumentParser()
    # positional file name, as implied by the usage shown after this code
    parser.add_argument('file', help='file to read')
    parser.add_argument('-s', '--sort', action='append',
                        help='sort specific IP')
    parser.add_argument('-s2', '--sortall', action='store_true',
                        help='sort all the IPs')
    args = parser.parse_args()
    if args.sortall:
        pass  # sort all IPs
    # args.sort is None when -s is not given
    for ip in args.sort or []:
        pass  # sort by ip
if __name__ == '__main__':
    main()
Now you can use the script like this:
1.py a.txt -s 192.168.0.15
or
1.py a.txt -s2
Beyond that, putting everything together looks like a homework assignment, so you should read more about Python to work it out.
ArgumentParser still to be added. By the way, for the input file path the code works fine:
# Python 2 syntax (print statement)
import re
from collections import defaultdict
with open(r"C:\ips.txt",'rb') as ip_file:
    txt = ip_file.read()
# "source.port > dest.port" pairs, in file order
ip=re.findall(r'[0-9.]+[\s]+[>][\s0-9.]+',txt)
# drop the trailing port from each address and rejoin as "source>dest"
ip1 = ['>'.join(re.findall(r'[0-9.]+(?=[.])',i)) for i in ip]
# packet lengths, in file order
packs = re.findall(r'(?<=length )[0-9]+',txt)
data = zip(ip1,packs)
d = defaultdict(list)
for k, v in data:
    d[k].append(v)
for i,j in d.items():
    source,destination = i.split('>')[0],i.split('>')[1]
    print "source: {0} destination: {1} total: {2}".format(source,destination,sum(map(int,j)))
Output:
source: 192.168.0.15 destination: 66.185.85.146 total: 0
source: 66.185.85.146 destination: 192.168.0.15 total: 23168
source: 192.168.0.15 destination: 52.2.63.29 total: 1448
Thanks, I'll try this. This doesn't add up the lengths correctly. Instead, it adds up source/destination combinations that have different lengths and discards source/destination combinations whose length has already been seen.
Good catch. Fixed right away.
I think using Counter is better suited to this job. I updated my answer to reflect that.
Your assignment line assigns the variables incorrectly (or you like confusing variable names ^-^). It should be source, destination, length = c[0], c[1], d[0]
I get an indentation error at "for ip in args.sort:".
@eLRuLL I tried it without the ports, as you did, but I got an error, "IndexError: list index out of range".
@sislamp Please remember to accept an answer that worked for you.
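For anyone who wants to see the pieces from this thread side by side, here is one possible Python 3 sketch that combines the regex step, the Counter totals and the -s option. It is only an illustration under some assumptions: the names LINE_RE, summarise, ordered and main are made up here, "-s" alone is taken to mean "sort by total" and "-s ip" to mean "sort by address", and the file name is expected before -s.

```python
import argparse
import re
from collections import Counter

# tcpdump-style line: "IP src.port > dst.port: ... length N"
LINE_RE = re.compile(
    r"IP (\d{1,3}(?:\.\d{1,3}){3})\.\d+ > (\d{1,3}(?:\.\d{1,3}){3})\.\d+:"
    r".*length (\d+)\s*$"
)

def summarise(lines):
    """Return a Counter mapping (source, dest) to the summed packet length."""
    totals = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # lines that do not look like the sample are skipped
            source, dest, length = match.groups()
            totals[(source, dest)] += int(length)
    return totals

def ip_key(ip):
    """Compare addresses numerically, octet by octet."""
    return tuple(int(part) for part in ip.split("."))

def ordered(totals, sort=None):
    """Return ((source, dest), total) items in the requested order."""
    items = list(totals.items())
    if sort == "total":
        items.sort(key=lambda item: item[1], reverse=True)
    elif sort == "ip":
        items.sort(key=lambda item: (ip_key(item[0][0]), ip_key(item[0][1])))
    return items

def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("file")
    parser.add_argument("-s", "--sort", nargs="?", const="total",
                        choices=["total", "ip"])
    args = parser.parse_args(argv)
    with open(args.file) as handle:
        totals = summarise(handle)
    for (source, dest), total in ordered(totals, args.sort):
        print("source:", source, "dest:", dest, "total:", total)
```

To run it as a script you would call main() at the bottom of the file, for example as "1.py a.txt -s ip". Matching source, destination and length with a single pattern per line keeps each length attached to its own pair, rather than relying on two separate lists staying in step.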