Python dictionary sum
Hi guys, I have a question about how to sum the values for identical IP addresses in a dictionary. I have an input file that looks like this:
IP , Byte
10.180.176.61,3669
10.164.134.193,882
10.164.132.209,4168
10.120.81.141,4297
10.180.176.61,100
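What I'm trying to do is open that file and parse each IP address together with the number after the comma, so that I can sum all the bytes for each IP address. That way I'd get a result like this:
IP 10.180.176.61 , 3769
My code looks like this: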
#!/usr/bin/python
# -*- coding: utf-8 -*-

import re,sys, os
from collections import defaultdict

f = open('splited/small_file_1000000.csv','r')
o = open('gotovo1.csv','w')

list_of_dictionaries = {}

for line in f:
    if re.search(r'\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}.*',line):
        line_ip = re.findall(r'\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}',line)[0]
        line_by = re.findall(r'\,\d+',line)[0]
        line_b = re.sub(r'\,','',line_by)

        list_of_dictionaries['IP'] = line_ip
        list_of_dictionaries['VAL'] = int(line_b)

c = defaultdict(int)
for d in list_of_dictionaries:
    c[d['IP']] += d['VAL']

print c
Any ideas would be great.
Use the csv module to read your file and sum up the totals for each IP address:
from collections import Counter
import csv

def read_csv(fn):
    # Yield (ip, byte_count) pairs from a two-column CSV file.
    with open(fn, 'r') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        next(reader)  # Skip header (next() works in Python 2.6+ and 3)
        for row in reader:
            ip, nbytes = row
            yield ip, int(nbytes)

totals = Counter()
for ip, nbytes in read_csv('data.txt'):
    totals[ip] += nbytes  # Missing keys count as 0 in a Counter

print(totals)
Output:
Counter({'10.120.81.141': 4297, '10.164.132.209': 4168, '10.180.176.61': 3769, '10.164.134.193': 882})
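The script in the question also opens an output file, gotovo1.csv, but never writes to it. Assuming the goal is one `ip,total` row per address, here is a minimal Python 3 sketch of that last step. The helper name `write_totals`, the header row and the largest-first ordering are my own assumptions, not something the question specifies.

```python
import csv
import io
from collections import Counter

def write_totals(totals, out):
    """Write one 'ip,total' row per address, largest total first (hypothetical helper)."""
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['IP', 'Byte'])  # assumed header, mirroring the input file
    for ip, total in totals.most_common():
        writer.writerow([ip, total])

# Totals copied from the output shown above.
totals = Counter({'10.120.81.141': 4297, '10.164.132.209': 4168,
                  '10.180.176.61': 3769, '10.164.134.193': 882})

# Write to an in-memory buffer so the sketch runs without touching the disk.
buf = io.StringIO()
write_totals(totals, buf)
result = buf.getvalue()
print(result)
```

To write to a real file instead, something like `with open('gotovo1.csv', 'w', newline='') as f: write_totals(totals, f)` should work in Python 3.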
If your file looks like the example you provided, you don't need regular expressions to parse it. Just split each line on the comma.
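A minimal sketch of the comma-splitting idea, assuming Python 3 and exactly two fields per line. The function name `sum_bytes_by_ip` is hypothetical, and the sample rows from the question are inlined so the snippet runs without a file.

```python
from collections import defaultdict

def sum_bytes_by_ip(lines):
    """Sum the byte counts per IP from an iterable of 'ip,bytes' lines."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.strip().split(',')
        if len(parts) != 2:
            continue  # ignore blank or malformed lines
        ip, value = parts[0].strip(), parts[1].strip()
        if not value.isdigit():
            continue  # skips the "IP , Byte" header row
        totals[ip] += int(value)
    return dict(totals)

# Sample input copied from the question.
sample = """IP , Byte
10.180.176.61,3669
10.164.134.193,882
10.164.132.209,4168
10.120.81.141,4297
10.180.176.61,100
""".splitlines()

totals = sum_bytes_by_ip(sample)
print(totals['10.180.176.61'])  # 3669 + 100 = 3769
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, for example `with open('data.txt') as f: totals = sum_bytes_by_ip(f)`.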