Python: parse a CSV file and aggregate values
I want to parse a CSV file and aggregate the values. The CITY column contains repeated values. After parsing, the result should look like:
CITY, AMOUNT
London,75
Tokyo,45
New York,25
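I wrote the following code to extract the unique city names:
import csv
def main():
    with open('contributions.csv', newline='') as f:
        contrib_data = list(csv.DictReader(f))
    combined = []  # unique OFFICE (city) names, in order of first appearance
    for row in contrib_data:
        if row['OFFICE'] not in combined:
            combined.append(row['OFFICE'])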
How do I then aggregate the values?
Tested in Python 3.2.2:
import csv
from collections import defaultdict
cities = defaultdict(int)  # a city seen for the first time starts at 0
with open('test.csv', newline='') as infile:
    for row in csv.DictReader(infile):
        cities[row["CITY"]] += int(row["AMOUNT"])
with open('out.csv', 'w', newline='') as outfile:  # closing the file flushes the output
    writer = csv.writer(outfile)
    writer.writerow(["CITY", "AMOUNT"])
    writer.writerows([city, cities[city]] for city in cities)
Result:
CITY,AMOUNT
New York,25
London,75
Tokyo,45
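To try the aggregation without creating files on disk, here is a minimal sketch that wraps the same defaultdict(int) approach in a function and feeds it an in-memory CSV through io.StringIO. The function name aggregate_amounts and the sample rows are hypothetical; London appears twice so that its amounts are summed, matching the result above.

```python
import csv
import io
from collections import defaultdict

def aggregate_amounts(csv_file):
    """Sum AMOUNT per CITY (hypothetical helper; column names as in the answer)."""
    totals = defaultdict(int)  # missing cities start at 0
    for row in csv.DictReader(csv_file):
        totals[row["CITY"]] += int(row["AMOUNT"])
    return dict(totals)

# Hypothetical sample data: two London rows (20 and 55).
sample = io.StringIO("CITY,AMOUNT\nLondon,20\nTokyo,45\nLondon,55\nNew York,25\n")
print(aggregate_amounts(sample))
# On Python 3.7+ (insertion-ordered dicts): {'London': 75, 'Tokyo': 45, 'New York': 25}
```

Passing a file-like object instead of a filename keeps the function usable with both real files (opened with newline='') and in-memory test data.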
As for your additional requirement (max, min and mean):
import csv
from collections import defaultdict
def default_factory():
    # [total, max, min, count]; count is replaced by the mean after the loop
    return [0, None, None, 0]
cities = defaultdict(default_factory)
with open('test.csv', newline='') as infile:
    for row in csv.DictReader(infile):
        amount = int(row["AMOUNT"])
        stats = cities[row["CITY"]]
        stats[0] += amount
        # use the built-in max/min instead of shadowing them with local variables
        stats[1] = amount if stats[1] is None else max(stats[1], amount)
        stats[2] = amount if stats[2] is None else min(stats[2], amount)
        stats[3] += 1
for city in cities:
    cities[city][3] = cities[city][0] / cities[city][3]  # calculate mean
with open('out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["CITY", "AMOUNT", "max", "min", "mean"])
    writer.writerows([city] + cities[city] for city in cities)
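Note that under Python 2 you need to add the line from __future__ import division at the top to get correct results (otherwise the mean is truncated by integer division). Python 2's open() also does not accept newline, so open the files in binary mode ('rb' / 'wb') there instead.
Using a dict whose values are the amounts might do the trick. Something like the following, assuming you read one line at a time, city is the current city and amount is the current amount: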
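import csv
main_dict = {}
with open('test.csv', newline='') as f:
    for row in csv.DictReader(f):
        city = row['CITY']            # current city
        amount = int(row['AMOUNT'])   # current amount
        if city in main_dict:
            main_dict[city] = main_dict[city] + amount
        else:
            main_dict[city] = amount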
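At the end of the loop, main_dict will hold the aggregated values.
Hint: use a dictionary instead of a list, with the city as the key and sum(amount) as the value.
When I try this approach I keep getting a KeyError.
It shouldn't. Can you show the relevant part of your code here? I tested it on Python 2.7 and it works fine.
I wonder why a regular dict doesn't work. Why do I have to use defaultdict()?
defaultdict(int) lets you use keys that haven't been defined yet and automatically gives them the value 0. So you can simply write cities["Bolton"] += 10, which either creates a new key "Bolton" with the value 10 or, if the key already exists, adds 10 to its value. With a plain dict you'd get lots of KeyErrors.
Thanks for the feedback. Is it possible to get the maximum, minimum and mean in the same for loop?
You can't use defaultdict(int) in the same way, because now you have to store a whole list per city rather than a single integer.
Result: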
CITY,AMOUNT,max,min,mean
New York,25,25,25,25.0
London,75,55,20,37.5
Tokyo,45,45,45,45.0
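If keeping every amount in memory is acceptable, an alternative to the single-pass [total, max, min, count] bookkeeping is to collect each city's amounts with defaultdict(list) and compute the statistics afterwards with the built-in sum, max, min and len. This is a different technique from the answer's, sketched here under that memory assumption; the helper name city_stats and the sample rows are hypothetical.

```python
import csv
import io
import sys
from collections import defaultdict

def city_stats(csv_file):
    """Return {city: (total, max, min, mean)} for a CSV with CITY and AMOUNT columns."""
    amounts = defaultdict(list)  # city -> every amount seen for that city
    for row in csv.DictReader(csv_file):
        amounts[row["CITY"]].append(int(row["AMOUNT"]))
    return {
        # true division, so the mean is always a float (e.g. 45.0, 37.5)
        city: (sum(values), max(values), min(values), sum(values) / len(values))
        for city, values in amounts.items()
    }

# Hypothetical sample data with two London rows.
sample = io.StringIO("CITY,AMOUNT\nLondon,20\nTokyo,45\nLondon,55\nNew York,25\n")
writer = csv.writer(sys.stdout)
writer.writerow(["CITY", "AMOUNT", "max", "min", "mean"])
for city, stats in city_stats(sample).items():
    writer.writerow([city, *stats])
```

Because every list is non-empty by construction, there is no need for the None sentinels used in the single-pass version; the cost is storing all amounts until the end.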
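The plain dict versus defaultdict difference discussed in the comments can be reproduced in a few lines: += on a missing key of a plain dict raises KeyError, while dict.get with a default, or a defaultdict(int), starts the missing key at 0. The key "Bolton" follows the comment's example; the variable names are illustrative.

```python
from collections import defaultdict

plain = {}
try:
    plain["Bolton"] += 10  # += must read plain["Bolton"] first, and it doesn't exist
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'Bolton'

# Option 1: plain dict with an explicit default via dict.get
plain["Bolton"] = plain.get("Bolton", 0) + 10

# Option 2: defaultdict(int) calls int() -> 0 for a missing key
cities = defaultdict(int)
cities["Bolton"] += 10  # creates the key with 0, then adds 10
cities["Bolton"] += 10  # the key now exists, so 10 is added to it

print(plain["Bolton"], cities["Bolton"])  # 10 20
```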