Python: parse a CSV file and aggregate values

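I want to parse a CSV file and aggregate the values. The city column contains repeated values.

After parsing, the result should look something like this: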

CITY, AMOUNT
London,75
Tokyo,45
New York,25
I wrote the following code to extract the unique city names:

def main():
    contrib_data = list(csv.DictReader(open('contributions.csv','rU')))
    combined = []
    for row in contrib_data:
      if row['OFFICE'] not in combined:
        combined.append(row['OFFICE'])

How do I then aggregate the values?

Tested in Python 3.2.2:

import csv
from collections import defaultdict
reader = csv.DictReader(open('test.csv', newline=''))
cities = defaultdict(int)
for row in reader:
    cities[row["CITY"]] += int(row["AMOUNT"])

writer = csv.writer(open('out.csv', 'w', newline = ''))
writer.writerow(["CITY", "AMOUNT"])
writer.writerows([city, cities[city]] for city in cities)
Result:

CITY,AMOUNT
New York,25
London,75
Tokyo,45
As for your added requirements:

import csv
from collections import defaultdict

def default_factory():
    return [0, None, None, 0]

reader = csv.DictReader(open('test.csv', newline=''))
cities = defaultdict(default_factory)
for row in reader:
    amount = int(row["AMOUNT"])
    stats = cities[row["CITY"]]  # [sum, max, min, count]
    stats[0] += amount
    # use the built-in max() and min() instead of shadowing them
    stats[1] = amount if stats[1] is None else max(stats[1], amount)
    stats[2] = amount if stats[2] is None else min(stats[2], amount)
    stats[3] += 1
for city in cities:
    cities[city][3] = cities[city][0]/cities[city][3] # calculate mean

writer = csv.writer(open('out.csv', 'w', newline = ''))
writer.writerow(["CITY", "AMOUNT", "max", "min", "mean"])
writer.writerows([city] + cities[city] for city in cities)

Note that under Python 2 you need to add the line from __future__ import division at the top to get correct results.
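As an alternative to keeping running statistics in a four-element list, the same numbers can be computed by collecting every amount per city first and summarizing afterwards. The following is only a sketch of that different technique, not the answer's method: the function name summarize and the in-memory sample rows are assumptions made for illustration, and the column names CITY and AMOUNT follow the output shown in the thread.

```python
import csv
import io
from collections import defaultdict

def summarize(csv_file):
    """Return {city: (total, max, min, mean)} from CITY/AMOUNT rows."""
    amounts = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # collect every amount under its city
        amounts[row["CITY"]].append(int(row["AMOUNT"]))
    return {
        city: (sum(values), max(values), min(values), sum(values) / len(values))
        for city, values in amounts.items()
    }

# Hypothetical sample rows, not taken from the original question
sample = io.StringIO("CITY,AMOUNT\nLondon,20\nTokyo,45\nLondon,55\nNew York,25\n")
stats = summarize(sample)
for city, (total, highest, lowest, mean) in stats.items():
    print(city, total, highest, lowest, mean)
```

This version keeps all amounts in memory, so the running-statistics approach is the better fit for very large files.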

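Using a dict with the city as the key and the amount as the value should do the trick. Something like the following.

Assuming you read one row at a time, with city holding the current city and amount holding the current amount: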

main_dict = {}

---for loop here---
if city in main_dict:
    main_dict[city] = main_dict[city] + amount
else:
    main_dict[city] = amount
---end for loop---
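The pseudocode above could be made runnable roughly as follows. This is a minimal sketch: the function name aggregate_amounts and the in-memory sample rows are assumptions made for illustration, and the column names CITY and AMOUNT are taken from the output shown in the question.

```python
import csv
import io

def aggregate_amounts(csv_file):
    """Sum the AMOUNT column per CITY using a plain dict."""
    main_dict = {}
    for row in csv.DictReader(csv_file):
        city = row["CITY"]
        amount = int(row["AMOUNT"])
        if city in main_dict:
            main_dict[city] = main_dict[city] + amount
        else:
            main_dict[city] = amount
    return main_dict

# Hypothetical sample rows, not taken from the original question
sample = io.StringIO("CITY,AMOUNT\nLondon,20\nTokyo,45\nLondon,55\nNew York,25\n")
totals = aggregate_amounts(sample)
print(totals)
```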

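At the end of the loop, main_dict holds the aggregated values.

Hint: use a dictionary rather than a list, with the city as the key and the sum of the amounts as the value.

When I try this approach I always get a KeyError.

It shouldn't. Can you show the relevant part of your code here?

Tested on Python 2.7 and it works fine. I'd like to know why a regular dict doesn't work and why I have to use defaultdict()?

defaultdict(int) lets you use keys that have not been defined yet and automatically gives them the value 0. So you can simply write cities["Bolton"] += 10, and it will create a new key "Bolton" with the value 10 or, if the key already exists, add 10 to its value. With a plain dict you would get a lot of KeyErrors.

Thanks for the feedback. Is it possible to get the max, min and mean in the same for loop?

You can't use defaultdict in the same way, because now you have to store a whole list rather than a single integer.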
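The difference between a plain dict and defaultdict(int) discussed in these comments can be seen in a short sketch. The key "Bolton" comes from the comment above; the variable names plain and error are only illustrative.

```python
from collections import defaultdict

plain = {}
error = None
try:
    # += has to read plain["Bolton"] first, and that key does not exist yet
    plain["Bolton"] += 10
except KeyError as exc:
    error = exc
    print("plain dict raised KeyError:", exc)

cities = defaultdict(int)
cities["Bolton"] += 10  # a missing key starts at int(), which is 0
cities["Bolton"] += 5   # an existing key is simply incremented
print(cities["Bolton"])

# A plain dict can behave the same way with dict.get and a default of 0
plain["Bolton"] = plain.get("Bolton", 0) + 10
print(plain["Bolton"])
```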
Result of the max/min/mean version:

CITY,AMOUNT,max,min,mean
New York,25,25,25,25.0
London,75,55,20,37.5
Tokyo,45,45,45,45.0