Python 2.7: Summing columns in a csv file based on a user search
I have the following csv file, data.csv:
school,students,teachers,subs
us-school1,10,2,0
us-school2,20,4,2
uk-school1,10,2,0
de-school1,10,3,1
de-school1,15,3,3
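I am trying to let the user search by the country a school is in (us, uk, or de) and then sum up the corresponding columns (for example, the total number of students across all us schools, and so on).
So far I am able to search using raw_input and display the column contents for the matching country. I would appreciate any suggestions on how to get the totals.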
Expected output:
Country: us
Total students: 30
Total teachers: 6
Total subs: 2
--
Your problem can be solved like this:
import csv

search = raw_input('Enter school (e.g. us): ')
with open('data.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    result_countrys = {}  # running totals, keyed by country code
    for row in reader:
        # the csv module returns every field as a string, so convert first
        students = int(row['students'])
        teachers = int(row['teachers'])
        subs = int(row['subs'])
        country = row['school'][:2]  # first two characters, e.g. 'us'
        if country in result_countrys:
            count = result_countrys[country]
            count['students'] = count['students'] + students
            count['teachers'] = count['teachers'] + teachers
            count['subs'] = count['subs'] + subs
        else:
            dic = {}
            dic['students'] = students
            dic['teachers'] = teachers
            dic['subs'] = subs
            result_countrys[country] = dic
# print the totals for the country the user asked for
for k, v in result_countrys[search].iteritems():
    print("country " + str(search) + " has " + str(v) + " " + str(k))
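For a quick test without a file, reader can be replaced by a list of dicts such as:
reader = [{'school': 'us-school1', 'students': 20, 'teachers': 6, 'subs': 2}, {'school': 'us-school2', 'students': 20, 'teachers': 6, 'subs': 2}, {'school': 'uk-school1', 'students': 20, 'teachers': 6, 'subs': 2}]
With the data.csv from the question, the result is: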
Enter school (e.g. us): us
country us has 30 students
country us has 6 teachers
country us has 2 subs
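One thing the snippet above leaves open is what happens when the user enters a code that is not in the file: result_countrys[search] would raise a KeyError. The following is only a minimal sketch of one way to handle that, not part of the original answer. The helper name totals_for and the inline sample list (standing in for data.csv so the example runs on its own) are assumptions made for illustration.

```python
import csv


def totals_for(lines, search):
    """Sum the numeric columns of schools whose name starts with '<search>-'.

    `lines` is any iterable of csv text lines (an open file also works).
    Returns an empty dict when no school matches.
    """
    totals = {}
    for row in csv.DictReader(lines):
        if not row["school"].startswith(search + "-"):
            continue  # school belongs to another country
        for column, value in row.items():
            if column != "school":
                # csv fields are strings, so convert before adding
                totals[column] = totals.get(column, 0) + int(value)
    return totals


# Hypothetical stand-in for data.csv, using the rows from the question
sample = [
    "school,students,teachers,subs",
    "us-school1,10,2,0",
    "us-school2,20,4,2",
    "uk-school1,10,2,0",
    "de-school1,10,3,1",
    "de-school1,15,3,3",
]

for code in ("us", "fr"):
    totals = totals_for(sample, code)
    if totals:
        print("Country: {}".format(code))
        print("Total students: {}".format(totals["students"]))
        print("Total teachers: {}".format(totals["teachers"]))
        print("Total subs: {}".format(totals["subs"]))
    else:
        print("No schools found for '{}'".format(code))
```

Filtering while reading means only the requested country is summed, and an unknown code produces a message instead of an exception.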
This is relatively easy to do. All you need is a dict to keep track of your countries, and then you add all the values together:
import collections
import csv

result = {}  # store the results
with open("data.csv", "rb") as f:  # open our file
    reader = csv.DictReader(f)  # use csv.DictReader for convenience
    for row in reader:
        country = row.pop("school")[:2]  # get our country
        result[country] = result.get(country, collections.defaultdict(int))  # country group
        for column in row:  # loop through all other columns
            result[country][column] += int(row[column])  # add them together

# Now you can use or print your result by country:
for country in result:
    print("Country: {}".format(country))
    print("Total students: {}".format(result[country].get("students", 0)))
    print("Total teachers: {}".format(result[country].get("teachers", 0)))
    print("Total subs: {}\n".format(result[country].get("subs", 0)))
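This is also generic: you can add extra numeric columns (e.g. janitors :D) and it will happily sum them as well. Keep in mind that it only works with integers (if you want floats, replace the references to int with float) and that it expects every field other than school to be a number.

You can do this with pandas. Look up groupby and aggregate, and think about what you want to happen if a school does not match your data.

Thanks. Is there any way to do this other than pandas?

Hi @zig, let me know if this solved the problem :). If it did, please don't forget to mark the answer with the check mark at the top left, below the score.

The result I got looked like this: us has 1020 students. Note that it is doing string concatenation, so I added students = int(row['students']) and it works.

I believe values read through the csv module are always of type str, regardless of the data, so int(row['fieldname']) has to be added.

@zig Great, I'll add it to the answer. Sorry I missed that.

Awesome, very useful. I also like the code comments; they are really helpful for a learner like me.

Very nice, very clear, and short too :) I like your work.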
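The "1020 students" result mentioned in the comments can be reproduced in a few lines. This is a small illustrative sketch rather than code from the thread: the inline lines list stands in for data.csv, and the helper sum_column with its convert parameter is a hypothetical name introduced here to show the int/float swap that the answer describes.

```python
import csv

# Hypothetical stand-in for the first rows of data.csv
lines = [
    "school,students,teachers,subs",
    "us-school1,10,2,0",
    "us-school2,20,4,2",
]

rows = list(csv.DictReader(lines))

# Every field arrives as a string, even when it looks numeric,
# so "+" concatenates instead of adding.
as_text = rows[0]["students"] + rows[1]["students"]

# Converting first gives the arithmetic sum.
as_int = int(rows[0]["students"]) + int(rows[1]["students"])

print(repr(as_text))  # '1020'
print(as_int)         # 30


def sum_column(rows, column, convert=int):
    """Sum one column, converting each field with `convert` (int or float)."""
    return sum(convert(row[column]) for row in rows)


print(sum_column(rows, "teachers"))                 # 6
print(sum_column(rows, "students", convert=float))  # 30.0
```

Passing the converter as an argument keeps the choice between int and float in one place.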