Conditional filtering in Python on a CSV file
Please help! I have tried different approaches and packages to write a program that takes 4 inputs and, based on the combination of those inputs, returns writing-score statistics for the matching group from a csv file. This is my first project, so I would appreciate any insights, hints, or tips.
Here is a sample of the csv (200 rows in total):
Here is what I have so far:
import csv
import numpy
csv_file_object=csv.reader(open('scores.csv', 'rU')) #reads file
header=csv_file_object.next() #skips header
data=[] #loads data into array for processing
for row in csv_file_object:
    data.append(row)
data=numpy.array(data)
#asks for inputs
gender=raw_input('Enter gender [male/female]: ')
schtyp=raw_input('Enter school type [public/private]: ')
ses=raw_input('Enter socioeconomic status [low/middle/high]: ')
prog=raw_input('Enter program status [general/vocation/academic]: ')
#makes them lower case and strings
prog=str(prog.lower())
gender=str(gender.lower())
schtyp=str(schtyp.lower())
ses=str(ses.lower())
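What I am missing is how to filter and get statistics only for a specific group. For example, suppose I enter male, public, middle, and academic: I want the average writing score for that subset. I tried pandas' groupby function, but that only gave me statistics for broad groups (for example, public vs. private). I also tried a pandas DataFrame, but that only let me filter on one input, and I don't know how to get the writing scores from it. Any hints would be greatly appreciated.

Take a look at pandas. I think it will shorten your csv parsing work and give you the subsetting functionality you are asking for.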
import pandas as pd
data = pd.read_csv('fileName.txt', delim_whitespace=True)
#get all of the male students
data[data['gender'] == 'male']
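
To get from a single-column filter to the four-input case in the question, the comparisons can be combined with `&` into one boolean mask. The following is only a minimal sketch: the column names `schtyp`, `ses`, `prog`, and `write` are assumptions taken from the variable names in the question, the sample rows are made up for illustration, and `writing_stats` is a hypothetical helper.

```python
import io
import pandas as pd

# Made-up rows for illustration only; the real scores.csv is not shown.
# Column names (gender, schtyp, ses, prog, write) are assumptions.
SAMPLE_CSV = """gender,schtyp,ses,prog,write
male,public,middle,academic,52
male,public,middle,academic,60
female,public,middle,academic,59
male,private,high,academic,65
male,public,low,general,44
"""

def writing_stats(df, gender, schtyp, ses, prog):
    """Summary statistics of 'write' for rows matching all four values."""
    # Each comparison yields a boolean Series; & keeps rows where all are True.
    # The parentheses are required because & binds tighter than ==.
    mask = (
        (df["gender"] == gender)
        & (df["schtyp"] == schtyp)
        & (df["ses"] == ses)
        & (df["prog"] == prog)
    )
    return df.loc[mask, "write"].describe()

df = pd.read_csv(io.StringIO(SAMPLE_CSV))
stats = writing_stats(df, "male", "public", "middle", "academic")
print(stats["count"], stats["mean"])
```

With a real file, `pd.read_csv('scores.csv')` would replace the in-memory sample, and the four arguments would come from the lower-cased user inputs.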
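Agreed, Pandas is definitely the way to go; once you get used to it, it has remarkable filtering/subsetting capabilities. But it can be hard to get your head around at first (or at least it was for me!), so I dug up some examples of the kind of subsetting you need from some of my old code. The variable itu below is a DataFrame containing data for different countries over time.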
# Subsetting by using True/False:
subset = itu['CntryName'] == 'Albania' # returns True/False values
itu[subset] # returns 1x144 DataFrame of only data for Albania
itu[itu['CntryName'] == 'Albania'] # one-line command, equivalent to the above two lines
# Pandas has many built-in functions like .isin() to provide params to filter on
itu[itu.cntrycode.isin(['USA','FRA'])] # returns where itu['cntrycode'] is 'USA' or 'FRA'
itu[itu.year.isin([2000,2001,2002])] # Returns all of itu for only years 2000-2002
# Advanced subsetting can include logical operations:
itu[itu.cntrycode.isin(['USA','FRA']) & itu.year.isin([2000,2001,2002])] # Both of above at same time
# Use .loc with two elements to simultaneously select by row/index & column:
itu.loc['USA','CntryName']
itu.iloc[204,0]
itu.loc[['USA','BHS'], ['CntryName', 'Year']]
itu.iloc[[204, 13], [0, 1]]
# Can do many operations at once, but this reduces "readability" of the code
itu[itu.cntrycode.isin(['USA','FRA']) &
itu.year.isin([2000,2001,2002])].loc[:, ['cntrycode','cntryname','year','mpen','fpen']]
# Finally, if you're comfortable with using map() and list comprehensions,
# you can do some advanced subsetting that includes evaluations & functions
# to determine what elements you want to select from the whole, such as all
# countries whose name begins with "United":
criterion = itu['CntryName'].map(lambda x: x.startswith('United'))
itu[criterion]['CntryName'] # gives us UAE, UK, & US
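
Since the itu data itself is not shown, here is a small runnable stand-in for trying out two of the patterns above. It is only a sketch: the rows are invented for illustration, and `.str.startswith` is used as an alternative to the `map(lambda ...)` form.

```python
import pandas as pd

# Tiny stand-in for the itu frame; the values are invented for illustration.
itu = pd.DataFrame({
    "CntryName": ["United States", "France", "United Kingdom", "Albania"],
    "cntrycode": ["USA", "FRA", "GBR", "ALB"],
    "year": [2000, 2001, 2005, 2000],
})

# Combine two .isin() filters with & so a row must satisfy both.
both = itu[itu["cntrycode"].isin(["USA", "FRA"]) & itu["year"].isin([2000, 2001, 2002])]

# Vectorized string test instead of map(lambda x: x.startswith('United')).
united = itu.loc[itu["CntryName"].str.startswith("United"), "CntryName"]

print(both["cntrycode"].tolist())
print(united.tolist())
```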
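Start by reading this post and see how you get on; basically, everything you are asking for can be done, and it is a typical case of boolean indexing on multiple columns of a DataFrame.
Can you try the approach listed below?
Thank you! It worked. Thanks for the great tips while I'm just getting started with this course :)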
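
On the groupby attempt mentioned in the question: passing a list of columns to `groupby` gives one group per combination rather than one broad group per column. This is a sketch under the same assumptions as before (hypothetical column names `schtyp`, `ses`, `prog`, `write`, and made-up rows).

```python
import pandas as pd

# Made-up rows for illustration; column names are assumptions.
df = pd.DataFrame({
    "gender": ["male", "male", "female", "male"],
    "schtyp": ["public", "public", "public", "private"],
    "ses":    ["middle", "middle", "middle", "high"],
    "prog":   ["academic", "academic", "academic", "academic"],
    "write":  [52, 60, 59, 65],
})

# Grouping on all four columns yields a Series indexed by 4-tuples.
group_means = df.groupby(["gender", "schtyp", "ses", "prog"])["write"].mean()

# Look up a single combination by its full key.
mean_write = group_means.loc[("male", "public", "middle", "academic")]
print(mean_write)
```

A boolean mask is simpler when only one combination is needed; the grouped form is handy when the statistics for every combination are wanted at once.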