Conditional filtering of a CSV file in Python

python csv pandas

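Please help! I have tried various approaches and packages to write a program that takes 4 inputs and, based on the combination of those inputs, returns writing-score statistics for the matching group in a csv file. This is my first project, so I would appreciate any insights or tips.

Here is a sample of the csv (200 rows in total):

Here is what I have so far: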

import csv
import numpy

# reads file (Python 3: open with newline='' as the csv module recommends)
with open('scores.csv', newline='') as f:
    csv_file_object = csv.reader(f)
    header = next(csv_file_object)  # skips header
    data = [row for row in csv_file_object]  # loads data into a list for processing
data = numpy.array(data)

# asks for inputs (input() replaces Python 2's raw_input())
gender = input('Enter gender [male/female]: ')
schtyp = input('Enter school type [public/private]: ')
ses = input('Enter socioeconomic status [low/middle/high]: ')
prog = input('Enter program status [general/vocation/academic]: ')

# makes them lower case (input() already returns strings)
prog = prog.lower()
gender = gender.lower()
schtyp = schtyp.lower()
ses = ses.lower()

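What I'm missing is how to filter and get statistics for one specific group only. For example, say I enter male, public, middle, and academic: I want the average writing score for that subset. I tried pandas' groupby function, but that only gave me statistics for broad groups (public vs. private, for example). I also tried a pandas DataFrame, but I could only filter on one input and couldn't work out how to get the writing scores. Any tips would be greatly appreciated.

Take a look at pandas. I think it will shorten your csv parsing work and give you the subsetting functionality you're asking for.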

import pandas as pd
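# sep=r'\s+' reads whitespace-delimited text; delim_whitespace=True is deprecated in recent pandas.
# For a comma-separated file, drop the sep argument.
data = pd.read_csv('fileName.txt', sep=r'\s+')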

#get all of the male students
data[data['gender'] == 'male']
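
To get from a single filter to the four-input case in the question, the same comparison can be repeated per column and the results combined with `&`. The sketch below is only an illustration under assumptions: the column names `gender`, `schtyp`, `ses`, and `prog` are taken from the question's variable names, `write` as the name of the writing-score column is a guess, and the inline sample rows and the helper `subset_scores` are hypothetical. It also shows that `groupby` accepts a list of keys, which gives per-combination statistics rather than only broad groups.

```python
import io
import pandas as pd

# Hypothetical sample rows; the real file has 200 rows and its exact
# column names (especially "write") are assumptions.
CSV_TEXT = """gender,schtyp,ses,prog,write
male,public,middle,academic,52
male,public,middle,academic,60
female,public,middle,academic,65
male,private,high,academic,59
male,public,low,general,41
female,private,middle,vocation,44
"""


def subset_scores(df, gender, schtyp, ses, prog, score_col="write"):
    """Return the score column for rows that match all four values."""
    # Each comparison yields a boolean Series; & combines them row by row.
    # The parentheses are required because & binds tighter than ==.
    mask = (
        (df["gender"] == gender)
        & (df["schtyp"] == schtyp)
        & (df["ses"] == ses)
        & (df["prog"] == prog)
    )
    return df.loc[mask, score_col]


df = pd.read_csv(io.StringIO(CSV_TEXT))

# Option 1: boolean mask, then any statistic on the resulting Series.
scores = subset_scores(df, "male", "public", "middle", "academic")
mean_by_mask = scores.mean()
print(scores.describe())

# Option 2: groupby on all four columns at once, then look up one combination.
group_means = df.groupby(["gender", "schtyp", "ses", "prog"])["write"].mean()
mean_by_group = group_means.loc[("male", "public", "middle", "academic")]
print(mean_by_mask, mean_by_group)
```

With real user input, the four lower-cased strings from the question would be passed to `subset_scores` in place of the literals. A combination that matches no rows gives an empty Series, whose mean is NaN, so it may be worth checking `scores.empty` before reporting statistics.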

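Agreed, Pandas is definitely the way to go. Once you get used to it, it has remarkable filtering/subsetting capabilities. But it can be hard to get your head around at first (or at least it was for me!), so I dug some examples of the subsetting you need out of some of my old code. The variable itu below is a DataFrame containing data on different countries over time.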

# Subsetting by using True/False:
subset = itu['CntryName'] == 'Albania'  # returns True/False values
itu[subset]  # returns 1x144 DataFrame of only data for Albania
itu[itu['CntryName'] == 'Albania']  # one-line command, equivalent to the above two lines

# Pandas has many built-in functions like .isin() to provide params to filter on    
itu[itu.cntrycode.isin(['USA','FRA'])]  # returns where itu['cntrycode'] is 'USA' or 'FRA'
itu[itu.year.isin([2000,2001,2002])]  # Returns all of itu for only years 2000-2002
# Advanced subsetting can include logical operations:
itu[itu.cntrycode.isin(['USA','FRA']) & itu.year.isin([2000,2001,2002])]  # Both of above at same time

# Use .loc with two elements to simultaneously select by row/index & column:
itu.loc['USA','CntryName']
itu.iloc[204,0]
itu.loc[['USA','BHS'], ['CntryName', 'Year']]
itu.iloc[[204, 13], [0, 1]]

# Can do many operations at once, but this reduces "readability" of the code
itu[itu.cntrycode.isin(['USA','FRA']) & 
    itu.year.isin([2000,2001,2002])].loc[:, ['cntrycode','cntryname','year','mpen','fpen']]

# Finally, if you're comfortable with using map() and list comprehensions,
# you can do some advanced subsetting that includes evaluations & functions
# to determine what elements you want to select from the whole, such as all
# countries whose name begins with "United":
criterion = itu['CntryName'].map(lambda x: x.startswith('United'))
itu[criterion]['CntryName']  # gives us UAE, UK, & US
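
Since the real itu data is not shown, the snippets above cannot be run as they stand. The following is a small self-contained sketch with a made-up stand-in DataFrame (the rows are hypothetical and only the column names echo the ones used above). It shows the "United" filter written with the vectorized `.str.startswith` accessor instead of `map()` with a lambda, combined with an `.isin()` condition and a column selection in a single `.loc` call.

```python
import pandas as pd

# Hypothetical stand-in for the itu DataFrame; the values are invented
# purely so the example can run.
itu = pd.DataFrame({
    "CntryName": ["Albania", "United Kingdom", "United States",
                  "France", "United States"],
    "cntrycode": ["ALB", "GBR", "USA", "FRA", "USA"],
    "year": [2000, 2001, 2000, 2002, 2005],
})

# Vectorized string test; gives the same boolean Series as
# itu['CntryName'].map(lambda x: x.startswith('United')).
starts_united = itu["CntryName"].str.startswith("United")

# A second condition on another column.
early_years = itu["year"].isin([2000, 2001, 2002])

# One .loc call takes the combined row mask and the list of columns to keep.
result = itu.loc[starts_united & early_years, ["CntryName", "year"]]
print(result)
```

Selecting rows and columns together through `.loc` avoids the chained form `itu[mask]['col']`, which works for reading but can trigger pandas' chained-assignment warning when used for writing.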

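Start by reading this post and see how you get on. Basically, everything you're asking for can be done; it's a typical case of boolean indexing on multiple columns of a DataFrame.

Could you try the approaches listed in the answers?

Thank you! It worked. Thanks for giving me some important tips just as I'm starting to learn this :)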
