How to use groupby on multiple columns in Python?
I have a df with the following values:
name  numb  exam     marks
tom   2546  math     25
tom   2546  science  25
tom   2546  env      25
mark  2547  math     15
mark  2547  env      10
sam   2548  env      18
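For anyone wanting to reproduce this, the table above can be built as a DataFrame roughly as follows. This is a minimal sketch; treating numb and marks as integers is an assumption, since the post does not state the dtypes.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: numb and marks are integer columns.
df = pd.DataFrame({
    "name": ["tom", "tom", "tom", "mark", "mark", "sam"],
    "numb": [2546, 2546, 2546, 2547, 2547, 2548],
    "exam": ["math", "science", "env", "math", "env", "env"],
    "marks": [25, 25, 25, 15, 10, 18],
})

print(df)
```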
How can I use groupby to form the following values:
name  numb  total_exams_attended  total_maths_exam_attended  total_marks_scored_in_maths  total_marks_scored
tom   2546  3                     1                          25                           75
mark  2547  2                     1                          15                           25
sam   2548  1                     0                                                       18
I tried this:
df=df.groupby(['name']).agg({'total_exams_attended': 'count','total_marks_scored': lambda x: sum(x == True)})
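But I got stuck on the total marks in the maths column. How can I perform a groupby/aggregation only on a specific column value (such as math)?

Consider pivot_table, followed by some column-name manipulation, because the result has hierarchical columns keyed by aggregation name: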
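# fill_value=0 (an addition to the original call) shows 0 instead of NaN for exams a student did not take
pivot_df = df.pivot_table(index='name', columns='exam', values='marks',
                          aggfunc=['count', 'sum'], fill_value=0,
                          margins=True, margins_name='total')

# Flatten the (aggfunc, exam) column MultiIndex into names like math_exams_attended
pivot_df.columns = [i + '_' + j.replace('count', 'exams_attended').replace('sum', 'marks_scored')
                    for i, j in zip(pivot_df.columns.get_level_values(1),
                                    pivot_df.columns.get_level_values(0))]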
Output
pivot_df
#        env_exams_attended  math_exams_attended  science_exams_attended  total_exams_attended  env_marks_scored  math_marks_scored  science_marks_scored  total_marks_scored
# name
# mark   1.0                 1.0                  0.0                     2                     10.0              15.0               0.0                   25
# sam    1.0                 0.0                  0.0                     1                     18.0              0.0                0.0                   18
# tom    1.0                 1.0                  1.0                     3                     25.0              25.0               25.0                  75
# total  3.0                 2.0                  1.0                     6                     53.0              40.0               25.0                  118
If you need to filter down to just the math and total columns, use .loc:
math_pvt_df = pivot_df.loc[df['name'].unique(),
                           ["math_exams_attended", "total_exams_attended",
                            "math_marks_scored", "total_marks_scored"]]
math_pvt_df
#        math_exams_attended  total_exams_attended  math_marks_scored  total_marks_scored
# name
# mark   1.0                  2                     15.0               25
# sam    0.0                  1                     0.0                18
# tom    1.0                  3                     25.0               75
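Since the question asks about groupby specifically, an alternative sketch (not the pivot_table approach above) is named aggregation over two helper columns. The helper names is_math and math_marks are made up for illustration, the sample frame is rebuilt from the question's table, and named aggregation assumes pandas 0.25 or later.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "name": ["tom", "tom", "tom", "mark", "mark", "sam"],
    "numb": [2546, 2546, 2546, 2547, 2547, 2548],
    "exam": ["math", "science", "env", "math", "env", "env"],
    "marks": [25, 25, 25, 15, 10, 18],
})

# Hypothetical helper columns: flag math rows and keep marks only for math rows,
# so that plain sums per group give the math-specific totals.
is_math = df["exam"].eq("math")
tmp = df.assign(
    is_math=is_math,
    math_marks=df["marks"].where(is_math, 0),
)

# Group on both key columns; sort=False keeps the order of first appearance.
result = (
    tmp.groupby(["name", "numb"], sort=False)
       .agg(
           total_exams_attended=("exam", "count"),
           total_maths_exam_attended=("is_math", "sum"),
           total_marks_scored_in_maths=("math_marks", "sum"),
           total_marks_scored=("marks", "sum"),
       )
       .reset_index()
)

print(result)
```

For sam, who has no math row, this yields 0 for total_marks_scored_in_maths, whereas the expected table in the question leaves that cell empty. If a missing value is preferred there, the helper column could keep NaN instead of 0 and be summed with min_count=1.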