How do I use groupby on multiple columns in Python?

Tags: python, pandas, group-by, pivot, aggregate

I have a df with the following values:

name    numb    exam       marks
tom     2546    math       25
tom     2546    science    25
tom     2546    env        25
mark    2547    math       15
mark    2547    env        10
sam     2548    env        18

How can I use groupby to produce the following values?

name   numb   total_exams_attended   total_maths_exam_attended   total_marks_scored_in_maths   total_marks_scored
tom    2546   3                      1                           25                            75
mark   2547   2                      1                           15                            25
sam    2548   1                      0                                                         18

I tried this:

df=df.groupby(['name']).agg({'total_exams_attended': 'count','total_marks_scored': lambda x: sum(x == True)})

But I am stuck on the total marks for the maths column. How can I perform a groupby/aggregation only for a specific column value (such as math)?

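Consider pivot_table, followed by some column-name manipulation, which is needed because the result has hierarchical columns labelled with the aggregate names: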

# fillna(0) shows name/exam pairs that have no rows as 0 instead of NaN
pivot_df = df.pivot_table(index='name', columns='exam', values='marks', aggfunc=['count', 'sum'], 
                          margins=True, margins_name='total').fillna(0)

pivot_df.columns = [i+'_'+j.replace('count', 'exams_attended').replace('sum', 'marks_scored') 
                            for i, j in zip(pivot_df.columns.get_level_values(1),
                                            pivot_df.columns.get_level_values(0))]
Output:

pivot_df
#        env_exams_attended  math_exams_attended  science_exams_attended  total_exams_attended  env_marks_scored  math_marks_scored  science_marks_scored  total_marks_scored
# name
# mark                  1.0                  1.0                     0.0                     2              10.0               15.0                   0.0                  25
# sam                   1.0                  0.0                     0.0                     1              18.0                0.0                   0.0                  18
# tom                   1.0                  1.0                     1.0                     3              25.0               25.0                  25.0                  75
# total                 3.0                  2.0                     1.0                     6              53.0               40.0                  25.0                 118
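
To make the column-name step easier to follow, here is a minimal sketch of what the columns look like before they are flattened. It rebuilds the sample data from the question's table and leaves out the margins for brevity. The names raw, labels and flat are illustrative only and do not appear in the answer above.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    'name':  ['tom', 'tom', 'tom', 'mark', 'mark', 'sam'],
    'numb':  [2546, 2546, 2546, 2547, 2547, 2548],
    'exam':  ['math', 'science', 'env', 'math', 'env', 'env'],
    'marks': [25, 25, 25, 15, 10, 18],
})

# Passing a list of aggregates produces two-level (hierarchical) columns.
raw = df.pivot_table(index='name', columns='exam', values='marks',
                     aggfunc=['count', 'sum'])

# Each column label is an (aggregate, exam) tuple, e.g. ('count', 'env').
labels = list(raw.columns)

# Putting the exam first and the aggregate second gives flat names
# such as 'env_count' and 'math_sum'.
flat = [f'{exam}_{agg}' for agg, exam in raw.columns]
```

The list comprehension in the answer does the same join and additionally renames count and sum to exams_attended and marks_scored.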
If you need to filter down to just the math and total columns, use .loc:

math_pvt_df = pivot_df.loc[df['name'].unique(),
                           ["math_exams_attended", "total_exams_attended", 
                            "math_marks_scored", "total_marks_scored"]]

math_pvt_df
#       math_exams_attended  total_exams_attended  math_marks_scored  total_marks_scored
# name
# mark                  1.0                     2               15.0                  25
# sam                   0.0                     1                0.0                  18
# tom                   1.0                     3               25.0                  75
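
As a possible alternative that stays closer to the layout requested in the question, the sketch below groups on both name and numb and uses named aggregation (available in pandas 0.25 and later). It is not part of the answer above. The helper columns is_math and math_marks are introduced here for illustration, and it assumes that a student with no maths exam should get 0 rather than a blank in the maths columns.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    'name':  ['tom', 'tom', 'tom', 'mark', 'mark', 'sam'],
    'numb':  [2546, 2546, 2546, 2547, 2547, 2548],
    'exam':  ['math', 'science', 'env', 'math', 'env', 'env'],
    'marks': [25, 25, 25, 15, 10, 18],
})

# True only on rows for the maths exam.
is_math = df['exam'].eq('math')

summary = (
    # Helper columns (hypothetical names): a maths flag, and the marks
    # with every non-maths row replaced by 0.
    df.assign(is_math=is_math, math_marks=df['marks'].where(is_math, 0))
      # Group on both key columns; sort=False keeps order of appearance.
      .groupby(['name', 'numb'], sort=False)
      .agg(total_exams_attended=('exam', 'count'),
           total_maths_exam_attended=('is_math', 'sum'),
           total_marks_scored_in_maths=('math_marks', 'sum'),
           total_marks_scored=('marks', 'sum'))
      .reset_index()
)
```

For the sample data, summary has one row per (name, numb) pair, with totals that match the table in the question apart from the 0 filled in for sam's maths marks.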