Python pandas: hierarchical columns and rows - use MultiIndex/set_index, stack/unstack, melt, or something else?


I have the following data in a pandas DataFrame:

import pandas as pd

high_school_code = [110]*6 + [201]*6 + [360]*6
high_school_name = ['Jefferson High']*6 + ['Venice High']*6 + ['Beverly High']*6
subject_name = (['Math']*2 + ['Biology']*2 +['English']*2)*3
assessment_type = ['SAT', 'GPA']*9
mean_score = [560, 2.9, 620, 3.1, 600, 3.0, 680, 3.4, 590, 3.2, 710, 3.5, 640, 3.3, 570, 3.1, 730, 3.7]
standard_error = [50, 0.21, 60, 0.19, 70, 0.23, 40, 0.34, 30, 0.29, 50, 0.46, 70, 0.42, 60, 0.39, 80, 0.51]
N = [883]*6 + [1106]*6 + [978]*6
column_names = ['High_School_Code', 'High_School_Name', 'Subject_Name', 'Assessment_Type', 'Mean_Score', 'Standard_Error', 'N']

data = list(zip(high_school_code, high_school_name, subject_name, assessment_type, mean_score, standard_error, N))

df = pd.DataFrame(data, columns=column_names)

I have tried pandas MultiIndex, set_index, unstack and groupby, but none of them got me there. Any help would be much appreciated. Thanks, everyone!


Both of these layouts are good examples of where .set_index() and .unstack() are useful for reshaping:

In [92]: (
    ...:     df
    ...:     .set_index(['High_School_Code', 'High_School_Name', 'Subject_Name', 'Assessment_Type'])
    ...:     .unstack("Assessment_Type")
    ...:     .swaplevel(axis=1)
    ...:     .sort_index(axis=1, level=0, sort_remaining=False)
    ...: )
Out[92]:
Assessment_Type                                       GPA                             SAT
                                               Mean_Score Standard_Error     N Mean_Score Standard_Error     N
High_School_Code High_School_Name Subject_Name
110              Jefferson High   Biology             3.1           0.19   883      620.0           60.0   883
                                  English             3.0           0.23   883      600.0           70.0   883
                                  Math                2.9           0.21   883      560.0           50.0   883
201              Venice High      Biology             3.2           0.29  1106      590.0           30.0  1106
                                  English             3.5           0.46  1106      710.0           50.0  1106
                                  Math                3.4           0.34  1106      680.0           40.0  1106
360              Beverly High     Biology             3.1           0.39   978      570.0           60.0   978
                                  English             3.7           0.51   978      730.0           80.0   978
                                  Math                3.3           0.42   978      640.0           70.0   978
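
To make the chain above easier to follow, here is a rough sketch that runs each step on its own and then pulls pieces out of the result. The intermediate names (indexed, unstacked, swapped, result, sat, mean_scores) are only illustrative, and the DataFrame is rebuilt from the question's data so the block runs by itself:

```python
import pandas as pd

# Same sample data as in the question, built from a dict for brevity.
df = pd.DataFrame({
    "High_School_Code": [110] * 6 + [201] * 6 + [360] * 6,
    "High_School_Name": ["Jefferson High"] * 6 + ["Venice High"] * 6 + ["Beverly High"] * 6,
    "Subject_Name": (["Math"] * 2 + ["Biology"] * 2 + ["English"] * 2) * 3,
    "Assessment_Type": ["SAT", "GPA"] * 9,
    "Mean_Score": [560, 2.9, 620, 3.1, 600, 3.0, 680, 3.4, 590, 3.2,
                   710, 3.5, 640, 3.3, 570, 3.1, 730, 3.7],
    "Standard_Error": [50, 0.21, 60, 0.19, 70, 0.23, 40, 0.34, 30, 0.29,
                       50, 0.46, 70, 0.42, 60, 0.39, 80, 0.51],
    "N": [883] * 6 + [1106] * 6 + [978] * 6,
})

# 1. Move the identifying columns into the row index.
indexed = df.set_index(
    ["High_School_Code", "High_School_Name", "Subject_Name", "Assessment_Type"]
)

# 2. Pivot Assessment_Type out of the rows; columns become (metric, Assessment_Type).
unstacked = indexed.unstack("Assessment_Type")

# 3. Swap the two column levels so Assessment_Type is on top.
swapped = unstacked.swaplevel(axis=1)

# 4. Group the columns by the top level only.
result = swapped.sort_index(axis=1, level=0, sort_remaining=False)

# Selecting from the hierarchical columns:
sat = result["SAT"]                                      # all metrics for SAT
mean_scores = result.xs("Mean_Score", axis=1, level=1)   # Mean_Score for GPA and SAT
```

Here `result["SAT"]` drops the top column level and leaves the three metric columns, while `.xs(..., axis=1, level=1)` slices across the inner level instead.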
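That covers the first layout. Here is the second (the column order comes out a bit jumbled, but you can assign the result to a variable and reorder it afterwards if that matters):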


In [110]: (
     ...:     df
     ...:     .set_index(['High_School_Code', 'High_School_Name', 'Subject_Name', 'Assessment_Type'])
     ...:     .unstack(["Assessment_Type", "Subject_Name"])
     ...:     .swaplevel(0, 2, axis=1)
     ...:     .sort_index(axis=1, level=[0,1], sort_remaining=False)
     ...: )
Out[110]:
Subject_Name                         Biology                                  ...  Math
Assessment_Type                          GPA                             SAT  ...   GPA        SAT
                                  Mean_Score Standard_Error     N Mean_Score  ...     N Mean_Score Standard_Error     N
High_School_Code High_School_Name                                             ...
110              Jefferson High          3.1           0.19   883      620.0  ...   883      560.0           50.0   883
201              Venice High             3.2           0.29  1106      590.0  ...  1106      680.0           40.0  1106
360              Beverly High            3.1           0.39   978      570.0  ...   978      640.0           70.0   978
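
If the column order does matter, one way to do the reordering mentioned above is to reindex the columns against an explicit MultiIndex. This is only a sketch: the chosen order (Math, Biology, English, with SAT before GPA) is an assumption about what is wanted, and the name wide is just illustrative:

```python
import pandas as pd

# Same sample data as in the question, built from a dict for brevity.
df = pd.DataFrame({
    "High_School_Code": [110] * 6 + [201] * 6 + [360] * 6,
    "High_School_Name": ["Jefferson High"] * 6 + ["Venice High"] * 6 + ["Beverly High"] * 6,
    "Subject_Name": (["Math"] * 2 + ["Biology"] * 2 + ["English"] * 2) * 3,
    "Assessment_Type": ["SAT", "GPA"] * 9,
    "Mean_Score": [560, 2.9, 620, 3.1, 600, 3.0, 680, 3.4, 590, 3.2,
                   710, 3.5, 640, 3.3, 570, 3.1, 730, 3.7],
    "Standard_Error": [50, 0.21, 60, 0.19, 70, 0.23, 40, 0.34, 30, 0.29,
                       50, 0.46, 70, 0.42, 60, 0.39, 80, 0.51],
    "N": [883] * 6 + [1106] * 6 + [978] * 6,
})

# Reshape as in the answer; column levels end up as
# (Subject_Name, Assessment_Type, <metric>).
wide = (
    df
    .set_index(["High_School_Code", "High_School_Name", "Subject_Name", "Assessment_Type"])
    .unstack(["Assessment_Type", "Subject_Name"])
    .swaplevel(0, 2, axis=1)
)

# Spell out the desired order for each level (assumed here, adjust as needed).
subjects = ["Math", "Biology", "English"]
assessments = ["SAT", "GPA"]
metrics = ["Mean_Score", "Standard_Error", "N"]

# Build every (subject, assessment, metric) combination in that order
# and reindex the columns to match it.
order = pd.MultiIndex.from_product(
    [subjects, assessments, metrics], names=wide.columns.names
)
wide = wide.reindex(columns=order)
```

Because every combination in `order` already exists in the reshaped frame, the reindex only rearranges columns and should not introduce any NaN.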