Python pandas: hierarchical columns and rows - use MultiIndex/set_index, stack/unstack, melt, or something else?
I have the following data in a pandas DataFrame:
import pandas as pd

high_school_code = [110]*6 + [201]*6 + [360]*6
high_school_name = ['Jefferson High']*6 + ['Venice High']*6 + ['Beverly High']*6
subject_name = (['Math']*2 + ['Biology']*2 +['English']*2)*3
assessment_type = ['SAT', 'GPA']*9
mean_score = [560, 2.9, 620, 3.1, 600, 3.0, 680, 3.4, 590, 3.2, 710, 3.5, 640, 3.3, 570, 3.1, 730, 3.7]
standard_error = [50, 0.21, 60, 0.19, 70, 0.23, 40, 0.34, 30, 0.29, 50, 0.46, 70, 0.42, 60, 0.39, 80, 0.51]
N = [883]*6 + [1106]*6 + [978]*6
column_names = ['High_School_Code', 'High_School_Name', 'Subject_Name', 'Assessment_Type', 'Mean_Score', 'Standard_Error', 'N']
data = list(zip(high_school_code, high_school_name, subject_name, assessment_type, mean_score, standard_error, N))
df = pd.DataFrame(data, columns=column_names)
I have tried pandas MultiIndex, set_index, unstack, and groupby, but none of them got me the layout I want. Any help would be greatly appreciated. Thanks, everyone!
Both of these are good examples of where .set_index() and .unstack() are useful for reshaping:
In [92]: (
...: df
...: .set_index(['High_School_Code', 'High_School_Name', 'Subject_Name', 'Assessment_Type'])
...: .unstack("Assessment_Type")
...: .swaplevel(axis=1)
...: .sort_index(axis=1, level=0, sort_remaining=False)
...: )
Out[92]:
Assessment_Type                                       GPA                             SAT
                                               Mean_Score Standard_Error     N Mean_Score Standard_Error     N
High_School_Code High_School_Name Subject_Name
110              Jefferson High   Biology             3.1           0.19   883      620.0           60.0   883
                                  English             3.0           0.23   883      600.0           70.0   883
                                  Math                2.9           0.21   883      560.0           50.0   883
201              Venice High      Biology             3.2           0.29  1106      590.0           30.0  1106
                                  English             3.5           0.46  1106      710.0           50.0  1106
                                  Math                3.4           0.34  1106      680.0           40.0  1106
360              Beverly High     Biology             3.1           0.39   978      570.0           60.0   978
                                  English             3.7           0.51   978      730.0           80.0   978
                                  Math                3.3           0.42   978      640.0           70.0   978
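To see what each call in the chain contributes, the sketch below runs the same four steps on a tiny stand-in frame and inspects the column levels after each one. The frame `df_small`, its `School` column, and the names `step1` through `step4` are made up for illustration; only the method chain mirrors the one above.

```python
import pandas as pd

# Hypothetical miniature of the question's data: one key column plus
# Assessment_Type, and two value columns.
df_small = pd.DataFrame({
    "School": ["A", "A", "B", "B"],
    "Assessment_Type": ["SAT", "GPA", "SAT", "GPA"],
    "Mean_Score": [560, 2.9, 680, 3.4],
    "N": [883, 883, 1106, 1106],
})

# 1. Move the identifying columns into the row index.
step1 = df_small.set_index(["School", "Assessment_Type"])

# 2. Pivot one index level into the columns; it becomes the innermost
#    column level, underneath the original column names.
step2 = step1.unstack("Assessment_Type")

# 3. Swap the two column levels so Assessment_Type sits on top.
step3 = step2.swaplevel(axis=1)

# 4. Sort on the top level only, so all GPA columns come before SAT.
step4 = step3.sort_index(axis=1, level=0, sort_remaining=False)

print(list(step2.columns.names))  # [None, 'Assessment_Type']
print(list(step3.columns.names))  # ['Assessment_Type', None]
print(step4)
```

Printing the intermediate frames one at a time is a quick way to check which level ends up where before applying the chain to the full data.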
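That handles the first layout. Here is the second (the column order comes out a little jumbled, but you can assign the result to a variable and reorder it if that matters):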
In [110]: (
...: df
...: .set_index(['High_School_Code', 'High_School_Name', 'Subject_Name', 'Assessment_Type'])
...: .unstack(["Assessment_Type", "Subject_Name"])
...: .swaplevel(0, 2, axis=1)
...: .sort_index(axis=1, level=[0,1], sort_remaining=False)
...: )
Out[110]:
Subject_Name                         Biology                                 ...  Math
Assessment_Type                          GPA                             SAT ...   GPA        SAT
                                  Mean_Score Standard_Error     N Mean_Score ...     N Mean_Score Standard_Error     N
High_School_Code High_School_Name                                            ...
110              Jefferson High          3.1           0.19   883      620.0 ...   883      560.0           50.0   883
201              Venice High             3.2           0.29  1106      590.0 ...  1106      680.0           40.0  1106
360              Beverly High            3.1           0.39   978      570.0 ...   978      640.0           70.0   978
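If the column order does matter, one way to reorder is to build the desired column MultiIndex explicitly and pass it to .reindex(). This is only a sketch: the order chosen here (Math, Biology, English, with SAT before GPA) is an assumption about what might be wanted, and the names `wide` and `new_cols` are illustrative.

```python
import pandas as pd

# Rebuild the DataFrame from the question so the example runs on its own.
high_school_code = [110]*6 + [201]*6 + [360]*6
high_school_name = ['Jefferson High']*6 + ['Venice High']*6 + ['Beverly High']*6
subject_name = (['Math']*2 + ['Biology']*2 + ['English']*2)*3
assessment_type = ['SAT', 'GPA']*9
mean_score = [560, 2.9, 620, 3.1, 600, 3.0, 680, 3.4, 590, 3.2,
              710, 3.5, 640, 3.3, 570, 3.1, 730, 3.7]
standard_error = [50, 0.21, 60, 0.19, 70, 0.23, 40, 0.34, 30, 0.29,
                  50, 0.46, 70, 0.42, 60, 0.39, 80, 0.51]
N = [883]*6 + [1106]*6 + [978]*6
column_names = ['High_School_Code', 'High_School_Name', 'Subject_Name',
                'Assessment_Type', 'Mean_Score', 'Standard_Error', 'N']
df = pd.DataFrame(
    list(zip(high_school_code, high_school_name, subject_name,
             assessment_type, mean_score, standard_error, N)),
    columns=column_names,
)

# Same reshape as above, without the final sort.
wide = (
    df
    .set_index(['High_School_Code', 'High_School_Name',
                'Subject_Name', 'Assessment_Type'])
    .unstack(['Assessment_Type', 'Subject_Name'])
    .swaplevel(0, 2, axis=1)
)

# Assumed target order for each column level (outermost first).
subjects = ['Math', 'Biology', 'English']
assessments = ['SAT', 'GPA']
metrics = ['Mean_Score', 'Standard_Error', 'N']

new_cols = pd.MultiIndex.from_product(
    [subjects, assessments, metrics],
    names=list(wide.columns.names),
)

# reindex keeps the same data and only rearranges the columns.
wide = wide.reindex(columns=new_cols)
print(wide)
```

Because every subject/assessment/metric combination exists in the data, the reindex introduces no missing values; a combination absent from the original columns would show up as a column of NaN.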