Long to wide format for multiple columns in Python
I have a student exam dataset, as shown below:
userid  grade  examid  subject        numberofcorrectanswers  numberofwronganswers
4       5      8       Synonyms       NULL                    NULL
4       5      8       Sentence       NULL                    NULL
4       5      8       Whole Numbers  6                       15
4       5      8       Decimals       4                       10
5       5      9       Synonyms       NULL                    NULL
5       5      9       Sentence       NULL                    NULL
5       5      9       Whole Numbers  5                       12
5       5      9       Decimals       3                       1
I want to convert this long format into a wide format, so that the data looks like this:
userid grade examid Synonyms_numberofcorrectanswers Synonyms_numberofwronganswers Sentence_numberofcorrectanswers Sentence_numberofwronganswers Whole_numbers_numberofcorrectanswers Whole_numbers_numberofwronganswers Decimals_numberofcorrectanswers Decimals_numberofwronganswers
4 5 8 NULL NULL NULL NULL 6 15 4 10
5 5 9 NULL NULL NULL NULL 5 12 3 1
Here is my attempt:
data_subset.set_index(['userid', 'grade','examid','subject']).unstack('subject').reset_index()
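But the result is not a single flat DataFrame; its columns still have several levels of hierarchy. Can someone help me turn it into a flat DataFrame?
Thanks.

Something like this: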
>>> df.groupby(['userid', 'grade','examid','subject']).sum().unstack('subject')
                    numberofcorrectanswers                                 numberofwronganswers
subject                           Decimals Sentence Synonyms Whole Numbers             Decimals Sentence Synonyms Whole Numbers
userid grade examid
4      5     8                           4      NaN      NaN             6                   10      NaN      NaN            15
5      5     9                           3      NaN      NaN             5                    1      NaN      NaN            12
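For anyone who wants to reproduce the output above, here is a minimal sketch. It assumes the NULL entries are loaded as NaN, and the variable name `wide` is only illustrative. Recent pandas versions sum an all-NaN group to 0 by default, so `min_count=1` is passed to keep those cells as NaN, as in the output shown.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NULL is represented as NaN (an assumption).
df = pd.DataFrame({
    "userid": [4, 4, 4, 4, 5, 5, 5, 5],
    "grade": [5] * 8,
    "examid": [8, 8, 8, 8, 9, 9, 9, 9],
    "subject": ["Synonyms", "Sentence", "Whole Numbers", "Decimals"] * 2,
    "numberofcorrectanswers": [np.nan, np.nan, 6, 4, np.nan, np.nan, 5, 3],
    "numberofwronganswers": [np.nan, np.nan, 15, 10, np.nan, np.nan, 12, 1],
})

# min_count=1 makes the sum of an all-NaN group NaN instead of 0.
wide = (
    df.groupby(["userid", "grade", "examid", "subject"])
      .sum(min_count=1)
      .unstack("subject")
)

# The columns are still a two-level MultiIndex: (measure, subject).
print(wide.columns.nlevels)
print(wide.columns.tolist()[:2])
```

The row index holds userid, grade and examid, and each column label is a (measure, subject) tuple, which is the hierarchy the question wants to get rid of.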
I'll expand on Alexander's answer. Say we have
df2 = df.groupby(['userid', 'grade','examid','subject']).sum().unstack('subject')
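We can get the names of the two-level column index as a list of 2-tuples with df2.columns.get_values() (removed in pandas 1.0; iterating over df2.columns directly yields the same tuples). To flatten the index and merge the names, do the following:
new_col_names = ['_'.join((b, a)) for a, b in df2.columns]
df2.columns = new_col_names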
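To make the tuple handling concrete, here is a small self-contained sketch. The frame is hand-built (hypothetical values and only two subjects) so that it has the same kind of two-level columns as the unstacked result; it is an illustration, not the asker's actual data.

```python
import pandas as pd

# Hypothetical two-level columns: (measure, subject).
columns = pd.MultiIndex.from_product(
    [["numberofcorrectanswers", "numberofwronganswers"],
     ["Decimals", "Whole Numbers"]]
)
df2 = pd.DataFrame([[4, 6, 10, 15], [3, 5, 1, 12]], columns=columns)

# Iterating over a MultiIndex yields one tuple per column;
# join each tuple as "<subject>_<measure>".
df2.columns = ["_".join((subject, measure)) for measure, subject in df2.columns]

print(list(df2.columns))
```

After the assignment, the columns are a plain single-level index of strings, so they can be selected with an ordinary label such as df2["Decimals_numberofwronganswers"].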
If needed:
- To sort the columns, e.g.:
df2.reindex(columns=sorted(df2.columns))
- To turn userid etc. into regular columns instead of a MultiIndex:
df2.reset_index()
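Putting the steps together, the whole conversion might look like the sketch below. It assumes NULL is loaded as NaN and that each (userid, grade, examid, subject) combination occurs only once, so the set_index/unstack from the question can be used without aggregating. Replacing spaces with underscores is an extra step added here so the names resemble the desired header; the name `flat` is illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NULL is represented as NaN (an assumption).
df = pd.DataFrame({
    "userid": [4, 4, 4, 4, 5, 5, 5, 5],
    "grade": [5] * 8,
    "examid": [8, 8, 8, 8, 9, 9, 9, 9],
    "subject": ["Synonyms", "Sentence", "Whole Numbers", "Decimals"] * 2,
    "numberofcorrectanswers": [np.nan, np.nan, 6, 4, np.nan, np.nan, 5, 3],
    "numberofwronganswers": [np.nan, np.nan, 15, 10, np.nan, np.nan, 12, 1],
})

# Long to wide: one row per (userid, grade, examid), one column per
# (measure, subject) pair.
df2 = df.set_index(["userid", "grade", "examid", "subject"]).unstack("subject")

# Flatten the two-level columns into "<subject>_<measure>" strings.
df2.columns = [
    "_".join((subject, measure)).replace(" ", "_")
    for measure, subject in df2.columns
]

# Optional steps: sort the columns, then move the index levels into columns.
flat = df2.reindex(columns=sorted(df2.columns)).reset_index()

print(flat.columns.tolist())
```

Note that reindex and reset_index both return new objects, so the result has to be assigned (here to `flat`) rather than expecting df2 to change in place.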