How do I turn one column of a DataFrame into column headers and convert the rest to long format in Python?
Tags: python, pandas, dataframe, merge, reshape
I have three similarly formatted DataFrames. Each one comes from a pandas groupby on data from a different source.
import pandas as pd

df_17 = pd.DataFrame(
    [['Students', 550, 75, 325, 100, 2017], ['Staff', 10, 3, 7, 6, 2017], ['Teachers', 21, 8, 16, 13, 2017]],
    columns=['Category', 'Main', 'Pre-K', 'North', 'Downtown', 'Year']).set_index('Category')
df_18 = pd.DataFrame(
    [['Students', 565, 70, 321, 2018], ['Staff', 11, 3, 6, 2018], ['Teachers', 22, 8, 17, 2018]],
    columns=['Category', 'Main', 'Pre-K', 'North', 'Year']).set_index('Category')
df_19 = pd.DataFrame(
    [['Students', 610, 75, 12, 110, 2019], ['Staff', 10, 4, 0, 6, 2019], ['Teachers', 24, 9, 1, 16, 2019]],
    columns=['Category', 'Main', 'Pre-K', 'Park', 'Downtown', 'Year']).set_index('Category')
df_17
          Main  Pre-K  North  Downtown  Year
Category
Students   550     75    325       100  2017
Staff       10      3      7         6  2017
Teachers    21      8     16        13  2017
df_18
          Main  Pre-K  North  Year
Category
Students   565     70    321  2018
Staff       11      3      6  2018
Teachers    22      8     17  2018
df_19
          Main  Pre-K  Park  Downtown  Year
Category
Students   610     75    12       110  2019
Staff       10      4     0         6  2019
Teachers    24      9     1        16  2019
I want to combine them into a single long-format DataFrame with a separate column for each year, like this:
    Category    Campus  2017  2018  2019
0   Students      Main   550   565   610
1   Students     Pre-K    75    70    75
2   Students     North   325   321   NaN
3   Students  Downtown   100   NaN   110
4   Students      Park   NaN   NaN    12
5      Staff      Main    10    11    10
6      Staff     Pre-K     3     3     4
7      Staff     North     7     6   NaN
8      Staff  Downtown     6   NaN     6
9      Staff      Park   NaN   NaN     0
10  Teachers      Main    21    22    24
11  Teachers     Pre-K     8     8     9
12  Teachers     North    16    17   NaN
13  Teachers  Downtown    13   NaN    16
14  Teachers      Park   NaN   NaN     1
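I have tried many combinations of merge, melt, stack, unstack, pivot, and so on, but I have not been able to find the right one.
The closest I have come so far is: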
df = pd.merge(df_17, df_18, on = ['Category', 'Main', 'Pre-K', 'North', 'Year'], how = 'outer')
df = pd.merge(df, df_19, on = ['Category', 'Main', 'Pre-K', 'Downtown'], how = 'outer')
df = df.stack()
Category
Students  Main         550.0
          Pre-K         75.0
          North        325.0
          Downtown     100.0
          Year_x      2017.0
Staff     Main          10.0
          Pre-K          3.0
          North          7.0
          Downtown       6.0
          Year_x      2017.0
Teachers  Main          21.0
          Pre-K          8.0
          North         16.0
          Downtown      13.0
          Year_x      2017.0
Students  Main         565.0
          Pre-K         70.0
          North        321.0
          Year_x      2018.0
Staff     Main          11.0
          Pre-K          3.0
          North          6.0
          Year_x      2018.0
Teachers  Main          22.0
          Pre-K          8.0
          North         17.0
          Year_x      2018.0
Students  Main         610.0
          Pre-K         75.0
          Downtown     110.0
          Park          12.0
          Year_y      2019.0
Staff     Main          10.0
          Pre-K          4.0
          Downtown       6.0
          Park           0.0
          Year_y      2019.0
Teachers  Main          24.0
          Pre-K          9.0
          Downtown      16.0
          Park           1.0
          Year_y      2019.0
dtype: float64
What am I missing?
You can pd.concat the DataFrames, melt them, and then use .pivot_table:
df = pd.concat([df_17, df_18, df_19]).reset_index()
df = pd.melt(df, id_vars=['Category', 'Year'], var_name='Campus') \
    .pivot_table(index=['Category', 'Campus'], columns='Year', values='value') \
    .reset_index()
df.columns.name = None  # This just removes the leftover 'Year' label from the columns axis
df
Output:
    Category    Campus   2017   2018   2019
0      Staff  Downtown    6.0    NaN    6.0
1      Staff      Main   10.0   11.0   10.0
2      Staff     North    7.0    6.0    NaN
3      Staff      Park    NaN    NaN    0.0
4      Staff     Pre-K    3.0    3.0    4.0
5   Students  Downtown  100.0    NaN  110.0
6   Students      Main  550.0  565.0  610.0
7   Students     North  325.0  321.0    NaN
8   Students      Park    NaN    NaN   12.0
9   Students     Pre-K   75.0   70.0   75.0
10  Teachers  Downtown   13.0    NaN   16.0
11  Teachers      Main   21.0   22.0   24.0
12  Teachers     North   16.0   17.0    NaN
13  Teachers      Park    NaN    NaN    1.0
14  Teachers     Pre-K    8.0    8.0    9.0
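Note that the pivot_table result is sorted alphabetically by Category and Campus, and the values come back as floats because of the missing entries. If the row order and integer values from the question matter, one possible variation is sketched below. It is only a sketch: it assumes pandas 1.1 or later (for a list-valued index in DataFrame.pivot) and that each (Category, Campus, Year) combination occurs at most once. The names long, order, and wide are illustrative and not part of the original answer.

```python
import pandas as pd

df_17 = pd.DataFrame(
    [['Students', 550, 75, 325, 100, 2017], ['Staff', 10, 3, 7, 6, 2017], ['Teachers', 21, 8, 16, 13, 2017]],
    columns=['Category', 'Main', 'Pre-K', 'North', 'Downtown', 'Year']).set_index('Category')
df_18 = pd.DataFrame(
    [['Students', 565, 70, 321, 2018], ['Staff', 11, 3, 6, 2018], ['Teachers', 22, 8, 17, 2018]],
    columns=['Category', 'Main', 'Pre-K', 'North', 'Year']).set_index('Category')
df_19 = pd.DataFrame(
    [['Students', 610, 75, 12, 110, 2019], ['Staff', 10, 4, 0, 6, 2019], ['Teachers', 24, 9, 1, 16, 2019]],
    columns=['Category', 'Main', 'Pre-K', 'Park', 'Downtown', 'Year']).set_index('Category')

# Long format: one row per (Category, Year, Campus) with a 'value' column.
long = (
    pd.concat([df_17, df_18, df_19])
    .reset_index()
    .melt(id_vars=['Category', 'Year'], var_name='Campus')
)

# pivot (no aggregation) puts each Year in its own column; it sorts the rows,
# so the order of first appearance is restored afterwards with reindex.
wide = long.pivot(index=['Category', 'Campus'], columns='Year', values='value')
order = pd.MultiIndex.from_product(
    [long['Category'].unique(), long['Campus'].unique()],
    names=['Category', 'Campus'],
)
wide = wide.reindex(order).reset_index()
wide.columns.name = None  # remove the leftover 'Year' label

# Nullable integers keep whole numbers alongside missing values.
year_cols = [c for c in wide.columns if c not in ('Category', 'Campus')]
wide = wide.astype({c: 'Int64' for c in year_cols})
print(wide)
```

Unlike pivot_table, pivot raises an error on duplicate (Category, Campus, Year) entries instead of silently averaging them, which may or may not be what you want for larger data.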
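Thank you very much. This mostly works; I just couldn't get there on my own. In my actual (slightly larger) application, I ended up with extra "index" rows. I filtered them out without any trouble, but I'm curious why that happens.
@kepr It probably has to do with pd.melt() and what your index is. I can't say for sure without seeing the actual data, but I'm glad that filtering those rows out solved the problem.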
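One plausible way such "index" rows can appear, offered only as a guess since the actual data was never shown: if a frame has an unnamed default index, reset_index() adds a column literally called index, and melt then treats it as one more value column. The small frame below is hypothetical and only illustrates that mechanism.

```python
import pandas as pd

# Hypothetical frame: 'Category' is an ordinary column, so the index is an
# unnamed RangeIndex rather than 'Category'.
df = pd.DataFrame({
    'Category': ['Students', 'Staff'],
    'Main': [550, 10],
    'Year': [2017, 2017],
})

# reset_index() inserts a column named 'index'; melt turns it into rows.
melted = df.reset_index().melt(id_vars=['Category', 'Year'], var_name='Campus')
campuses_with_index = sorted(melted['Campus'].unique())

# drop=True discards the old index instead of turning it into a column.
clean = df.reset_index(drop=True).melt(id_vars=['Category', 'Year'], var_name='Campus')
campuses_clean = sorted(clean['Campus'].unique())

print(campuses_with_index)
print(campuses_clean)
```

If this is the cause, passing drop=True to reset_index(), or setting the index to Category beforehand, should avoid the extra rows without a separate filtering step.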