Pandas: collapse duplicate rows of a DataFrame (pandas, Python 3.8)

I have a DataFrame df as follows:

Col1    Col2    Col3    StartDate   EndDate     Qty
24HR    A1      B1      1/1/2020    1/31/2020   4.2
24HR    A1      B1      2/1/2020    2/29/2020   11
asd     A2      B2      2/1/2020    2/29/2020   35
asd     A2      B2      3/1/2020    3/31/2020   23
asd     A2      B2      4/1/2020    4/30/2020   35
asd     A2      B2      5/1/2020    5/31/2020   46
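
To reproduce the problem, the sample frame can be rebuilt as below. This is a sketch that assumes the dates are month/day/year strings; it parses them with pd.to_datetime so that the .dt.strftime call in the attempt works on real dates.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "Col1": ["24HR", "24HR", "asd", "asd", "asd", "asd"],
    "Col2": ["A1", "A1", "A2", "A2", "A2", "A2"],
    "Col3": ["B1", "B1", "B2", "B2", "B2", "B2"],
    "StartDate": ["1/1/2020", "2/1/2020", "2/1/2020", "3/1/2020", "4/1/2020", "5/1/2020"],
    "EndDate": ["1/31/2020", "2/29/2020", "2/29/2020", "3/31/2020", "4/30/2020", "5/31/2020"],
    "Qty": [4.2, 11, 35, 23, 35, 46],
})

# Assumption: the dates are month/day/year; parse them so the .dt accessor works
for col in ["StartDate", "EndDate"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

print(df.dtypes)
```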

I need to collapse the rows that are duplicated in Col1, Col2, Col3 to get the following:

Col1    Col2    Col3    StartDate   EndDate     Jan    Feb    Mar    Apr    May
24HR    A1      B1      1/1/2020    2/29/2020   4.2    11
asd     A2      B2      2/1/2020    5/31/2020          35     23     35     46

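The StartDate and EndDate above are the minimum StartDate and the maximum EndDate over all rows in each group. For example, for the rows with values 24HR, A1, B1, the minimum StartDate is 1/1/2020 and the maximum EndDate is 2/29/2020.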
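
The per-group minimum StartDate and maximum EndDate described above map directly onto groupby with named aggregation. A minimal sketch, assuming the date columns have already been parsed to datetimes; the variable names keys and ranges are illustrative only:

```python
import pandas as pd

fmt = "%m/%d/%Y"
df = pd.DataFrame({
    "Col1": ["24HR", "24HR", "asd", "asd", "asd", "asd"],
    "Col2": ["A1", "A1", "A2", "A2", "A2", "A2"],
    "Col3": ["B1", "B1", "B2", "B2", "B2", "B2"],
    "StartDate": pd.to_datetime(
        ["1/1/2020", "2/1/2020", "2/1/2020", "3/1/2020", "4/1/2020", "5/1/2020"], format=fmt),
    "EndDate": pd.to_datetime(
        ["1/31/2020", "2/29/2020", "2/29/2020", "3/31/2020", "4/30/2020", "5/31/2020"], format=fmt),
    "Qty": [4.2, 11, 35, 23, 35, 46],
})

keys = ["Col1", "Col2", "Col3"]  # illustrative name for the grouping columns
# Earliest start and latest end for each unique (Col1, Col2, Col3) combination
ranges = df.groupby(keys).agg(StartDate=("StartDate", "min"), EndDate=("EndDate", "max"))
print(ranges)
```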

I tried the following:

df['MnthName'] = df['StartDate'].dt.strftime('%b')
df = df.pivot_table(index=['Col1', 'Col2', 'Col3'], values='Qty', columns='MnthName')
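
One side effect of this attempt is that pivot_table sorts the new MnthName columns alphabetically (Apr, Feb, Jan, Mar, May). A sketch of one way to put them back in calendar order, assuming the month abbreviations produced by %b match those in the standard-library calendar.month_abbr (both follow the current locale); the month_order mapping is a hypothetical helper:

```python
import calendar
import pandas as pd

df = pd.DataFrame({
    "Col1": ["24HR", "24HR", "asd", "asd", "asd", "asd"],
    "Col2": ["A1", "A1", "A2", "A2", "A2", "A2"],
    "Col3": ["B1", "B1", "B2", "B2", "B2", "B2"],
    "StartDate": pd.to_datetime(
        ["1/1/2020", "2/1/2020", "2/1/2020", "3/1/2020", "4/1/2020", "5/1/2020"], format="%m/%d/%Y"),
    "Qty": [4.2, 11, 35, 23, 35, 46],
})

# The attempt from the question: one column per month name
df["MnthName"] = df["StartDate"].dt.strftime("%b")
wide = df.pivot_table(index=["Col1", "Col2", "Col3"], values="Qty", columns="MnthName")

# Hypothetical helper: "Jan" -> 1, ..., "Dec" -> 12 (month_abbr[0] is the empty string)
month_order = {abbr: i for i, abbr in enumerate(calendar.month_abbr) if abbr}
# Reorder the columns by calendar month instead of alphabetically
wide = wide[sorted(wide.columns, key=month_order.get)]
print(wide)
```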

But I don't know how to group it so that, for each unique combination of Col1, Col2, Col3, I select the minimum StartDate and the maximum EndDate.
We can do the pivot and the agg separately, then concat to combine them:
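import pandas as pd

# Monthly quantities: one column per StartDate for each (Col1, Col2, Col3) group
s1 = df.pivot_table(index=['Col1', 'Col2', 'Col3'], columns='StartDate', values='Qty')
# Date range per group; 'first'/'last' assume the rows are already in date order
s2 = df.groupby(['Col1', 'Col2', 'Col3']).agg({'StartDate': 'first', 'EndDate': 'last'})
# Rename the date columns to month abbreviations (Jan, Feb, ...)
s1.columns = pd.to_datetime(s1.columns, dayfirst=False).strftime('%b')
# Put the date range and the monthly columns side by side
s = pd.concat([s2, s1], axis=1).reset_index()
s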
   Col1 Col2 Col3 StartDate    EndDate  Jan   Feb   Mar   Apr   May
0  24HR   A1   B1  1/1/2020  2/29/2020  4.2  11.0   NaN   NaN   NaN
1   asd   A2   B2  2/1/2020  5/31/2020  NaN  35.0  23.0  35.0  46.0
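
A caveat on this answer: it gives the right result on the sample because the rows are already in date order and every month has a single digit. When dates are stored as text, comparisons are lexicographic, so "10/1/2020" sorts before "2/1/2020", which affects both the pivoted column order and any min/max. The sketch below uses two hypothetical rows to contrast min/max on text with min/max on parsed dates (unlike 'first'/'last', min/max do not depend on row order); the names as_text and as_dates are illustrative:

```python
import pandas as pd

# Hypothetical rows spanning October, to expose text-vs-date ordering
df = pd.DataFrame({
    "Col1": ["asd", "asd"],
    "StartDate": ["2/1/2020", "10/1/2020"],
    "EndDate": ["2/29/2020", "10/31/2020"],
})

# As text, "10/1/2020" < "2/1/2020" because the character '1' sorts before '2'
as_text = df.groupby("Col1").agg(StartDate=("StartDate", "min"), EndDate=("EndDate", "max"))

# Parsed as month/day/year dates, min/max follow the calendar
parsed = df.assign(
    StartDate=pd.to_datetime(df["StartDate"], format="%m/%d/%Y"),
    EndDate=pd.to_datetime(df["EndDate"], format="%m/%d/%Y"),
)
as_dates = parsed.groupby("Col1").agg(StartDate=("StartDate", "min"), EndDate=("EndDate", "max"))

print(as_text)
print(as_dates)
```

With text, the group appears to start in October and end in February; with parsed dates it correctly runs from February 1 to October 31.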