Pandas: collapsing duplicate rows of a DataFrame
Tags: pandas, python-3.8

I have a DataFrame df, as shown below:
Col1 Col2 Col3 StartDate EndDate Qty
24HR A1 B1 1/1/2020 1/31/2020 4.2
24HR A1 B1 2/1/2020 2/29/2020 11
asd A2 B2 2/1/2020 2/29/2020 35
asd A2 B2 3/1/2020 3/31/2020 23
asd A2 B2 4/1/2020 4/30/2020 35
asd A2 B2 5/1/2020 5/31/2020 46
I need to collapse the rows that share the same values in Col1, Col2, Col3 to get the following:
Col1  Col2  Col3  StartDate  EndDate    Jan  Feb  Mar  Apr  May
24HR  A1    B1    1/1/2020   2/29/2020  4.2  11
asd   A2    B2    2/1/2020   5/31/2020       35   23   35   46
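The StartDate and EndDate above are the minimum and maximum over all rows in each group, i.e. for the rows with values 24HR, A1, B1 the minimum StartDate is 1/1/2020 and the maximum EndDate is 2/29/2020.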
I tried the following:
df['MnthName'] = df['StartDate'].dt.strftime('%b')
df = df.pivot_table(index=['Col1', 'Col2', 'Col3'], values='Qty', columns='MnthName')
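But I don't know how to group the rows so that the minimum StartDate and the maximum EndDate are selected for each unique combination of Col1, Col2, Col3.

We can use pivot_table and agg separately, then concat to combine the two results: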
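import pandas as pd

# One column per StartDate, holding Qty for each (Col1, Col2, Col3) group
s1 = df.pivot_table(index=['Col1', 'Col2', 'Col3'], columns='StartDate', values='Qty')
# 'first'/'last' match the min/max only if rows are in chronological order within each group
s2 = df.groupby(['Col1', 'Col2', 'Col3']).agg({'StartDate': 'first', 'EndDate': 'last'})
# Rename the date columns to month abbreviations (Jan, Feb, ...)
s1.columns = pd.to_datetime(s1.columns, dayfirst=False).strftime('%b')
s = pd.concat([s2, s1], axis=1).reset_index()
s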
Col1 Col2 Col3 StartDate EndDate Jan Feb Mar Apr May
0 24HR A1 B1 1/1/2020 2/29/2020 4.2 11.0 NaN NaN NaN
1 asd A2 B2 2/1/2020 5/31/2020 NaN 35.0 23.0 35.0 46.0
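
The approach above relies on the rows already being in date order, because it takes the first StartDate and the last EndDate of each group. A variation that does not depend on row order is sketched below. It makes some assumptions of its own: the dates are parsed as month/day/year, all rows fall within a single calendar year (months from different years would otherwise share a column), and quantities that land in the same group and month should be summed. The names keys, span, months and result are illustrative and not part of the original post.

```python
import calendar

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Col1': ['24HR', '24HR', 'asd', 'asd', 'asd', 'asd'],
    'Col2': ['A1', 'A1', 'A2', 'A2', 'A2', 'A2'],
    'Col3': ['B1', 'B1', 'B2', 'B2', 'B2', 'B2'],
    'StartDate': ['1/1/2020', '2/1/2020', '2/1/2020', '3/1/2020', '4/1/2020', '5/1/2020'],
    'EndDate': ['1/31/2020', '2/29/2020', '2/29/2020', '3/31/2020', '4/30/2020', '5/31/2020'],
    'Qty': [4.2, 11, 35, 23, 35, 46],
})

# Parse to real dates so min/max compare chronologically rather than as strings
df['StartDate'] = pd.to_datetime(df['StartDate'], format='%m/%d/%Y')
df['EndDate'] = pd.to_datetime(df['EndDate'], format='%m/%d/%Y')

keys = ['Col1', 'Col2', 'Col3']

# Earliest start and latest end per group, regardless of row order
span = df.groupby(keys).agg(StartDate=('StartDate', 'min'),
                            EndDate=('EndDate', 'max'))

# Pivot on the month number so the columns come out in calendar order
# (assumption: summing is the right way to combine same-month rows)
months = df.assign(Month=df['StartDate'].dt.month).pivot_table(
    index=keys, columns='Month', values='Qty', aggfunc='sum')

# Relabel month numbers 1..12 as Jan..Dec
months.columns = [calendar.month_abbr[int(m)] for m in months.columns]

# Both frames share the same group index, so they line up row by row
result = pd.concat([span, months], axis=1).reset_index()
print(result)
```

Pivoting on the month number instead of the month name keeps the columns in calendar order; pivoting directly on '%b' strings would sort them alphabetically (Apr, Feb, Jan, ...). Months with no data for a group are left as NaN.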