Python Pandas groupby.diff: fill missing rows with zero
I'm sure this has been posted somewhere, or is so simple that I'm just not seeing it, but I haven't had any luck finding a post. Any help is much appreciated.
As you can see, I'm trying to do a groupby.diff. If a date is missing, I need to show a negative value.
df['delta'] = df.groupby(['ID', 'ticker', 'date'])['shares'].diff()
ID ticker date shares delta
A AAA 3/31/2012 904180 675010
A AAA 12/31/2011 229170 NaN
A BBB 3/31/2012 517756 390117
A BBB 12/31/2011 127639 NaN
A CCC 12/31/2011 1757 NaN
A DDD 12/31/2011 500 NaN
B AAA 3/31/2012 920920 554920
B AAA 12/31/2011 366000 NaN
B BBB 3/31/2012 524 393
B BBB 12/31/2011 131 NaN
I think I need to pad the data to get this:
ID ticker date shares delta
A AAA 3/31/2012 904180 675010
A AAA 12/31/2011 229170 NaN
A BBB 3/31/2012 517756 390117
A BBB 12/31/2011 127639 NaN
A CCC 3/31/2012 0 -1757
A CCC 12/31/2011 1757 NaN
A DDD 3/31/2012 0 -500
A DDD 12/31/2011 500 NaN
B AAA 3/31/2012 920920 554920
B AAA 12/31/2011 366000 NaN
B BBB 3/31/2012 524 393
B BBB 12/31/2011 131 NaN
Thanks again.

Use unstack + stack:
# Pivot the dates into columns and back so that every (ID, ticker) pair has a row for every date; missing shares become 0
New_df = df.set_index(['ID', 'ticker', 'date']).unstack('date').stack(dropna=False).reset_index().fillna(0)
# Do not group by date: each (ID, ticker, date) group then holds a single row, so diff() returns all NaN
New_df['delta'] = New_df.groupby(['ID', 'ticker'])['shares'].diff()
New_df
Out[316]:
ID ticker date shares delta
0 A AAA 12/31/2011 229170.0 NaN
1 A AAA 3/31/2012 904180.0 675010.0
2 A BBB 12/31/2011 127639.0 NaN
3 A BBB 3/31/2012 517756.0 390117.0
4 A CCC 12/31/2011 1757.0 NaN
5 A CCC 3/31/2012 0.0 -1757.0
6 A DDD 12/31/2011 500.0 NaN
7 A DDD 3/31/2012 0.0 -500.0
8 B AAA 12/31/2011 366000.0 NaN
9 B AAA 3/31/2012 920920.0 554920.0
10 B BBB 12/31/2011 131.0 NaN
11 B BBB 3/31/2012 524.0 393.0
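
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rests on a few assumptions that are not in the answer above: the sample frame is rebuilt by hand from the question's table, the dates are parsed with pd.to_datetime so they sort chronologically, and unstack(fill_value=0) followed by a plain stack() is swapped in for stack(dropna=False), since as far as I know recent pandas releases are moving away from the dropna argument. The names wide, filled and by_date are just illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand from the question's table
rows = [
    ("A", "AAA", "3/31/2012", 904180),
    ("A", "AAA", "12/31/2011", 229170),
    ("A", "BBB", "3/31/2012", 517756),
    ("A", "BBB", "12/31/2011", 127639),
    ("A", "CCC", "12/31/2011", 1757),
    ("A", "DDD", "12/31/2011", 500),
    ("B", "AAA", "3/31/2012", 920920),
    ("B", "AAA", "12/31/2011", 366000),
    ("B", "BBB", "3/31/2012", 524),
    ("B", "BBB", "12/31/2011", 131),
]
df = pd.DataFrame(rows, columns=["ID", "ticker", "date", "shares"])

# Assumption: parse the dates so that ordering is chronological, not textual
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# One row per (ID, ticker), one column per date; absent dates are filled with 0
wide = df.set_index(["ID", "ticker", "date"])["shares"].unstack("date", fill_value=0)

# Back to long form: every existing (ID, ticker) pair now has a row for every date
filled = wide.stack().rename("shares").reset_index()

# diff() depends on row order, so sort explicitly before taking it
filled = filled.sort_values(["ID", "ticker", "date"]).reset_index(drop=True)
filled["delta"] = filled.groupby(["ID", "ticker"])["shares"].diff()

# For comparison: grouping by date as well leaves one row per group, so every diff is NaN
by_date = filled.groupby(["ID", "ticker", "date"])["shares"].diff()

print(filled)
```

Only (ID, ticker) pairs that already exist in the data are padded, so no rows are invented for B/CCC or B/DDD.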
After sorting:
New_df.sort_values(['ID','ticker','date'],ascending=[True,True,False])
Out[318]:
ID ticker date shares delta
1 A AAA 3/31/2012 904180.0 675010.0
0 A AAA 12/31/2011 229170.0 NaN
3 A BBB 3/31/2012 517756.0 390117.0
2 A BBB 12/31/2011 127639.0 NaN
5 A CCC 3/31/2012 0.0 -1757.0
4 A CCC 12/31/2011 1757.0 NaN
7 A DDD 3/31/2012 0.0 -500.0
6 A DDD 12/31/2011 500.0 NaN
9 B AAA 3/31/2012 920920.0 554920.0
8 B AAA 12/31/2011 366000.0 NaN
11 B BBB 3/31/2012 524.0 393.0
10 B BBB 12/31/2011 131.0 NaN
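
One caveat about ordering: the date column here holds strings, so both unstack and sort_values order it lexicographically. That happens to be correct for the two dates in the question, but it may not hold in general. A small sketch with hypothetical dates (9/30/2011 is not in the question's data) shows the difference once the column is parsed with pd.to_datetime:

```python
import pandas as pd

# Hypothetical dates for illustration only
dates = pd.Series(["12/31/2011", "3/31/2012", "9/30/2011"])

# Sorting as text compares character by character, so "9/30/2011" lands last
as_text = dates.sort_values().tolist()

# Sorting parsed dates is chronological
parsed = pd.to_datetime(dates, format="%m/%d/%Y")
as_dates = parsed.sort_values().dt.strftime("%Y-%m-%d").tolist()

print(as_text)   # ['12/31/2011', '3/31/2012', '9/30/2011']
print(as_dates)  # ['2011-09-30', '2011-12-31', '2012-03-31']
```

Since diff() subtracts the previous row within each group, converting the column before reshaping is probably the safer choice when more than two dates are involved.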
You nailed it, thanks @TKYYW
~ :-) Happy coding. By the way, if it helped, could you consider accepting the answer?