Python 取最近N天的平均值,包括NaN

Python 取最近N天的平均值,包括NaN,python,pandas,numpy,Python,Pandas,Numpy,我有这个数据框: ID Date X 123_Var 456_Var 789_Var A 16-07-19 3 777.0 250.0 810.0 A 17-07-19 9 637.0 121.0 529.0 A 20-07-19 2 295.0 272.0 490.0 A 21-07-19 3 778.0 600.0 544.0 A 22-07-19 6 741.0

我有这个数据框:

ID      Date  X  123_Var  456_Var  789_Var
 A  16-07-19  3    777.0    250.0    810.0
 A  17-07-19  9    637.0    121.0    529.0
 A  20-07-19  2    295.0    272.0    490.0
 A  21-07-19  3    778.0    600.0    544.0
 A  22-07-19  6    741.0    792.0    907.0
 A  25-07-19  6    435.0    416.0    820.0
 A  26-07-19  8    590.0    455.0    342.0
 A  27-07-19  6    763.0    476.0    753.0
 A  02-08-19  6    717.0    211.0    454.0
 A  03-08-19  6    152.0    442.0    475.0
 A  05-08-19  6    564.0    340.0    302.0
 A  07-08-19  6    105.0    929.0    633.0
 A  08-08-19  6    948.0    366.0    586.0
 B  07-08-19  4    509.0    690.0    406.0
 B  08-08-19  2    413.0    725.0    414.0
 B  12-08-19  2    170.0    702.0    912.0
 B  13-08-19  3    851.0    616.0    477.0
 B  14-08-19  9    475.0    447.0    555.0
 B  15-08-19  1    412.0    403.0    708.0
 B  17-08-19  2    299.0    537.0    321.0
 B  18-08-19  4    310.0    119.0    125.0
 C  16-07-19  3    777.0    250.0    810.0
 C  17-07-19  9    637.0    121.0    529.0
 C  20-07-19  2      NaN      NaN      NaN
 C  21-07-19  3      NaN      NaN      NaN
 C  22-07-19  6    741.0    792.0    907.0
 C  25-07-19  6      NaN      NaN      NaN
 C  26-07-19  8    590.0    455.0    342.0
 C  27-07-19  6    763.0    476.0    753.0
 C  02-08-19  6    717.0    211.0    454.0
 C  03-08-19  6      NaN      NaN      NaN
 C  05-08-19  6    564.0    340.0    302.0
 C  07-08-19  6      NaN      NaN      NaN
 C  08-08-19  6    948.0    366.0    586.0
我想显示
n
最后几天的
mean
值(使用
Date
列),不包括当天的值

我正在使用此代码(我应该如何修复此代码?)

预期结果:

ID      Date  X  123_Var  456_Var  789_Var   123_Var_4   456_Var_4  789_Var_4
 A  16-07-19  3    777.0    250.0    810.0         NaN         NaN        NaN
 A  17-07-19  9    637.0    121.0    529.0  777.000000  250.000000      810.0
 A  20-07-19  2    295.0    272.0    490.0  707.000000  185.500000      669.5
 A  21-07-19  3    778.0    600.0    544.0  466.000000  196.500000      509.5
 A  22-07-19  6    741.0    792.0    907.0  536.500000  436.000000      517.0
 A  25-07-19  6    435.0    416.0    820.0  759.500000  696.000000      725.5
 A  26-07-19  8    590.0    455.0    342.0  588.000000  604.000000      863.5
 A  27-07-19  6    763.0    476.0    753.0  512.500000  435.500000      581.0
 A  02-08-19  6    717.0    211.0    454.0         NaN         NaN        NaN
 A  03-08-19  6    152.0    442.0    475.0  717.000000  211.000000      454.0
 A  05-08-19  6    564.0    340.0    302.0  434.500000  326.500000      464.5
 A  07-08-19  6    105.0    929.0    633.0  358.000000  391.000000      388.5
 A  08-08-19  6    948.0    366.0    586.0  334.500000  634.500000      467.5
 B  07-08-19  4    509.0    690.0    406.0         NaN         NaN        NaN
 B  08-08-19  2    413.0    725.0    414.0  509.000000  690.000000      406.0
 B  12-08-19  2    170.0    702.0    912.0  413.000000  725.000000      414.0
 B  13-08-19  3    851.0    616.0    477.0  291.500000  713.500000      663.0
 B  14-08-19  9    475.0    447.0    555.0  510.500000  659.000000      694.5
 B  15-08-19  1    412.0    403.0    708.0  498.666667  588.333333      648.0
 B  17-08-19  2    299.0    537.0    321.0  579.333333  488.666667      580.0
 B  18-08-19  4    310.0    119.0    125.0  395.333333  462.333333      528.0
 C  16-07-19  3    777.0    250.0    810.0         NaN         NaN        NaN
 C  17-07-19  9    637.0    121.0    529.0  777.000000  250.000000      810.0
 C  20-07-19  2      NaN      NaN      NaN  707.000000  185.500000      669.5
 C  21-07-19  3      NaN      NaN      NaN  637.000000  121.000000      529.0
 C  22-07-19  6    741.0    792.0    907.0         NaN         NaN        NaN
 C  25-07-19  6      NaN      NaN      NaN  741.000000  792.000000      907.0
 C  26-07-19  8    590.0    455.0    342.0  741.000000  792.000000      907.0
 C  27-07-19  6    763.0    476.0    753.0  590.000000  455.000000      342.0
 C  02-08-19  6    717.0    211.0    454.0         NaN         NaN        NaN
 C  03-08-19  6      NaN      NaN      NaN  717.000000  211.000000      454.0
 C  05-08-19  6    564.0    340.0    302.0  717.000000  211.000000      454.0
 C  07-08-19  6      NaN      NaN      NaN  564.000000  340.000000      302.0
 C  08-08-19  6    948.0    366.0    586.0  564.000000  340.000000      302.0
小数点后的数字不是问题所在

这些线程可能会有帮助:

试试:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df1 = (df.groupby('ID')['Date','123_Var','456_Var','789_Var'].rolling('4D', on='Date', closed='left').mean())
dfx = (df.set_index(['ID','Date'])
       .join(df1.reset_index().set_index(['ID','Date']), rsuffix='_4')
       .reset_index()
       .drop('level_1',axis=1))
print(dfx.to_string())

   ID       Date  X  123_Var  456_Var  789_Var   123_Var_4   456_Var_4  789_Var_4
0   A 2019-07-16  3    777.0    250.0    810.0         NaN         NaN        NaN
1   A 2019-07-17  9    637.0    121.0    529.0  777.000000  250.000000      810.0
2   A 2019-07-20  2    295.0    272.0    490.0  707.000000  185.500000      669.5
3   A 2019-07-21  3    778.0    600.0    544.0  466.000000  196.500000      509.5
4   A 2019-07-22  6    741.0    792.0    907.0  536.500000  436.000000      517.0
5   A 2019-07-25  6    435.0    416.0    820.0  759.500000  696.000000      725.5
6   A 2019-07-26  8    590.0    455.0    342.0  588.000000  604.000000      863.5
7   A 2019-07-27  6    763.0    476.0    753.0  512.500000  435.500000      581.0
8   A 2019-08-02  6    717.0    211.0    454.0         NaN         NaN        NaN
9   A 2019-08-03  6    152.0    442.0    475.0  717.000000  211.000000      454.0
10  A 2019-08-05  6    564.0    340.0    302.0  434.500000  326.500000      464.5
11  A 2019-08-07  6    105.0    929.0    633.0  358.000000  391.000000      388.5
12  A 2019-08-08  6    948.0    366.0    586.0  334.500000  634.500000      467.5
13  B 2019-08-07  4    509.0    690.0    406.0         NaN         NaN        NaN
14  B 2019-08-08  2    413.0    725.0    414.0  509.000000  690.000000      406.0
15  B 2019-08-12  2    170.0    702.0    912.0  413.000000  725.000000      414.0
16  B 2019-08-13  3    851.0    616.0    477.0  170.000000  702.000000      912.0
17  B 2019-08-14  9    475.0    447.0    555.0  510.500000  659.000000      694.5
18  B 2019-08-15  1    412.0    403.0    708.0  498.666667  588.333333      648.0
19  B 2019-08-17  2    299.0    537.0    321.0  579.333333  488.666667      580.0
20  B 2019-08-18  4    310.0    119.0    125.0  395.333333  462.333333      528.0
21  C 2019-07-16  3    777.0    250.0    810.0         NaN         NaN        NaN
22  C 2019-07-17  9    637.0    121.0    529.0  777.000000  250.000000      810.0
23  C 2019-07-20  2      NaN      NaN      NaN  707.000000  185.500000      669.5
24  C 2019-07-21  3      NaN      NaN      NaN  637.000000  121.000000      529.0
25  C 2019-07-22  6    741.0    792.0    907.0         NaN         NaN        NaN
26  C 2019-07-25  6      NaN      NaN      NaN  741.000000  792.000000      907.0
27  C 2019-07-26  8    590.0    455.0    342.0  741.000000  792.000000      907.0
28  C 2019-07-27  6    763.0    476.0    753.0  590.000000  455.000000      342.0
29  C 2019-08-02  6    717.0    211.0    454.0         NaN         NaN        NaN
30  C 2019-08-03  6      NaN      NaN      NaN  717.000000  211.000000      454.0
31  C 2019-08-05  6    564.0    340.0    302.0  717.000000  211.000000      454.0
32  C 2019-08-07  6      NaN      NaN      NaN  564.000000  340.000000      302.0
33  C 2019-08-08  6    948.0    366.0    586.0  564.000000  340.000000      302.0

为什么
C 22-07-19 6 741.0 792.0 907.0 NaN的必要值缺失?谢谢。@jezrael,因为2019年7月18-21日没有价值观
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df1 = (df.groupby('ID')['Date','123_Var','456_Var','789_Var'].rolling('4D', on='Date', closed='left').mean())
dfx = (df.set_index(['ID','Date'])
       .join(df1.reset_index().set_index(['ID','Date']), rsuffix='_4')
       .reset_index()
       .drop('level_1',axis=1))
print(dfx.to_string())

   ID       Date  X  123_Var  456_Var  789_Var   123_Var_4   456_Var_4  789_Var_4
0   A 2019-07-16  3    777.0    250.0    810.0         NaN         NaN        NaN
1   A 2019-07-17  9    637.0    121.0    529.0  777.000000  250.000000      810.0
2   A 2019-07-20  2    295.0    272.0    490.0  707.000000  185.500000      669.5
3   A 2019-07-21  3    778.0    600.0    544.0  466.000000  196.500000      509.5
4   A 2019-07-22  6    741.0    792.0    907.0  536.500000  436.000000      517.0
5   A 2019-07-25  6    435.0    416.0    820.0  759.500000  696.000000      725.5
6   A 2019-07-26  8    590.0    455.0    342.0  588.000000  604.000000      863.5
7   A 2019-07-27  6    763.0    476.0    753.0  512.500000  435.500000      581.0
8   A 2019-08-02  6    717.0    211.0    454.0         NaN         NaN        NaN
9   A 2019-08-03  6    152.0    442.0    475.0  717.000000  211.000000      454.0
10  A 2019-08-05  6    564.0    340.0    302.0  434.500000  326.500000      464.5
11  A 2019-08-07  6    105.0    929.0    633.0  358.000000  391.000000      388.5
12  A 2019-08-08  6    948.0    366.0    586.0  334.500000  634.500000      467.5
13  B 2019-08-07  4    509.0    690.0    406.0         NaN         NaN        NaN
14  B 2019-08-08  2    413.0    725.0    414.0  509.000000  690.000000      406.0
15  B 2019-08-12  2    170.0    702.0    912.0  413.000000  725.000000      414.0
16  B 2019-08-13  3    851.0    616.0    477.0  170.000000  702.000000      912.0
17  B 2019-08-14  9    475.0    447.0    555.0  510.500000  659.000000      694.5
18  B 2019-08-15  1    412.0    403.0    708.0  498.666667  588.333333      648.0
19  B 2019-08-17  2    299.0    537.0    321.0  579.333333  488.666667      580.0
20  B 2019-08-18  4    310.0    119.0    125.0  395.333333  462.333333      528.0
21  C 2019-07-16  3    777.0    250.0    810.0         NaN         NaN        NaN
22  C 2019-07-17  9    637.0    121.0    529.0  777.000000  250.000000      810.0
23  C 2019-07-20  2      NaN      NaN      NaN  707.000000  185.500000      669.5
24  C 2019-07-21  3      NaN      NaN      NaN  637.000000  121.000000      529.0
25  C 2019-07-22  6    741.0    792.0    907.0         NaN         NaN        NaN
26  C 2019-07-25  6      NaN      NaN      NaN  741.000000  792.000000      907.0
27  C 2019-07-26  8    590.0    455.0    342.0  741.000000  792.000000      907.0
28  C 2019-07-27  6    763.0    476.0    753.0  590.000000  455.000000      342.0
29  C 2019-08-02  6    717.0    211.0    454.0         NaN         NaN        NaN
30  C 2019-08-03  6      NaN      NaN      NaN  717.000000  211.000000      454.0
31  C 2019-08-05  6    564.0    340.0    302.0  717.000000  211.000000      454.0
32  C 2019-08-07  6      NaN      NaN      NaN  564.000000  340.000000      302.0
33  C 2019-08-08  6    948.0    366.0    586.0  564.000000  340.000000      302.0