Python: computing rolling functions efficiently


python pandas


I need to compute a moving average with pandas:

import numpy as np
import pandas as pd

ser = pd.Series(np.random.randn(100),
                index=pd.date_range('1/1/2000', periods=100, freq='1min'))

ser.rolling(window=20).mean().tail(5)

[Out]
2000-01-01 01:35:00    0.390383
2000-01-01 01:36:00    0.279308
2000-01-01 01:37:00    0.173532
2000-01-01 01:38:00    0.194097
2000-01-01 01:39:00    0.194743
Freq: T, dtype: float64

But after appending a new row like this:

new_row = pd.Series([1.0], index=[pd.to_datetime("2000-01-01 01:40:00")])
ser = pd.concat([ser, new_row])  # Series.append was removed in pandas 2.0

ser.rolling(window=20).mean().tail(5)

[Out]
2000-01-01 01:36:00    0.279308
2000-01-01 01:37:00    0.173532
2000-01-01 01:38:00    0.194097
2000-01-01 01:39:00    0.194743
2000-01-01 01:40:00    0.201918
dtype: float64

I then have to recompute the rolling mean over the entire series, like this:

# ser already contains the appended row; the whole series is rolled again

ser.rolling(window=20).mean().tail(5)

[Out]
2000-01-01 01:36:00    0.279308
2000-01-01 01:37:00    0.173532
2000-01-01 01:38:00    0.194097
2000-01-01 01:39:00    0.194743
2000-01-01 01:40:00    0.201918
dtype: float64

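I think I only need to compute the last value (2000-01-01 01:40:00, 0.201918), but I cannot find an API that computes only the value for the newly appended row. pandas' rolling().mean() always recomputes the entire series.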

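This is a simple example, but in my real project the series has more than 1,000,000 entries, and every rolling computation takes a lot of time.

Is there a way to solve this in pandas?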

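As Anton vBR wrote in his comment, after appending the row you can use

ser.tail(20).mean()

The time this takes does not depend on the length of the series (1,000,000 in your example).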
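A minimal sketch of that suggestion, assuming the 20-row window from the question. The helper name `last_window_mean` and the fixed random seed are illustrative and not part of the original post. It compares the mean of the last 20 values against the last entry of the full rolling mean.

```python
import numpy as np
import pandas as pd

WINDOW = 20  # window size used in the question


def last_window_mean(series: pd.Series, window: int = WINDOW) -> float:
    """Mean of the last `window` values; NaN if the series is shorter than the window."""
    if len(series) < window:
        # Mirrors rolling(window).mean(), which yields NaN until the window is full.
        return float("nan")
    return float(series.tail(window).mean())


rng = np.random.default_rng(0)  # fixed seed, purely for reproducibility
ser = pd.Series(
    rng.standard_normal(100),
    index=pd.date_range("2000-01-01", periods=100, freq="1min"),
)
new_row = pd.Series([1.0], index=[pd.to_datetime("2000-01-01 01:40:00")])
ser = pd.concat([ser, new_row])

full = ser.rolling(window=WINDOW).mean().iloc[-1]  # recomputes every window
fast = last_window_mean(ser)                       # touches only the last 20 rows
print(np.isclose(full, fast))
```

The two results should agree up to floating-point rounding, since both average the same 20 values.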

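If you do this often, you can compute it even more efficiently. The mean after appending a row is:

20 times the mean at the second-to-last row
plus the newly appended value
minus the value at the 21st position from the end
divided by 20

This is more complicated to implement, though.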
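The update rule above can be sketched in plain Python. The class name `RollingMean` and its method are hypothetical, not a pandas API. Instead of multiplying the previous mean by the window size, the sketch keeps the window's running sum, which is the same quantity.

```python
from collections import deque


class RollingMean:
    """Mean over the last `window` values, updated in O(1) per new value."""

    def __init__(self, window: int):
        self.window = window
        self.values = deque()  # values currently inside the window
        self.total = 0.0       # sum of the values currently inside the window

    def update(self, value: float) -> float:
        """Add a value and return the current mean (NaN until the window is full)."""
        self.values.append(value)
        self.total += value
        if len(self.values) > self.window:
            # Subtract the value that just fell out of the window.
            self.total -= self.values.popleft()
        if len(self.values) < self.window:
            return float("nan")
        return self.total / self.window


rm = RollingMean(3)
results = [rm.update(v) for v in [1.0, 2.0, 3.0, 4.0, 5.0]]
print(results[2:])  # [2.0, 3.0, 4.0]
```

Each update costs the same regardless of how long the series is. One caveat of this design is that a running sum can accumulate floating-point rounding error over very many updates, so an occasional recomputation from the stored window values may be worthwhile.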

Select the last 20 values (for example, with tail(20)) and call .mean() on them?