Python: Why do some DataFrame math functions take more time, and how can they be sped up?


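Consider the code below. When I compute the rolling argmax of each column, it runs very slowly:

df1 = pd.DataFrame(data=random_state.randint(10000, size=(3774, 3000)), index=pd.date_range('2010-01-01', '2020-05-01', freq='d'))
print(df1.rolling(window=20).apply(lambda x: x.argmax()))

But when I change argmax to max and call df1.rolling(window=20).max() instead, the code finishes within a few seconds.

Since the rolling() object has no methods such as argmax() or prod(), I have to use apply(lambda x: x.argmax()) or apply(lambda x: x.prod()) instead, but that takes much more time.

Why is the time difference so large? Is there a way to make this code run faster?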

This solution works only with numpy>=1.20.0.

Input data for demonstration:

import pandas as pd
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

WINDOWSIZE = 3
df = pd.DataFrame(data=10 * np.arange(5 * 10).reshape((5, 10)))

>>> df
     0    1    2    3    4    5    6    7    8    9
0    0   10   20   30   40   50   60   70   80   90
1  100  110  120  130  140  150  160  170  180  190
2  200  210  220  230  240  250  260  270  280  290
3  300  310  320  330  340  350  360  370  380  390
4  400  410  420  430  440  450  460  470  480  490

Use sliding_window_view to create a sliding window view into the array with the given window shape:

>>> sliding_window_view(df, (WINDOWSIZE, len(df.columns)))
array([[[[  0,  10,  20,  30,  40,  50,  60,  70,  80,  90],
         [100, 110, 120, 130, 140, 150, 160, 170, 180, 190],
         [200, 210, 220, 230, 240, 250, 260, 270, 280, 290]]],


       [[[100, 110, 120, 130, 140, 150, 160, 170, 180, 190],
         [200, 210, 220, 230, 240, 250, 260, 270, 280, 290],
         [300, 310, 320, 330, 340, 350, 360, 370, 380, 390]]],


       [[[200, 210, 220, 230, 240, 250, 260, 270, 280, 290],
         [300, 310, 320, 330, 340, 350, 360, 370, 380, 390],
         [400, 410, 420, 430, 440, 450, 460, 470, 480, 490]]]])

Apply argmax on the third axis (index=2) and squeeze to get a 2D array (like the DataFrame):

>>> sliding_window_view(df, (WINDOWSIZE, len(df.columns))).argmax(axis=2)
array([[[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]],

       [[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]],

       [[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]]])

>>> sliding_window_view(df, (WINDOWSIZE, len(df.columns))).argmax(axis=2).squeeze()
array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]])

Finally, convert the array to a DataFrame:

out = pd.DataFrame(index=df.index, columns=df.columns)
# The first WINDOWSIZE-1 rows have no complete window and stay NaN.
out.iloc[WINDOWSIZE-1:] = sliding_window_view(df, (WINDOWSIZE, len(df.columns))) \
                              .argmax(axis=2).squeeze()
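The steps above can be wrapped into one function. The following is only a sketch: the name rolling_argmax is hypothetical, and it slides the window along axis 0 only (axis=0) rather than passing a full two-dimensional window shape, which avoids the squeeze() step. It checks the result against rolling().apply() on the small demonstration frame.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # numpy >= 1.20.0


def rolling_argmax(df: pd.DataFrame, window: int) -> pd.DataFrame:
    """Hypothetical helper: position of the maximum inside each rolling window."""
    values = df.to_numpy()
    # Sliding along axis 0 only gives shape (n_rows - window + 1, n_cols, window).
    windows = sliding_window_view(values, window, axis=0)
    out = np.full(values.shape, np.nan)
    # The first window - 1 rows have no complete window and stay NaN.
    out[window - 1:] = windows.argmax(axis=-1)
    return pd.DataFrame(out, index=df.index, columns=df.columns)


df = pd.DataFrame(10 * np.arange(5 * 10).reshape((5, 10)))
fast = rolling_argmax(df, 3)
slow = df.rolling(window=3).apply(lambda x: x.argmax(), raw=True)

# Raises an AssertionError if the two approaches disagree.
pd.testing.assert_frame_equal(fast, slow)
```

The result is a float frame because the leading rows hold NaN, which matches what rolling().apply() returns.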

Performance

WINDOWSIZE = 20
df1 = pd.DataFrame(data=np.random.randint(10000, size=(3774, 3000)), index=pd.date_range('2010-01-01', '2020-05-01', freq='d'))

>>> %timeit sliding_window_view(df1, (WINDOWSIZE, len(df1.columns))).argmax(axis=2).squeeze()
1.43 s ± 5.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
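To get a feel for the gap outside IPython, here is a rough timing sketch on a much smaller frame. The frame size, the seed, and the timed helper are assumptions made for illustration. It compares rolling().apply() with a Series per window, rolling().apply() with raw=True (which passes plain ndarrays), and the sliding-window approach. Absolute timings depend on the machine, so none are quoted here.

```python
import time

import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

WINDOWSIZE = 20
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
small = pd.DataFrame(rng.integers(10000, size=(400, 30)))


def timed(label, func):
    """Hypothetical helper: run func once and print the elapsed wall time."""
    start = time.perf_counter()
    result = func()
    print(f"{label}: {time.perf_counter() - start:.4f} s")
    return result


# One Python call per window and per column, each receiving a Series.
a = timed("apply, Series per window",
          lambda: small.rolling(WINDOWSIZE).apply(lambda x: x.argmax()))
# Same loop, but each window is passed as a plain ndarray.
b = timed("apply, raw=True",
          lambda: small.rolling(WINDOWSIZE).apply(np.argmax, raw=True))
# A single vectorized argmax over all windows at once.
c = timed("sliding_window_view",
          lambda: sliding_window_view(small.to_numpy(), WINDOWSIZE, axis=0).argmax(axis=-1))
```

All three produce the same positions for the rows that have a complete window; only the amount of Python-level looping differs.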

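apply is a convenient function, but it is really a loop, which is why it is slow. For an array of this size (3774x3000), it is very likely that x.prod() is [...] whichever axis you choose.

Thanks for your answer, I have another question. The array returned by sliding_window_view(df1, (WINDOWSIZE, len(df1.columns))) does not implement some math functions, and I want to apply some user-defined functions, so I tried np.apply_along_axis on the returned array, but it still takes a lot of time. Is there any way to apply user-defined functions faster?

What do you mean by "does not implement some math functions"? After sliding, you have an array of shape (len(df1.index) - WINDOWSIZE, len(df1.columns)). You can do anything with it without using the apply method.

For reference, the fast max variant mentioned in the question:

df1 = pd.DataFrame(data=random_state.randint(10000, size=(3774, 3000)), index=pd.date_range('2010-01-01', '2020-05-01', freq='d'))
# print(df1.rolling(window=20).apply(lambda x:x.argmax()))
print(df1.rolling(window=20).max())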
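On the follow-up about user-defined functions: one option is to write the function with NumPy reductions that take an axis argument, so it runs over all windows in one call instead of going through apply or np.apply_along_axis. The sketch below assumes the function can be expressed this way. The name window_range is hypothetical, and the cast to float64 is an assumption made here so that products of large integers do not overflow int64.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

WINDOWSIZE = 3
df = pd.DataFrame(10 * np.arange(5 * 10).reshape((5, 10)))

# Shape (n_rows - WINDOWSIZE + 1, n_cols, WINDOWSIZE): the window is the last axis.
windows = sliding_window_view(df.to_numpy(dtype="float64"), WINDOWSIZE, axis=0)


def window_range(w):
    """Hypothetical user-defined function: max minus min of each window."""
    return w.max(axis=-1) - w.min(axis=-1)


# Rows without a complete window are dropped here, so the index starts at WINDOWSIZE - 1.
valid_index = df.index[WINDOWSIZE - 1:]
rolling_prod = pd.DataFrame(windows.prod(axis=-1), index=valid_index, columns=df.columns)
rolling_range = pd.DataFrame(window_range(windows), index=valid_index, columns=df.columns)

print(rolling_prod)
print(rolling_range)
```

Functions that cannot be vectorized this way still need a loop; in that case passing raw=True to rolling().apply() at least avoids building a Series for every window.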

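For the demonstration data (WINDOWSIZE = 3), the rows of out that have a complete window are:

>>> out.iloc[WINDOWSIZE-1:]
   0  1  2  3  4  5  6  7  8  9
2  2  2  2  2  2  2  2  2  2  2
3  2  2  2  2  2  2  2  2  2  2
4  2  2  2  2  2  2  2  2  2  2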