Python: padding a NumPy rolling-window operation that uses strides

python pandas numpy

I have a function f that I want to compute efficiently over a sliding window:

def efficient_f(x):
   # do stuff
   wSize=50
   return another_f(rolling_window_using_strides(x, wSize), -1)

I saw in an earlier post that using strides is a particularly efficient way to do this:

from numpy.lib.stride_tricks import as_strided

def rolling_window_using_strides(a, window):
    # The last axis of length n becomes (n - window + 1, window).
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    windows = as_strided(a, shape=shape, strides=strides)
    print(windows.shape)
    return windows

Then I tried to apply it to a DataFrame:

df=pd.DataFrame(data=np.random.rand(180000,1),columns=['foo'])
df['bar']=df[['foo']].apply(efficient_f,raw=True)
# Note the double brackets [[...]]: with single brackets, pd.Series.apply
# (which does not accept the raw and axis kwargs) would be called instead of pd.DataFrame.apply.

It works very well and does give a significant performance gain. However, I still get the following error:

ValueError: Shape of passed values is (1, 179951), indices imply (1, 180000).

This is because I use wSize=50, which yields

rolling_window_using_strides(df['foo'].values,50).shape
(1L, 179951L, 50L)

Is there a way to pad with zeros/np.nan at the boundaries to get

(1L, 180000, 50L)

so that the result has the same size as the original vector?

Here is one way to do it -
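The definition of strided_axis0 used in the sample run is not shown on this page. A minimal sketch that is consistent with the sample output, assuming the function prepends L-1 fill values and then takes a strided view of the extended array (the float cast and the read-only view are my assumptions, not the original answer's code):

```python
import numpy as np

def strided_axis0(a, fillval, L):
    """Return a (len(a), L) array of trailing windows over the 1-D array `a`."""
    # Assumption: cast to float so that np.nan is a valid fill value.
    a = np.asarray(a, dtype=float)
    # Prepend L-1 fill values so every original element ends a full window.
    a_ext = np.concatenate((np.full(L - 1, fillval, dtype=a.dtype), a))
    n = a_ext.strides[0]
    # Moving one row down and moving one column right both advance by one
    # element, hence the same stride n on both axes.
    return np.lib.stride_tricks.as_strided(
        a_ext, shape=(a.shape[0], L), strides=(n, n), writeable=False
    )
```

Row i of the result is the window ending at element i, so the output has as many rows as the input has elements and can be reduced along the last axis.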

Sample run -

In [95]: np.random.seed(0)

In [96]: a = np.random.rand(8,1)

In [97]: a
Out[97]: 
array([[ 0.55],
       [ 0.72],
       [ 0.6 ],
       [ 0.54],
       [ 0.42],
       [ 0.65],
       [ 0.44],
       [ 0.89]])

In [98]: strided_axis0(a[:,0], fillval=np.nan, L=3)
Out[98]: 
array([[  nan,   nan,  0.55],
       [  nan,  0.55,  0.72],
       [ 0.55,  0.72,  0.6 ],
       [ 0.72,  0.6 ,  0.54],
       [ 0.6 ,  0.54,  0.42],
       [ 0.54,  0.42,  0.65],
       [ 0.42,  0.65,  0.44],
       [ 0.65,  0.44,  0.89]])

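Does this pad at the end or at the start? I'm not sure how it works by default… I want the padding at the start.

I was hoping to find a parameter of a numpy function for this, but this solution works very well.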
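On the question of a built-in parameter: as far as I know, NumPy's window helpers do not take a padding argument, so the padding has to be a separate step. A minimal sketch of that two-step approach, assuming NumPy 1.20 or later (for sliding_window_view); padded_windows is a hypothetical helper name, and a plain mean stands in for another_f, which is not shown in the question:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def padded_windows(a, window, fillval=np.nan):
    """Hypothetical helper: trailing windows over 1-D `a`, padded at the start."""
    a = np.asarray(a, dtype=float)
    # Pad only on the left so the output has one row per input element.
    a_ext = np.pad(a, (window - 1, 0), mode="constant", constant_values=fillval)
    return sliding_window_view(a_ext, window)  # shape: (len(a), window)

rng = np.random.default_rng(0)
x = rng.random(1000)               # stands in for df['foo'].values
windows = padded_windows(x, 50)
# A plain mean stands in for another_f; windows that touch the NaN padding
# come out as NaN, so only full windows produce a value.
bar = windows.mean(axis=-1)
```

Because bar has one value per input element, it can be assigned directly as a column of the same length, e.g. df['bar'] = bar.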
