Performance: Create a sliding window of NaN-padded elements on a 1D NumPy array
I have a time series x[0], x[1], ..., x[n-1], stored as a 1D numpy array. I want to convert it into the following matrix:
NaN,  ...,  NaN,    x[0]
NaN,  ...,  x[0],   x[1]
  ⋮
NaN,  x[0], ...,    x[n-3], x[n-2]
x[0], x[1], ...,    x[n-2], x[n-1]
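Read element-wise (0-based indices, with n the length of x), the matrix above can be written as follows. This is one formalization of the layout, not notation taken from the question:

```latex
M_{ij} =
\begin{cases}
x_{\,i + j - (n-1)}, & \text{if } i + j \ge n - 1,\\
\mathrm{NaN}, & \text{otherwise,}
\end{cases}
\qquad 0 \le i, j \le n - 1.
```

Row i therefore holds x[0], ..., x[i] right-aligned. Every anti-diagonal (fixed i + j) is constant, so M is a Hankel matrix, and reversing its columns gives a Toeplitz matrix.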
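I want to use this matrix to speed up computations on the time series. Is there a function in numpy or scipy that does this? (I don't want to implement it with a for loop in Python.)
One approach -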
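The code of this approach is missing from this copy of the answer. The sketch below reconstructs nanpad_sliding2D from the steps replayed in the In [159] session further down. It uses a_ext.strides[0], as the comments recommend. Treat it as a reconstruction under that assumption, not the verbatim original.

```python
import numpy as np

def nanpad_sliding2D(a):
    """Return an (L, L) strided view: row i holds a[0..i], right-aligned, NaN-padded."""
    a = np.asarray(a)
    L = a.size
    # Prepend L-1 NaNs; concatenate upcasts integer input to float64 so NaN fits.
    a_ext = np.concatenate((np.full(L - 1, np.nan), a))
    # Use the stride of the *extended* array, not of the input.
    n = a_ext.strides[0]
    strided = np.lib.stride_tricks.as_strided
    # Each row starts one element later in a_ext than the previous row.
    # The rows overlap in memory, so do not write into the returned view.
    return strided(a_ext, shape=(L, L), strides=(n, n))
```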
Sample run -
In [41]: a
Out[41]: array([48, 82, 96, 34, 93, 25, 51, 26])
In [42]: nanpad_sliding2D(a)
Out[42]:
array([[ nan, nan, nan, nan, nan, nan, nan, 48.],
[ nan, nan, nan, nan, nan, nan, 48., 82.],
[ nan, nan, nan, nan, nan, 48., 82., 96.],
[ nan, nan, nan, nan, 48., 82., 96., 34.],
[ nan, nan, nan, 48., 82., 96., 34., 93.],
[ nan, nan, 48., 82., 96., 34., 93., 25.],
[ nan, 48., 82., 96., 34., 93., 25., 51.],
[ 48., 82., 96., 34., 93., 25., 51., 26.]])
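The question's goal is to speed up time-series computations. NaN-aware reductions such as np.nanmean skip the padding, so a row-wise reduction over this matrix gives an expanding-window statistic. Below is a minimal sketch, assuming an expanding mean is the kind of computation intended. The strided matrix is rebuilt inline so that the snippet stands alone.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([48, 82, 96, 34, 93, 25, 51, 26])
L = a.size
a_ext = np.concatenate((np.full(L - 1, np.nan), a))
M = as_strided(a_ext, shape=(L, L), strides=(a_ext.strides[0],) * 2)

# Row i holds a[0..i] plus NaNs; nanmean ignores the NaNs,
# so this is the mean of all samples up to and including index i.
expanding_mean = np.nanmean(M, axis=1)

# Cross-check against a cumulative-sum formulation.
reference = np.cumsum(a) / np.arange(1, L + 1)
print(np.allclose(expanding_mean, reference))  # True
```

For a mean, the cumsum form is cheaper (O(L) instead of O(L^2)). The matrix form is mainly useful for reductions that lack such a shortcut, e.g. np.nanmedian(M, axis=1).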
Memory efficiency with strides
As @Eric mentioned in the comments, this strides-based approach is memory efficient, because the output is just a view into the NaN-padded 1D version. Let's test it out -
In [158]: a # Sample 1D input
Out[158]: array([37, 95, 87, 10, 35])
In [159]: L = a.size # Run the posted approach
...: a_ext = np.concatenate(( np.full(a.size-1,np.nan) ,a))
...: n = a_ext.strides[0]
...: strided = np.lib.stride_tricks.as_strided
...: out = strided(a_ext, shape=(L,L), strides=(n,n))
...:
In [160]: np.may_share_memory(a_ext,out) # Output might be a view into the extended version
Out[160]: True
Let's confirm that the output really is a view by assigning values into a_ext and then checking out. Initial values of a_ext and out:
In [161]: a_ext
Out[161]: array([ nan, nan, nan, nan, 37., 95., 87., 10., 35.])
In [162]: out
Out[162]:
array([[ nan, nan, nan, nan, 37.],
[ nan, nan, nan, 37., 95.],
[ nan, nan, 37., 95., 87.],
[ nan, 37., 95., 87., 10.],
[ 37., 95., 87., 10., 35.]])
Modify a_ext:
In [163]: a_ext[:] = 100
Check out the new output:
In [164]: out
Out[164]:
array([[ 100., 100., 100., 100., 100.],
[ 100., 100., 100., 100., 100.],
[ 100., 100., 100., 100., 100.],
[ 100., 100., 100., 100., 100.],
[ 100., 100., 100., 100., 100.]])
This confirms that out is a view into a_ext.
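The rows of out overlap in memory, so writing through out (or through a_ext) changes several entries at once. On NumPy 1.20 or later, numpy.lib.stride_tricks.sliding_window_view builds the same view and returns it read-only by default, which guards against accidental writes. Here is a sketch of that variant. It is not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

a = np.array([37, 95, 87, 10, 35])
L = a.size
a_ext = np.concatenate((np.full(L - 1, np.nan), a))

# Windows of length L over the 2L-1 padded elements -> exactly L rows.
out = sliding_window_view(a_ext, L)

print(out.shape)                     # (5, 5)
print(np.shares_memory(a_ext, out))  # True: still a view, no copy
print(out.flags.writeable)           # False: read-only by default

try:
    out[0, 0] = 0.0
except ValueError as exc:
    print("write rejected:", exc)
```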
Finally, let's test the memory requirements:
In [131]: a_ext.nbytes
Out[131]: 72
In [132]: out.nbytes
Out[132]: 200
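So although out.nbytes reports 200 bytes (its logical size, 5 x 5 x 8), the view has no buffer of its own. It reads from the extended array, which takes only 72 bytes (9 x 8).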
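The sketch below shows the difference between the reported and the allocated size. It also illustrates a caveat raised in the comments: any computation on the view materializes a full L x L result.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([37, 95, 87, 10, 35])
L = a.size
a_ext = np.concatenate((np.full(L - 1, np.nan), a))
out = as_strided(a_ext, shape=(L, L), strides=(a_ext.strides[0],) * 2)

print(a_ext.nbytes)  # 72  = (2L-1) * 8 bytes actually allocated
print(out.nbytes)    # 200 = L*L * 8, the logical size of the view

# Any arithmetic on the view produces a new, fully materialized array.
doubled = out * 2
print(doubled.nbytes)                    # 200
print(np.shares_memory(doubled, a_ext))  # False: a fresh L*L buffer
```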
One more approach -
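The code for this second approach is missing from this copy. The comments below compare the strided version with a "scipy version". One SciPy-based way to build the same matrix, assuming scipy.linalg.toeplitz is what was meant, has two steps. First, build the lower-triangular Toeplitz matrix with a in its first column and NaN above the diagonal. Then flip it left-right. The helper name nanpad_toeplitz is hypothetical. Unlike the strided view, this allocates all L x L elements.

```python
import numpy as np
from scipy.linalg import toeplitz

def nanpad_toeplitz(a):
    """Hypothetical helper: full-size copy of the NaN-padded sliding matrix."""
    a = np.asarray(a, dtype=float)
    L = a.size
    # First column: a. First row: [ignored, NaN, ..., NaN]; toeplitz ignores r[0].
    first_row = np.full(L, np.nan)
    T = toeplitz(a, first_row)  # T[i, k] = a[i-k] if i >= k else NaN
    # Reverse the columns so each row is right-aligned.
    return T[:, ::-1]
```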
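This has the added benefit of using only 2L-1 elements of memory, rather than L^2.
This fails catastrophically on nanpad_sliding2D(np.array([48,82,96,34,93,25,51,26])) and nanpad_sliding2D(a[:-1]). You should be using a_ext.strides[0], not a.strides[0].
@Eric Good point, it should be a_ext.strides[0]! Thanks, edited.
For n=10000, the strided version is 30000 times faster than the scipy version, but as soon as you do any calculation with the strided output, the result is full size.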
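A stride mismatch between the two arrays is one way the failure described above can arise. a_ext is always a fresh, contiguous float64 array with 8-byte steps. a.strides[0], by contrast, depends on the input's dtype and layout. Feeding the wrong stride to as_strided makes it step through a_ext by the wrong number of bytes. The inputs in the check below are illustrative examples, not the ones from the comment.

```python
import numpy as np

def strides_of(a):
    """Return (input stride, stride of the NaN-padded float64 copy) in bytes."""
    a_ext = np.concatenate((np.full(a.size - 1, np.nan), a))
    return a.strides[0], a_ext.strides[0]

# int32 input: 4-byte steps, but a_ext is float64 with 8-byte steps.
print(strides_of(np.arange(5, dtype=np.int32)))           # (4, 8)
# Non-contiguous slice: the input skips every other element.
print(strides_of(np.arange(10, dtype=np.float64)[::2]))   # (16, 8)
# Contiguous float64 input: only here do the two strides agree.
print(strides_of(np.arange(5, dtype=np.float64)))         # (8, 8)
```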