Python: Creating a new array with timesteps and multiple features, e.g. for LSTM

Hello, I am using numpy to create a new array with timesteps and multiple features for an LSTM.

I have looked at a number of approaches using strides and reshaping, but haven't found an efficient solution.

Here is a function that solves a toy problem; however, I have 30000 samples, each with 100 features.
import numpy as np

def make_timesteps(a, timesteps):
    array = []
    for j in np.arange(len(a)):
        unit = []
        for i in range(timesteps):
            # Row j of `a` rolled down by i rows, i.e. a[j - i] with wrap-around
            unit.append(np.roll(a, i, axis=0)[j])
        array.append(unit)
    return np.array(array)
inArr = np.array([[1,2],[3,4],[5,6]])
inArr.shape   # => (3, 2)
outArr = make_timesteps(inArr, 2)
outArr.shape  # => (3, 2, 2)
# => correct
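Is there a more efficient way of doing this (there must be!)? Can someone please help?

One trick is to take the last L-1 rows of the array and prepend them to its start. After that, it is a simple case of using NumPy strides through np.lib.stride_tricks.as_strided, which is very efficient. For anyone wondering about the cost of this trick: as we will see later in the timing tests, it is close to negligible.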
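To make the padding step concrete, here is a minimal sketch on the toy input from the question. The names `padded` and `windows` are illustrative only, and the list comprehension is a slow stand-in for the strided view, not the method used in this answer. It assumes the backward ordering, where each output block holds row j followed by the rows before it.

```python
import numpy as np

inArr = np.array([[1, 2], [3, 4], [5, 6]])
L = 2                      # number of timesteps per block
n = len(inArr)

# Prepend the last L-1 rows so that every window of L consecutive rows exists
# without having to wrap around the end of the array.
padded = np.vstack((inArr[n - (L - 1):], inArr))

# Output block j is the window padded[j : j + L], read from its last row back
# to its first, i.e. inArr[j], inArr[j - 1], ... with wrap-around.
windows = np.stack([padded[j:j + L][::-1] for j in range(n)])

print(padded)
print(windows)
```

Once the wrap-around rows are physically present in `padded`, every block is just a contiguous run of rows, which is what lets a strided view describe all blocks at once.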
The trick that gets us to the final goal supports both backward and forward striding. The code looks like this -

Backward striding:

def strided_axis0_backward(inArr, L = 2):
    # INPUTS :
    # inArr : Input array
    # L : Length along rows to be cut to create per subarray

    # Prepend the last L-1 rows to the start. It just helps in keeping a view output.
    # (The slice is written so that L = 1 prepends nothing.)
    a = np.vstack(( inArr[len(inArr)-L+1:], inArr ))

    # Store shape and strides info
    m,n = a.shape
    s0,s1 = a.strides

    # Length of 3D output array along its axis=0
    nd0 = m - L + 1

    # Start at the first original row and step backward (-s0) inside each subarray
    strided = np.lib.stride_tricks.as_strided
    return strided(a[L-1:], shape=(nd0,L,n), strides=(s0,-s0,s1))

Forward striding:

def strided_axis0_forward(inArr, L = 2):
    # INPUTS :
    # inArr : Input array
    # L : Length along rows to be cut to create per subarray

    # Append the first L-1 rows to the end. It just helps in keeping a view output.
    a = np.vstack(( inArr , inArr[:L-1] ))

    # Store shape and strides info
    m,n = a.shape
    s0,s1 = a.strides

    # Length of 3D output array along its axis=0
    nd0 = m - L + 1

    # Start at the first row and step forward (+s0) inside each subarray
    strided = np.lib.stride_tricks.as_strided
    return strided(a, shape=(nd0,L,n), strides=(s0,s0,s1))
Sample run -

In [42]: inArr
Out[42]:
array([[1, 2],
       [3, 4],
       [5, 6]])

In [43]: strided_axis0_backward(inArr, 2)
Out[43]:
array([[[1, 2],
        [5, 6]],

       [[3, 4],
        [1, 2]],

       [[5, 6],
        [3, 4]]])

In [44]: strided_axis0_forward(inArr, 2)
Out[44]:
array([[[1, 2],
        [3, 4]],

       [[3, 4],
        [5, 6]],

       [[5, 6],
        [1, 2]]])
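As a sanity check beyond the three-row sample, the following sketch compares the backward strided version against the loop-based function from the question on random data. It restates both functions in compact form so the block runs on its own. The random inputs, the seed, and the `results` dictionary are assumptions made for illustration.

```python
import numpy as np

def make_timesteps(a, timesteps):
    # Loop-based reference from the question, written as a comprehension
    return np.array([[np.roll(a, i, axis=0)[j] for i in range(timesteps)]
                     for j in range(len(a))])

def strided_axis0_backward(inArr, L=2):
    # Prepend the last L-1 rows, then view with a negative stride per block
    a = np.vstack((inArr[len(inArr) - L + 1:], inArr))
    m, n = a.shape
    s0, s1 = a.strides
    return np.lib.stride_tricks.as_strided(
        a[L - 1:], shape=(m - L + 1, L, n), strides=(s0, -s0, s1))

rng = np.random.default_rng(0)
results = {}
for L in (1, 2, 5):
    arr = rng.integers(0, 9, size=(20, 4))
    results[L] = np.array_equal(make_timesteps(arr, L),
                                strided_axis0_backward(arr, L))

print(results)
```

If every entry of `results` is True, the strided view reproduces the loop output for those window lengths.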
Runtime test -
In [53]: inArr = np.random.randint(0,9,(1000,10))
In [54]: %timeit make_timesteps(inArr, 2)
...: %timeit strided_axis0_forward(inArr, 2)
...: %timeit strided_axis0_backward(inArr, 2)
...:
10 loops, best of 3: 33.9 ms per loop
100000 loops, best of 3: 12.1 µs per loop
100000 loops, best of 3: 12.2 µs per loop
In [55]: %timeit make_timesteps(inArr, 10)
...: %timeit strided_axis0_forward(inArr, 10)
...: %timeit strided_axis0_backward(inArr, 10)
...:
1 loops, best of 3: 152 ms per loop
100000 loops, best of 3: 12 µs per loop
100000 loops, best of 3: 12.1 µs per loop
In [56]: 152000/12.1 # Speedup figure
Out[56]: 12561.98347107438
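Even as we increase the length of the subarrays in the output, the timings for the strided_axis0 functions stay the same. This shows the huge benefit of striding and, of course, the massive speedup over the original loopy version.

As promised at the start, here are the timings for the stacking cost with np.vstack -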
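For readers who would rather not pass raw strides by hand, the same windows can probably be obtained with `sliding_window_view`, which is available in NumPy 1.20 and later and returns a read-only view by default. This is a swapped-in alternative, not the code timed above; the helper name `windows_axis0` and its `backward` flag are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # needs NumPy >= 1.20

def windows_axis0(inArr, L=2, backward=True):
    # Hypothetical helper: pad with wrap-around rows, then take sliding windows
    n = len(inArr)
    if backward:
        padded = np.vstack((inArr[n - (L - 1):], inArr))   # last L-1 rows first
    else:
        padded = np.vstack((inArr, inArr[:L - 1]))          # first L-1 rows last
    # sliding_window_view puts the window axis last: (n, features, L).
    # Transpose to (n, L, features) to match the layout used in this answer.
    w = sliding_window_view(padded, L, axis=0).transpose(0, 2, 1)
    # Reverse each window for the backward ordering (still a view)
    return w[:, ::-1] if backward else w

inArr = np.array([[1, 2], [3, 4], [5, 6]])
bwd = windows_axis0(inArr, 2, backward=True)
fwd = windows_axis0(inArr, 2, backward=False)
print(bwd)
print(fwd)
```

The design trade-off is safety for flexibility: the result cannot be written to by accident, whereas a writable `as_strided` view lets a single assignment silently touch several overlapping blocks.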
In [417]: inArr = np.random.randint(0,9,(1000,10))
In [418]: L = 10
In [419]: %timeit np.vstack(( inArr[-L+1:], inArr ))
100000 loops, best of 3: 5.41 µs per loop
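The timings support the stacking idea as a pretty efficient one.

Thank you so much - this is really helpful. I had seen as_strided before, but didn't understand it until your example and link! To get the same order, I appended the first rows to the end and then did np.flip on axis 1. I have edited the question to show my final code.

@nickyzee I don't think I understand why you need the flip. Is your make_timesteps correct? I ask because my code was written to produce the same result as make_timesteps. With your flip suggestion, my code gives results that differ from what make_timesteps produces. Could you clarify?

Interesting - I need the flip to get the same output as from my original. Visual inspection of the output confirms it too. Not sure why - numpy@latest and py3.5 - my original returns array([[[1,2],[3,4]],[[3,4],[5,6]],[[5,6],[7,8]],[[7,8],[1,2]]])

@nickyzee Updated with both forward and backward striding versions. Check those out!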
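The flip discussion above comes down to how the two orderings relate. The sketch below builds both with plain modular indexing (a slower, copy-based stand-in for the strided views) on the four-row input implied by the output quoted in the comments. The helper names are hypothetical, and the relation it checks is an observation about these index formulas, not a claim about what the asker's final code does.

```python
import numpy as np

def windows_backward(arr, L):
    # out[i, j] = arr[(i - j) % n]
    n = len(arr)
    idx = (np.arange(n)[:, None] - np.arange(L)) % n
    return arr[idx]

def windows_forward(arr, L):
    # out[i, j] = arr[(i + j) % n]
    n = len(arr)
    idx = (np.arange(n)[:, None] + np.arange(L)) % n
    return arr[idx]

arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
L = 2
fwd = windows_forward(arr, L)
bwd = windows_backward(arr, L)

# Flipping the forward blocks along axis 1 gives the backward blocks,
# but shifted by L-1 positions along axis 0.
related = np.array_equal(np.flip(fwd, axis=1), np.roll(bwd, -(L - 1), axis=0))
print(fwd)
print(related)
```

So a flip alone changes which block sits at which position, which may explain why the two commenters saw different results from what looked like the same operation.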