Python: creating a new array with timesteps and multiple features, e.g. for an LSTM


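Hello, I am using NumPy to create a new array with timesteps and multiple features for an LSTM.

I have looked at a number of approaches using strides and reshaping, but haven't managed to find an efficient solution.

Here is a function that solves a toy problem; however, I have 30,000 samples, each with 100 features.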

    import numpy as np

    def make_timesteps(a, timesteps):
        array = []
        for j in range(len(a)):
            unit = []
            for i in range(timesteps):
                # Row j of `a` rolled down by i rows, i.e. a[(j - i) % len(a)]
                unit.append(np.roll(a, i, axis=0)[j])
            array.append(unit)
        return np.array(array)
For example:

    inArr = np.array([[1, 2], [3, 4], [5, 6]])
    inArr.shape   # => (3, 2)

    outArr = make_timesteps(inArr, 2)
    outArr.shape  # => (3, 2, 2), which is correct


Is there a more efficient way of doing this (there must be!)? Can someone please help?

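One trick is to take the last `L-1` rows of the array and prepend them to the start of the array. After that, it becomes a simple case of using the very efficient `np.lib.stride_tricks.as_strided`. For anyone wondering about the cost of this trick: as the timing tests later on show, it is next to nothing.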
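To make the wrap-around explicit, here is a minimal sketch that builds the same windows with plain integer indexing instead of strides. The helper name `wrapped_windows_by_indexing` is hypothetical and not part of the original answer. It returns a copy rather than a view, so it is meant as a readable reference for checking results, not as the fast path.

```python
import numpy as np

def wrapped_windows_by_indexing(arr, L=2):
    # Hypothetical helper for illustration only.
    n = arr.shape[0]
    # idx[j, i] = (j - i) mod n: row j first, then the L-1 rows before it,
    # wrapping around to the end of the array when j - i is negative.
    idx = (np.arange(n)[:, None] - np.arange(L)[None, :]) % n
    # Integer-array indexing produces a copy of shape (n, L, n_features).
    return arr[idx]

inArr = np.array([[1, 2], [3, 4], [5, 6]])
out = wrapped_windows_by_indexing(inArr, 2)
```

For the 3x2 toy input, `out` has shape `(3, 2, 2)` and pairs each row with the row before it; the first row is paired with the last one.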

Code that applies this trick and supports both forward and backward striding would look something like this:

Backward striding:

    def strided_axis0_backward(inArr, L=2):
        # INPUTS :
        # inArr : Input 2D array
        # L : Number of rows (timesteps) in each subarray

        # Prepend the last L-1 rows to the start, so the windows can wrap
        # around while the output stays a view into a single buffer.
        # (Slicing from len(inArr) - L + 1 also gives an empty pad for L = 1.)
        a = np.vstack((inArr[len(inArr) - L + 1:], inArr))

        # Store shape and strides info
        m, n = a.shape
        s0, s1 = a.strides

        # Length of 3D output array along its axis=0
        nd0 = m - L + 1

        # Start at row L-1 and step backward (-s0) along the window axis
        strided = np.lib.stride_tricks.as_strided
        return strided(a[L-1:], shape=(nd0, L, n), strides=(s0, -s0, s1))
Forward striding:

    def strided_axis0_forward(inArr, L=2):
        # INPUTS :
        # inArr : Input 2D array
        # L : Number of rows (timesteps) in each subarray

        # Append the first L-1 rows to the end, so the windows can wrap
        # around while the output stays a view into a single buffer.
        a = np.vstack((inArr, inArr[:L-1]))

        # Store shape and strides info
        m, n = a.shape
        s0, s1 = a.strides

        # Length of 3D output array along its axis=0
        nd0 = m - L + 1

        # Start at row 0 and step forward (+s0) along the window axis
        strided = np.lib.stride_tricks.as_strided
        return strided(a, shape=(nd0, L, n), strides=(s0, s0, s1))
Sample run:

    In [42]: inArr
    Out[42]: 
    array([[1, 2],
           [3, 4],
           [5, 6]])

    In [43]: strided_axis0_backward(inArr, 2)
    Out[43]: 
    array([[[1, 2],
            [5, 6]],

           [[3, 4],
            [1, 2]],

           [[5, 6],
            [3, 4]]])

    In [44]: strided_axis0_forward(inArr, 2)
    Out[44]: 
    array([[[1, 2],
            [3, 4]],

           [[3, 4],
            [5, 6]],

           [[5, 6],
            [1, 2]]])
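On NumPy 1.20 or newer, the same two outputs can probably be obtained without hand-written strides by using `np.lib.stride_tricks.sliding_window_view`. The sketch below is an alternative and not the method from the answer. The helper names `windows_forward` and `windows_backward` are hypothetical, and it assumes `1 <= L <= len(arr)`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windows_forward(arr, L=2):
    # Hypothetical helper: append the first L-1 rows, then slide along axis 0.
    padded = np.vstack((arr, arr[:L - 1]))
    # sliding_window_view puts the window axis last: (n, n_features, L).
    # Transpose to the (n, L, n_features) layout used in this thread.
    return sliding_window_view(padded, L, axis=0).transpose(0, 2, 1)

def windows_backward(arr, L=2):
    # Hypothetical helper: prepend the last L-1 rows, then slide along axis 0.
    n = arr.shape[0]
    padded = np.vstack((arr[n - L + 1:], arr))
    # Reverse each window so that row j comes first, followed by earlier rows.
    return sliding_window_view(padded, L, axis=0).transpose(0, 2, 1)[:, ::-1]

inArr = np.array([[1, 2], [3, 4], [5, 6]])
fwd = windows_forward(inArr, 2)
bwd = windows_backward(inArr, 2)
```

Both results are read-only views by default, which guards against accidentally writing through overlapping windows.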
Runtime test:

    In [53]: inArr = np.random.randint(0,9,(1000,10))

    In [54]: %timeit make_timesteps(inArr, 2)
        ...: %timeit strided_axis0_forward(inArr, 2)
        ...: %timeit strided_axis0_backward(inArr, 2)
        ...: 
    10 loops, best of 3: 33.9 ms per loop
    100000 loops, best of 3: 12.1 µs per loop
    100000 loops, best of 3: 12.2 µs per loop

    In [55]: %timeit make_timesteps(inArr, 10)
        ...: %timeit strided_axis0_forward(inArr, 10)
        ...: %timeit strided_axis0_backward(inArr, 10)
        ...: 
    1 loops, best of 3: 152 ms per loop
    100000 loops, best of 3: 12 µs per loop
    100000 loops, best of 3: 12.1 µs per loop

    In [56]: 152000/12.1  # Speedup figure
    Out[56]: 12561.98347107438
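The timings for `strided_axis0_forward` and `strided_axis0_backward` stay the same even as we increase the length of the subarrays in the output. That shows the huge benefit of striding, and of course the crazy speedup over the original loopy version: roughly 12,500x for `L = 10`.

As promised at the start, here are the timings for the stacking cost with `np.vstack`: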
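The timings stay flat because a strided result is a view: no window data is copied. The sketch below illustrates the practical consequence on a small array with no wrap-around padding. The variable names are illustrative only. A change to the underlying buffer shows up in every overlapping window, so an explicit `.copy()` is a reasonable precaution before modifying the data or handing it to code that may write to it.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(12).reshape(6, 2)
L = 3
s0, s1 = a.strides

# windows[i, j] is row i + j of `a`; all windows share the memory of `a`.
windows = as_strided(a, shape=(a.shape[0] - L + 1, L, a.shape[1]),
                     strides=(s0, s0, s1))
shares = np.shares_memory(windows, a)

# An independent copy materializes every window in its own memory.
independent = windows.copy()

# Writing to the source changes every window that overlaps row 1,
# while the copy keeps the old value.
a[1, 0] = -1
```

After the write, `windows[0, 1, 0]` and `windows[1, 0, 0]` both read the new value, while `independent[0, 1, 0]` still holds the original one.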

    In [417]: inArr = np.random.randint(0,9,(1000,10))

    In [418]: L = 10

    In [419]: %timeit np.vstack(( inArr[-L+1:], inArr ))
    100000 loops, best of 3: 5.41 µs per loop

The timings support the idea that stacking is a pretty efficient operation.

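Thanks very much, this was really helpful. I had seen it before, but didn't understand it until your example and link! To get the same order, I appended the first row to the end and then used `np.flip` on axis 1. I have edited the question to show my final code.

@nickyzee I don't think I get why you would need `flip`. Is your `make_timesteps` correct? I coded this with the intention of producing the same results as `make_timesteps`. With your flip suggestion, my code produces different results from `make_timesteps`. Could you clarify that?

Interesting: the flip was needed to get the same output as I got from my original. Visual inspection of the output confirmed this too. Not sure why (numpy@latest and py3.5). My original returns `array([[[1, 2], [3, 4]], [[3, 4], [5, 6]], [[5, 6], [7, 8]], [[7, 8], [1, 2]]])`.

@nickyzee Updated with two versions, for forward and backward striding. Check those out!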