Python: creating a data history from a 2D NumPy array?

Suppose I have a 2D NumPy array of shape n x m (where n is large and m >= 1). Each column represents one attribute. An example with n=5, m=3 is given below:

[[1,2,3],
[4,5,6],
[7,8,9],
[10,11,12],
[13,14,15]]

I want to create a data history from this array with history_steps=p (1...

One option is dstack + reshape:

import numpy as np

a = np.array([[1,2,3],
              [4,5,6],
              [7,8,9],
              [10,11,12],
              [13,14,15]])

# Use `dstack` to stack two arrays (one with the last row removed, the other
# with the first row removed) along the third axis, then use `reshape` to
# flatten the second and third dimensions into one.
np.dstack([a[:-1], a[1:]]).reshape(a.shape[0]-1, -1)

#array([[ 1,  4,  2,  5,  3,  6],
#       [ 4,  7,  5,  8,  6,  9],
#       [ 7, 10,  8, 11,  9, 12],
#       [10, 13, 11, 14, 12, 15]])

To generalize to an arbitrary p, use a list comprehension to build the list of shifted arrays, then apply dstack + reshape:

n, m = a.shape
p = 3
np.dstack([a[i:(n-p+i+1)] for i in range(p)]).reshape(n-p+1, -1)

#array([[ 1,  4,  7,  2,  5,  8,  3,  6,  9],
#       [ 4,  7, 10,  5,  8, 11,  6,  9, 12],
#       [ 7, 10, 13,  8, 11, 14,  9, 12, 15]])
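
The generalization above can be wrapped in a small reusable helper. This is only a sketch: the name `make_history` and the range check on p are assumptions added for illustration and are not part of the original answer.

```python
import numpy as np

def make_history(a, p):
    """Place p consecutive rows of `a` side by side, grouped per column.

    Hypothetical helper: the name and the validation are illustrative only.
    """
    a = np.asarray(a)
    n, m = a.shape
    if not 1 <= p <= n:
        raise ValueError("p must satisfy 1 <= p <= n")
    # Build p shifted slices of `a`, each with n - p + 1 rows.
    shifted = [a[i:n - p + i + 1] for i in range(p)]
    # dstack gives shape (n - p + 1, m, p); reshape flattens the last two axes.
    return np.dstack(shifted).reshape(n - p + 1, -1)

a = np.arange(1, 16).reshape(5, 3)   # same values as the example array
print(make_history(a, 3))
```

The result has n - p + 1 rows and m * p columns, so each extra history step costs one row at the top of the data.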

Here is a NumPy-based approach, with the focus on using np.lib.stride_tricks.as_strided -
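
The body of `strided_axis0` is not reproduced in this thread, so what follows is only a minimal sketch of how such a function could be written with `np.lib.stride_tricks.as_strided`. It assumes the signature `strided_axis0(a, L)` seen in the sample run; the variable names and the bounds check are assumptions.

```python
import numpy as np

def strided_axis0(a, L=2):
    # a : 2D input array
    # L : number of consecutive rows combined into each output row
    a = np.asarray(a)
    n_windows = a.shape[0] - L + 1
    if n_windows < 1:
        raise ValueError("L must not exceed the number of rows")
    n_cols = a.shape[1]
    s0, s1 = a.strides  # byte steps along rows and along columns

    # 3D view with view[i, j, k] == a[i + k, j]; no data is copied here.
    view = np.lib.stride_tricks.as_strided(
        a, shape=(n_windows, n_cols, L), strides=(s0, s1, s0), writeable=False
    )
    # Flattening the last two axes forces a copy, so the result is independent.
    return view.reshape(n_windows, -1)
```

Because `as_strided` does no bounds checking, the shape and strides must be derived from the input array as above; a wrong value can read memory outside the array.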

Sample run -

In [27]: a
Out[27]: 
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15]])

In [28]: strided_axis0(a, L=2)
Out[28]: 
array([[ 1,  4,  2,  5,  3,  6],
       [ 4,  7,  5,  8,  6,  9],
       [ 7, 10,  8, 11,  9, 12],
       [10, 13, 11, 14, 12, 15]])
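
As a side note that is not part of the original answers: on NumPy 1.20 or later, `sliding_window_view` builds the same kind of windowed view without hand-written strides. A minimal sketch, assuming that NumPy version is available:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(1, 16).reshape(5, 3)   # same values as the example array
L = 2                                # window length along axis 0

# The window axis is appended last: windows[i, j, k] == a[i + k, j]
windows = sliding_window_view(a, window_shape=L, axis=0)

# Flatten the column and window axes to get one row per time window.
history = windows.reshape(a.shape[0] - L + 1, -1)
print(history)
```

The view returned by `sliding_window_view` is read-only by default; the final `reshape` produces a separate copy.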

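Almost every pandas function has an equivalent in NumPy, because pandas uses NumPy extensively under the hood. Why not read the NumPy docs to find out? (Note that in most cases, replacing pd.function with np.function just works!)

Yes, I agree. But what about doing it without splitting the data into columns and buffering?

Honestly, I don't fully understand what you are trying to do. What is the logic behind your desired output...

@Julien: The columns represent different attributes, and the rows represent the values of those attributes at particular timestamps. What I want is to train a machine learning model on sequences of the attributes. I know I could use time-series methods, or possibly an RNN, but I don't know much about them.

How does p come into play here? What if I want p=3?

Updated with a method that handles the shifts.

Looks good now. Thank you very much!

This is new. I never knew something like this existed in NumPy.

@GKS Yup, that np.lib.stride_tricks.as_strided is probably the most esoteric and efficient thing in NumPy. Used it three times in the past 24 hours to answer questions :)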