Python: creating a data history from a 2D NumPy array?
Suppose I have a 2D numpy array of shape n x m (where n is a large number and m >= 1). Each column represents an attribute. An example with n=5, m=3 is given below:
[[1,2,3],
[4,5,6],
[7,8,9],
[10,11,12],
[13,14,15]]
I want to use history_steps=p(1
Use dstack + reshape:
import numpy as np

a = np.array([[1,2,3],
              [4,5,6],
              [7,8,9],
              [10,11,12],
              [13,14,15]])
# use `dstack` to stack the two arrays (one with the last row removed, the other with the
# first row removed) along the third axis, then use reshape to flatten the second and
# third dimensions
np.dstack([a[:-1], a[1:]]).reshape(a.shape[0]-1, -1)
#array([[ 1,  4,  2,  5,  3,  6],
#       [ 4,  7,  5,  8,  6,  9],
#       [ 7, 10,  8, 11,  9, 12],
#       [10, 13, 11, 14, 12, 15]])
To generalize to an arbitrary p, use a list comprehension to generate the list of shifted arrays, then apply the same stack + reshape:
n, m = a.shape
p = 3
np.dstack([a[i:(n-p+i+1)] for i in range(p)]).reshape(n-p+1, -1)
#array([[ 1, 4, 7, 2, 5, 8, 3, 6, 9],
# [ 4, 7, 10, 5, 8, 11, 6, 9, 12],
# [ 7, 10, 13, 8, 11, 14, 9, 12, 15]])
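The generalized one-liner above can be wrapped in a small function for reuse. This is only a sketch: the name `make_history` and the argument checks are assumptions added for illustration and are not part of the answer. Note the column layout it produces: the p consecutive values of the first attribute come first, then those of the second attribute, and so on.

```python
import numpy as np

def make_history(a, p):
    """Hypothetical helper: stack p consecutive rows of `a` into one row.

    Output shape is (n - p + 1, m * p) for an input of shape (n, m).
    """
    a = np.asarray(a)
    if a.ndim != 2:
        raise ValueError("expected a 2D array")
    n, m = a.shape
    if not 1 <= p <= n:
        raise ValueError("p must satisfy 1 <= p <= n")
    # Shifted views of `a`: rows 0..n-p, rows 1..n-p+1, ..., rows p-1..n-1
    shifted = [a[i:n - p + i + 1] for i in range(p)]
    # dstack gives shape (n-p+1, m, p); reshape flattens the last two axes
    return np.dstack(shifted).reshape(n - p + 1, -1)

a = np.arange(1, 16).reshape(5, 3)
print(make_history(a, 3))
```

With p=1 the function returns an array equal to the input, which is a quick sanity check.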
Here is a NumPy-based approach, with a focus on using -
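The definition of `strided_axis0` does not appear in this answer. A minimal sketch, assuming it is built on `np.lib.stride_tricks.as_strided` (the function named in the comments further down) and that `L` is the number of history steps, could look like the following. The body is a guess that reproduces the sample output, not necessarily the original code, and the bounds check is an addition.

```python
import numpy as np

def strided_axis0(a, L=2):
    """Sketch (assumed implementation): sliding windows of length L along axis 0.

    a : 2D input array of shape (n, m)
    L : number of consecutive rows combined into each output row
    """
    a = np.asarray(a)
    n, m = a.shape
    if not 1 <= L <= n:
        # as_strided does no bounds checking, so guard against reading
        # memory outside the array
        raise ValueError("L must satisfy 1 <= L <= n")
    s0, s1 = a.strides
    nrows = n - L + 1
    # 3D view where view[i, j, k] == a[i + k, j]; no data is copied here
    view = np.lib.stride_tricks.as_strided(
        a, shape=(nrows, m, L), strides=(s0, s1, s0)
    )
    # reshape flattens the last two axes (this step makes a copy)
    return view.reshape(nrows, -1)
```

The view itself shares memory with `a`, so the only copy happens in the final reshape.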
Sample run -
In [27]: a
Out[27]:
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15]])
In [28]: strided_axis0(a, L=2)
Out[28]:
array([[ 1, 4, 2, 5, 3, 6],
[ 4, 7, 5, 8, 6, 9],
[ 7, 10, 8, 11, 9, 12],
[10, 13, 11, 14, 12, 15]])
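For readers on a recent NumPy, the same windows can probably be obtained without computing strides by hand. The sketch below assumes NumPy 1.20 or later, where `sliding_window_view` is available; the function name `history_windows` is hypothetical and this variant is not part of the answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def history_windows(a, L=2):
    """Hypothetical alternative using sliding_window_view (NumPy >= 1.20)."""
    a = np.asarray(a)
    # Shape (n - L + 1, m, L): the window axis is appended as the last axis
    windows = sliding_window_view(a, window_shape=L, axis=0)
    # Flatten the attribute and window axes into one row per time step
    return windows.reshape(windows.shape[0], -1)

a = np.arange(1, 16).reshape(5, 3)
print(history_windows(a, L=2))
```

Unlike raw `as_strided`, this function validates the window size itself and returns a read-only view before the reshape.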
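Almost all pandas functions have an equivalent in numpy, because pandas uses numpy extensively under the hood. Why not read the numpy documentation to find out? (Note that in most cases, replacing pd.function with np.function works!)
Yes. I agree. But what about buffering without splitting the data into columns?
Honestly, I don't fully understand what you are trying to do. What is the logic behind your desired output…
@Julien: The columns represent different attributes, and the rows represent the values of those attributes at a particular timestamp. What I want is to train a machine learning model on sequences of attributes. I know I could use time series methods, or maybe an RNN, but I don't know them very well.
How does p come into play here? What if I want p=3?
Updated with a method that handles the shifts.
Looks good now. Thanks a lot!
This is new. I never knew something like this existed in numpy.
@GKS Yup, that's np.lib.stride_tricks.as_strided, probably the most esoteric and efficient thing in NumPy. Used it three times in the past 24 hours to answer questions :)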