Python - Euclidean distance between all pairs of subsequences of a given length from a given array
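Suppose I have a numpy array [5,7,2,3,4,6] and I choose a subsequence length of 3. I want the Euclidean distances between such subsequences. The possible subsequences are:
[5,7,2]
[7,2,3]
[2,3,4]
[3,4,6]
The distance between subsequences 1 and 3 would be computed as (5-2)^2+(7-3)^2+(2-4)^2. I want to do this for all pairs of subsequences. Is there a way to avoid loops? My actual array is quite long, so the solution should also be memory efficient.
Edit> To elaborate: I have a time series of 10^5 to 10^8 elements. The time series is growing. Every time a new point is added, I need to take the L newest points,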
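Do you think this can be done efficiently with numpy? Is there a simple way to implement it?
Assuming A as the input array and L as the length of the subsequences, you can build a sliding 2D array version of A with broadcasting and then use pdist (from scipy.spatial.distance) on it, like so -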
import numpy as np
from scipy.spatial.distance import pdist

# Get sliding 2D array version of input array
A2D = A[np.arange(A.size-L+1)[:,None] + np.arange(L)]
# Get pairwise distances with pdist
pairwise_dist = pdist(A2D,'sqeuclidean')
Please note that if you meant Euclidean distances, you need to replace 'sqeuclidean' with 'euclidean', or simply omit that argument, since it is the default.
Sample run -
In [209]: # Inputs
     ...: A = np.array([5,7,2,3,4,6])
     ...: L = 3
     ...:

In [210]: A2D = A[np.arange(A.size-L+1)[:,None] + np.arange(L)]

In [211]: A2D
Out[211]:
array([[5, 7, 2],
       [7, 2, 3],
       [2, 3, 4],
       [3, 4, 6]])

In [212]: pdist(A2D,'sqeuclidean')
Out[212]: array([ 30.,  29.,  29.,  27.,  29.,   6.])
# [1] element (= 29) is (5-2)^2 + (7-3)^2 + (2-4)^2
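The sample run can also be reproduced as a standalone script. What follows is only a minimal sketch, assuming NumPy 1.20 or newer: it swaps the fancy-indexing step for `np.lib.stride_tricks.sliding_window_view`, which is not used in the answer itself and which returns a read-only view instead of a copy. Note that `pdist` may still make its own floating-point copy of the windows internally.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy.spatial.distance import pdist

A = np.array([5, 7, 2, 3, 4, 6])
L = 3

# Read-only view of shape (A.size - L + 1, L); no data is copied at this step
A2D = sliding_window_view(A, L)

# Squared Euclidean distances in condensed form (upper triangle, row by row)
sq_dist = pdist(A2D, 'sqeuclidean')

# Plain Euclidean distances are the square roots of the squared ones
dist = pdist(A2D, 'euclidean')

print(sq_dist)   # values: 30, 29, 29, 27, 29, 6
print(np.allclose(dist, np.sqrt(sq_dist)))  # True
```

The view avoids materialising the (A.size-L+1) x L array up front, but the condensed output still grows quadratically with the number of subsequences.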
To get the corresponding IDs, you can use np.triu_indices, like so -
idx1,idx2 = np.triu_indices(A2D.shape[0],1)
Finally, show the IDs alongside the distances, like so -
ID_dist = np.column_stack((idx1,idx2,pairwise_dist))
Sample run -
In [201]: idx1,idx2
Out[201]: (array([0, 0, 0, 1, 1, 2]), array([1, 2, 3, 2, 3, 3]))
In [202]: np.column_stack((idx1,idx2,pairwise_dist))
Out[202]:
array([[  0.,   1.,  30.],
       [  0.,   2.,  29.],    # This was your (5-2)^2 + (7-3)^2 + (2-4)^2
       [  0.,   3.,  29.],
       [  1.,   2.,  27.],
       [  1.,   3.,  29.],
       [  2.,   3.,   6.]])
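One thing the ID arrays make easy is looking up which two subsequences are closest. The following is a small illustrative sketch, not part of the original answer; the names `closest_pair` and `closest_dist` are made up here. It relies on `np.triu_indices` listing the pairs in the same order as the condensed output of `pdist`.

```python
import numpy as np
from scipy.spatial.distance import pdist

A = np.array([5, 7, 2, 3, 4, 6])
L = 3

A2D = A[np.arange(A.size - L + 1)[:, None] + np.arange(L)]
pairwise_dist = pdist(A2D, 'sqeuclidean')
idx1, idx2 = np.triu_indices(A2D.shape[0], 1)

# Position of the smallest distance in the condensed array
k = np.argmin(pairwise_dist)

# Map position k back to the starting indices of the two subsequences
closest_pair = (int(idx1[k]), int(idx2[k]))
closest_dist = float(pairwise_dist[k])
print(closest_pair, closest_dist)  # (2, 3) 6.0
```

Here the closest pair is [2,3,4] and [3,4,6], with a squared distance of 6.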
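For cases where you are dealing with millions of elements in A and L is in the hundreds, it may be better to compute the distance for each pair of such subsequences in a loop, like so -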
# Get pairwise IDs
idx1,idx2 = np.triu_indices(A.size-L+1,1)
# Store range array for L as would be used frequently in loop
R = np.arange(L)
# Initialize output array and start computing
pairwise_dist = np.empty(len(idx1))
for i in range(len(idx1)):
    pairwise_dist[i] = ((A[R+idx2[i]] - A[R+idx1[i]])**2).sum()
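A possible middle ground between the fully vectorized version and the per-pair loop is to loop over the first subsequence only and compare it against all later ones in one vectorized step. This is a hedged sketch rather than something from the original answer: it cuts the number of Python-level iterations from one per pair to one per subsequence, at the cost of a temporary array of up to (n-1) x L elements per iteration, and the output is still quadratic in size.

```python
import numpy as np

A = np.array([5, 7, 2, 3, 4, 6])
L = 3
n = A.size - L + 1            # number of subsequences
R = np.arange(L)

pairwise_dist = np.empty(n * (n - 1) // 2)
start = 0
for i in range(n - 1):
    # Subsequences i+1 .. n-1 as rows of a temporary (n-i-1, L) array
    others = A[np.arange(i + 1, n)[:, None] + R]
    diffs = others - A[i:i + L]
    # Row-wise sum of squares, written into the matching condensed slice
    count = n - i - 1
    pairwise_dist[start:start + count] = np.einsum('ij,ij->i', diffs, diffs)
    start += count

print(pairwise_dist)   # values: 30, 29, 29, 27, 29, 6
```

The results come out in the same order as `pdist`, so they can be paired with the IDs from `np.triu_indices` exactly as before.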
You can also use np.einsum to get the sum of squares at each iteration, like so -
for i in range(len(idx1)):
    diffs = A[R+idx2[i]] - A[R+idx1[i]]
    pairwise_dist[i] = np.einsum('i,i->',diffs,diffs)
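For the growing time series described in the question, where only the newest L points are compared against the past, the full pairwise output may not be needed at all. The sketch below is an assumption about that use case, not part of the original answer: `dist_to_latest` is a hypothetical helper, and `sliding_window_view` requires NumPy 1.20 or newer. It computes only the distances from the last window to every earlier one, so the result has one entry per earlier subsequence.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dist_to_latest(A, L):
    """Squared Euclidean distances from the last length-L window of A
    to every earlier length-L window (hypothetical helper)."""
    windows = sliding_window_view(A, L)      # view, shape (A.size-L+1, L)
    diffs = windows[:-1] - windows[-1]       # temporary, shape (A.size-L, L)
    return np.einsum('ij,ij->i', diffs, diffs)

A = np.array([5, 7, 2, 3, 4, 6])
d = dist_to_latest(A, 3)
print(d)                    # distances of [3,4,6] to windows 0, 1, 2
print(int(np.argmin(d)))    # 2
```

The temporary `diffs` array still holds (A.size-L) x L elements, so for very long series it may be worth processing the earlier windows in chunks.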
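Do you want all possible combinations of pairs of subsequences, or a function that returns the distance given the indices of two subsequences?
I think your formula needs to be wrapped in sqrt(): sqrt((5-2)^2+(7-3)^2+(2-4)^2).
For 10^8 elements, we would have 0.5x10^16 elements as output. I haven't done the exact math, but that would need a huge amount of storage. Even if we forget for a moment about the computation involved, are you sure you can store the output?
@Divakar It would be enough to keep in memory the distances from one subsequence to the rest of the time series at a time, which could then be updated for the next subsequence. Added some pictures.
OP asked about Euclidean distance, and you are providing squared Euclidean (sqeuclidean). Otherwise, looks like a nice answer :)
@EelkeSpaak OP listed (5-2)^2+(7-3)^2+(2-4)^2 as one of the sample values. So, I inferred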