Euclidean distance calculation with NumPy in Python


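I am new to Python, so this question may look trivial. However, I did not find a case similar to mine. I have a coordinate matrix for 20 nodes. I want to compute the Euclidean distance between all pairs of nodes in this set and store the results in a pairwise matrix. For example, if I have 20 nodes, I want the end result to be a (20, 20) matrix holding the Euclidean distance for each pair of nodes. I tried using for loops to go through each element of the coordinate set and compute the Euclidean distance as follows: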

ncoord=numpy.matrix('3225   318;2387    989;1228    2335;57      1569;2288  8138;3514   2350;7936   314;9888    4683;6901   1834;7515   8231;709   3701;1321    8881;2290   2350;5687   5034;760    9868;2378   7521;9025   5385;4819   5943;2917   9418;3928   9770')
n=20 
c=numpy.zeros((n,n))
for i in range(0,n):
    for j in range(i+1,n):
        c[i][j]=math.sqrt((ncoord[i][0]-ncoord[j][0])**2+(ncoord[i][1]-ncoord[j][1])**2)

However, I get the error "input must be a square array". Does anyone know what is going on here? Thanks.

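Note: for a Numpy matrix, ncoord[i, j] is not the same as ncoord[i][j]. This seems to be the source of the confusion. If ncoord is a Numpy array, the two give the same result.

For a Numpy matrix, ncoord[i] is the i-th row of ncoord, which in this case is itself a Numpy matrix object of shape 1 x 2. So ncoord[i][j] actually means: take the i-th row of ncoord, then take the j-th row of that 1 x 2 matrix. This is where the indexing problem appears when j > 0.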
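To make the indexing difference concrete, here is a small sketch on a three-row subset of the question's coordinates. The names m and a are illustrative. The last part is my reading of where the "square array" message comes from (on a Numpy matrix, ** means matrix power); the thread itself does not spell this out, and the exact wording of the message varies between Numpy versions.

```python
import numpy as np

m = np.matrix('3225 318; 2387 989; 1228 2335')  # 3 x 2 matrix (subset of the question's data)
a = np.asarray(m)                                # same data as a plain 2-D array

# A single index on a matrix keeps two dimensions; on an array it drops one.
print(m[1].shape)   # (1, 2)
print(a[1].shape)   # (2,)

# Tuple indexing selects one element in both cases.
print(m[1, 1], a[1, 1], a[1][1])   # 989 989 989

# Chained indexing on the matrix asks for row 1 of a 1 x 2 matrix, which fails.
chained_error = None
try:
    m[1][1]
except IndexError as exc:
    chained_error = exc
    print("IndexError:", exc)

# m[1][0] is still the whole 1 x 2 row, and ** on a matrix is matrix power,
# which requires a square matrix.
power_error = None
try:
    m[1][0] ** 2
except np.linalg.LinAlgError as exc:
    power_error = exc
    print("LinAlgError:", exc)
```

With the plain array a, both a[1, 1] and a[1][1] return the same element, which is why switching to arrays makes the question's loop behave as intended.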

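Regarding your comment that assigning to c[i][j] "works": it shouldn't. At least on my build of Numpy 1.9.1, it shouldn't work if your indices i and j iterate up to n.

As an aside, remember to add the transpose of the matrix c to itself, since the loop only fills the upper triangle.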
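The transpose step can be sketched as follows. The three sample points are made up for illustration (they are not the question's data), and np.hypot is used here in place of the question's math.sqrt expression.

```python
import numpy as np

# Hypothetical sample points, chosen so the distances are easy to check by hand.
pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
n = len(pts)

# Fill only the upper triangle (j > i), as the loop in the question does.
c = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        c[i, j] = np.hypot(pts[i, 0] - pts[j, 0], pts[i, 1] - pts[j, 1])

# The diagonal is zero, so adding the transpose mirrors the upper triangle
# into the lower one without double counting anything.
full = c + c.T
print(full)
```

This only works because the diagonal of c is zero; with a non-zero diagonal, c + c.T would double it.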

It is recommended to use Numpy arrays rather than matrices.

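If the coordinates are stored as a Numpy array, the pairwise distances can be computed as:

from scipy.spatial.distance import pdist

pairwise_distances = pdist(ncoord, metric="euclidean")

or simply

pairwise_distances = pdist(ncoord)

since the default metric is "euclidean", which is the Minkowski distance with p=2. The p keyword itself applies to metric="minkowski", and recent SciPy releases may reject it for other metrics.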

In the comments below I mistakenly said that the result of pdist is an n x n matrix. To get an n x n matrix, you need to do the following:

from scipy.spatial.distance import pdist, squareform

pairwise_distances = squareform(pdist(ncoord))
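As a small worked example of what pdist returns before squareform is applied, the sketch below uses four made-up points (not the question's data) and pairs each entry of the condensed vector with its (i, j) index via np.triu_indices. That pairing relies on pdist listing the pairs in row-major upper-triangle order.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical sample points with easy-to-check distances.
pts = np.array([[0, 0], [3, 4], [6, 8], [0, 8]], dtype=float)
n = len(pts)

d = pdist(pts)                       # condensed vector of length n * (n - 1) / 2
rows, cols = np.triu_indices(n, k=1) # (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)

for i, j, dist in zip(rows, cols, d):
    print(i, j, dist)

D = squareform(d)                    # full symmetric (n, n) matrix, zero diagonal
print(D.shape)   # (4, 4)
```

For n points the condensed vector has n * (n - 1) / 2 entries, so 20 nodes give 190 distances and a (20, 20) matrix after squareform.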

I think this is what you wanted to do. You said you want a 20 by 20 matrix, but what you coded is triangular.

So I coded a complete 20x20 matrix instead.

import math

distances = []
for i in range(len(ncoord)):
    given_i = []  # distances from node i to every node j
    for j in range(len(ncoord)):
        d_val = math.sqrt((ncoord[i, 0]-ncoord[j, 0])**2 + (ncoord[i, 1]-ncoord[j, 1])**2)
        given_i.append(d_val)
    distances.append(given_i)

# distances[i][j] = distance from i to j

The SciPy way:

from scipy.spatial.distance import cdist
# Isn't scipy nice - can also use pdist... works in the same way but different recall method.
distances = cdist(ncoord, ncoord, 'euclidean')

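There are a lot of faster alternatives to using nested for loops. I'll show you two different approaches: the first is a more general method that will introduce you to broadcasting and vectorization, and the second uses a more convenient scipy library function.

1. The general way, using broadcasting and vectorization

The first thing I'd suggest is to switch to using np.array rather than np.matrix. Arrays are preferred for a number of reasons, most importantly because they can have more than 2 dimensions, and they make element-wise multiplication much less awkward.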

import numpy as np

ncoord = np.array(ncoord)

With an array, we can eliminate the nested for loops by inserting a new singleton dimension and broadcasting the subtraction over it:

# indexing with None (or np.newaxis) inserts a new dimension of size 1
print(ncoord[:, :, None].shape)
# (20, 2, 1)

# by making the 'inner' dimensions equal to 1, i.e. (20, 2, 1) - (1, 2, 20),
# the subtraction is 'broadcast' over every pair of rows in ncoord
xydiff = ncoord[:, :, None] - ncoord[:, :, None].T

print(xydiff.shape)
# (20, 2, 20)

This is equivalent to looping over every pair of rows using nested for loops, but much faster:

xydiff2 = np.zeros((20, 2, 20), dtype=xydiff.dtype)
for ii in range(20):
    for jj in range(20):
        for kk in range(2):
            xydiff2[ii, kk, jj] = ncoord[ii, kk] - ncoord[jj, kk]

# check that these give the same result
print(np.all(xydiff == xydiff2))
# True

The rest we can also do using vectorized operations:

# we square the differences and sum over the 'middle' axis, equivalent to
# computing (x_i - x_j) ** 2 + (y_i - y_j) ** 2
ssdiff = (xydiff * xydiff).sum(1)

# finally we take the square root
D = np.sqrt(ssdiff)

The whole thing can be done in one line like this:

D = np.sqrt(((ncoord[:, :, None] - ncoord[:, :, None].T) ** 2).sum(1))
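For building some intuition about the broadcast, it may help to see an equivalent layout that puts the pair indices first and the coordinate axis last. This is an alternative arrangement, not the one used in the answer, and the three sample points are made up for illustration.

```python
import numpy as np

# Hypothetical sample points with easy-to-check distances.
pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# (3, 1, 2) - (1, 3, 2) broadcasts to (3, 3, 2):
# diff[i, j] is the coordinate difference pts[i] - pts[j].
diff = pts[:, None, :] - pts[None, :, :]
print(diff.shape)   # (3, 3, 2)

# Taking the vector norm over the last axis collapses each difference to a distance.
D_alt = np.linalg.norm(diff, axis=-1)
print(D_alt)

# Same result as the (n, 2, 1) - (1, 2, n) layout used in the answer.
D_answer = np.sqrt(((pts[:, :, None] - pts[:, :, None].T) ** 2).sum(1))
print(np.allclose(D_alt, D_answer))
```

The rule in both layouts is the same: axes of size 1 are stretched to match the other operand, so every row ends up paired with every other row.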

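2. The lazy way, using pdist

It turns out that there is already a fast and convenient function for computing all pairwise distances, pdist from scipy.spatial.distance.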

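Please include the definition of ncoord in your question. Thanks for improving the question's reference value and making it easier to answer!

What is your n? for j in range(i+1, n-1) will do j = i+1, i+2, ..., n-2. I'm guessing that you want both ranges to go up to n, not n-1.

@MarkG Yes, I have 20 nodes (n=20) and I want both indices to go up to n. I tried n instead of n-1, but I got the same error. I could easily write this code in MATLAB, but I have to use Python. Indexing in Python is different, so I may be wrong.

Then the for loops should go up to n: for i in range(0, n) and for j in range(i+1, n). If that is not your error, then you need to show more of your code.

@MarkG Yes, that is not my error. My code is what I posted in the main question; there is nothing else.

I did do that, but did not put it here. The last line of my code is: c[j][i]=c[i][j]

Thanks, it is working now. But now I am confused. I thought that when we want to access an element of a matrix in Python we have to write it as [][], but you use [,]. Why do you use the second form to read data from ncoord, but store the distances in the c matrix by accessing its elements as c[][]?

Thank you very much for the complete information. I will try the other method you mentioned and see whether I get the same matrix size as a result (I assume pairwise_distances is an n*n matrix).

Yes, pairwise_distances will be an n x n matrix, where n is your number of points.

Thanks. But I still don't understand the difference between ncoord[i, j] and ncoord[i][j].

Thank you for your comment. I will try your method too.

Any time you have to double loop over an array in numpy, you lose the speed advantage that numpy offers in the first place. You want to broadcast as much as possible. However, for some operations, including this one, you cannot broadcast, because the value at each step depends on its neighbours. In these cases the SciPy solutions are usually optimized at the C level (see Cython), so they can still be faster. I would expect the cdist function to be much faster than the double loop.

This broadcasting is like magic to me. How can I get some intuition for it?

Thanks for this amazing method, but it is still much slower than the cross product, even though the complexity looks the same.

When using method 1 to compute a large matrix (1000*20000), I also ran into memory problems, which I did not have with method 2 (scipy).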

# method 2: pdist followed by squareform
from scipy.spatial.distance import pdist, squareform

d = pdist(ncoord)

# pdist just returns the upper triangle of the pairwise distance matrix. to get
# the whole (20, 20) array we can use squareform:

print(d.shape)
# (190,)

D2 = squareform(d)
print(D2.shape)
# (20, 20)

# check that the two methods are equivalent
print(np.all(D == D2))
# True