Python: computing the Euclidean distance between two numpy arrays
I have two numpy arrays, as shown below:
X = np.array([-0.34095692,-0.34044722,-0.27155318,-0.21320583,-0.44657865,-0.19587836, -0.29414279, -0.3948753 ,-0.21655774 , -0.34857087])
Y = np.array([0.16305762,0.38554548, 0.10412536, -0.57981103, 0.17927523, -0.22612216, -0.34569697, 0.30463137,0.01301744,-0.42661108])
These are the x and y coordinates of 10 users. I need to find the similarity between each pair of users.
For example:
x1 = -0.34095692
y1 = 0.16305762
x2 = -0.34044722
y2 = 0.38554548
Euclidean distance = (|x1-y1|^2 + |x2-y2|^2)^1/2
So in the end I want to get a matrix of these distances. Please help me achieve this.
A short code snippet that does this:
A = (X-Y)**2
p, q = np.meshgrid(np.arange(10), np.arange(10))
np.sqrt(A[p]-A[q])
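Edit: explanation
A is just a precomputed vector holding all the squared differences.
np.meshgrid: the purpose of this function is to generate all pairs of values from two different arrays. It is not the best solution, because you get the whole matrix, but for the number of samples you have that is not a big deal. The generated values correspond to the indices of A.
A[p] is also a bit of magic (indexing an array with an array of indices). Try it yourself to understand how it behaves.
The result contains a lot of nan, but this is exactly what you asked for. The real Euclidean distance uses +, not -.
p and q, as returned by np.meshgrid, look like this: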
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
[6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
[7, 7, 7, 7, 7, 7, 7, 7, 7, 7],
[8, 8, 8, 8, 8, 8, 8, 8, 8, 8],
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]])
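The nan entries can be traced back to the subtraction: A[p] - A[q] is negative for one of each pair (i, j) and (j, i), and np.sqrt of a negative float is nan. The following is a minimal sketch to check this with the arrays from the question; the names diff, result and nan_mask are introduced here for illustration only.

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

A = (X - Y) ** 2
p, q = np.meshgrid(np.arange(10), np.arange(10))

# diff[i, j] == A[j] - A[i], so diff[j, i] has the opposite sign.
diff = A[p] - A[q]

# Silence the "invalid value" warning that sqrt raises for negative inputs.
with np.errstate(invalid="ignore"):
    result = np.sqrt(diff)

# nan appears exactly where the difference is negative.
nan_mask = np.isnan(result)
print(nan_mask.sum(), "of", result.size, "entries are nan")
```

So a nan says nothing about two users being close or far apart; it only marks a negative number under the square root.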
Use zip(X, Y) to get the coordinate pairs. If you want the Euclidean distance between points, it should be (|x1-x2|^2 + |y1-y2|^2)^0.5, not (|x1-y1|^2 - |x2-y2|^2)^1/2:
In [125]: coords = list(zip(X, Y))  # list() is needed on Python 3, where zip returns an iterator
In [126]: from scipy import spatial
     ...: dists = spatial.distance.cdist(coords, coords)
In [127]: dists
Out[127]:
array([[ 0. , 0.22248844, 0.09104884, 0.75377329, 0.10685954,
0.41534165, 0.5109039 , 0.15149362, 0.19490308, 0.58971785],
[ 0.22248844, 0. , 0.28973034, 0.9737061 , 0.23197262,
0.62852005, 0.73270705, 0.09751671, 0.39258852, 0.81219719],
[ 0.09104884, 0.28973034, 0. , 0.68642072, 0.19047682,
0.33880688, 0.45038919, 0.23539542, 0.1064197 , 0.53629553],
[ 0.75377329, 0.9737061 , 0.68642072, 0. , 0.79415038,
0.35411306, 0.24770988, 0.90290761, 0.59283795, 0.20443561],
[ 0.10685954, 0.23197262, 0.19047682, 0.79415038, 0. ,
0.47665258, 0.54665574, 0.13560014, 0.28381556, 0.61376196],
[ 0.41534165, 0.62852005, 0.33880688, 0.35411306, 0.47665258,
0. , 0.15477091, 0.56683251, 0.24003205, 0.25201351],
[ 0.5109039 , 0.73270705, 0.45038919, 0.24770988, 0.54665574,
0.15477091, 0. , 0.65808357, 0.36700881, 0.09751671],
[ 0.15149362, 0.09751671, 0.23539542, 0.90290761, 0.13560014,
0.56683251, 0.65808357, 0. , 0.34181257, 0.73270705],
[ 0.19490308, 0.39258852, 0.1064197 , 0.59283795, 0.28381556,
0.24003205, 0.36700881, 0.34181257, 0. , 0.45902146],
[ 0.58971785, 0.81219719, 0.53629553, 0.20443561, 0.61376196,
0.25201351, 0.09751671, 0.73270705, 0.45902146, 0. ]])
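If SciPy is not available, the same matrix can probably be built with NumPy broadcasting alone. This is a sketch under that assumption, not the answerer's method; dx and dy are names chosen here for illustration.

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

# Column vector minus row vector broadcasts to a 10x10 matrix of differences.
dx = X[:, None] - X[None, :]   # dx[i, j] == X[i] - X[j]
dy = Y[:, None] - Y[None, :]   # dy[i, j] == Y[i] - Y[j]

# hypot computes sqrt(dx**2 + dy**2) element-wise.
dists = np.hypot(dx, dy)
print(dists.round(8))
```

The sum under the square root is never negative, so no nan can appear, and the matrix is symmetric with zeros on the diagonal.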
To get the upper triangle of this array, use numpy.triu:
In [128]: np.triu(dists)
Out[128]:
array([[ 0. , 0.22248844, 0.09104884, 0.75377329, 0.10685954,
0.41534165, 0.5109039 , 0.15149362, 0.19490308, 0.58971785],
[ 0. , 0. , 0.28973034, 0.9737061 , 0.23197262,
0.62852005, 0.73270705, 0.09751671, 0.39258852, 0.81219719],
[ 0. , 0. , 0. , 0.68642072, 0.19047682,
0.33880688, 0.45038919, 0.23539542, 0.1064197 , 0.53629553],
[ 0. , 0. , 0. , 0. , 0.79415038,
0.35411306, 0.24770988, 0.90290761, 0.59283795, 0.20443561],
[ 0. , 0. , 0. , 0. , 0. ,
0.47665258, 0.54665574, 0.13560014, 0.28381556, 0.61376196],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0.15477091, 0.56683251, 0.24003205, 0.25201351],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.65808357, 0.36700881, 0.09751671],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0.34181257, 0.73270705],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0.45902146],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ]])
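np.triu keeps the zero diagonal and fills the lower half with zeros. If only the 45 distinct user pairs are of interest, one option is np.triu_indices with k=1, sketched below. The distance matrix is rebuilt with broadcasting so the snippet runs on its own; pair_dists and closest are illustrative names, not part of the original answer.

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

# Full pairwise Euclidean distance matrix.
dists = np.hypot(X[:, None] - X[None, :], Y[:, None] - Y[None, :])

# Row and column indices of the entries strictly above the diagonal.
i, j = np.triu_indices(len(X), k=1)
pair_dists = dists[i, j]          # one distance per unordered pair of users

# Index of the closest (most similar) pair of users.
closest = np.argmin(pair_dists)
print(i[closest], j[closest], pair_dists[closest])
```

scipy.spatial.distance.pdist returns distances in this same condensed order, so it may be the simpler choice when SciPy is already in use.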
Sounds good. What's the question?
@Jonathon Reinhart: I don't know how to get started. Any help?
Sigh, have you considered? Or, if you prefer, SciPy has a function that handles all distance-related problems:
Did you mean (|x1-x2|^2 + |y1-y2|^2)^0.5 rather than (|x1-y1|^2 - |x2-y2|^2)^1/2?
This works well! I haven't checked its accuracy yet. Could you explain it? There are a lot of "nan", right?
Thank you very much for your detailed answer. Yes, it should be +; I have now updated the question. One last thing I don't understand: what do all those "nan" mean? (Are the users closer together, further apart, or what?)
The difference can be negative, and sqrt turns negative numbers into nan. If you use the correct formula, you won't get those nans.
Thanks for your help.
Thank you so much! I finally got it. Thanks a lot again. :)
@NilaniAlgiriyage Glad to help, np ;)