Python: computing the Euclidean distance between two numpy arrays

I have two numpy arrays, as follows:

import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865, -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523, -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])
These are the x and y coordinates of 10 users. I need to find the similarity between every pair of users. For example:

x1 = -0.34095692
y1 = 0.16305762
x2 = -0.34044722
y2 = 0.38554548

Euclidean distance = (|x1-y1|^2 + |x2-y2|^2)^1/2
In the end I want to get a matrix of the distances between all pairs of users. Please help me achieve this.


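A short snippet that does this:

import numpy as np

A = (X - Y)**2                                            # (x_i - y_i)^2 for each user i
p, q = np.meshgrid(np.arange(len(A)), np.arange(len(A)))  # all index pairs (here 10 x 10)
np.sqrt(A[p] - A[q])  # "-" as in the question's original formula, hence nan wherever A[p] < A[q]
Edit: explanation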

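  • A is just a precomputed vector of all the squared differences (x_i - y_i)^2.
  • The magic is in np.meshgrid: its purpose here is to generate all pairs of values from two arrays. The generated values are indices into A. This is not the most efficient solution, because it computes the full matrix (both (i, j) and (j, i), plus the diagonal) even though the result is symmetric, but for the number of samples you have that is not a big problem.
  • The indexing part, A[p], is also a bit of magic: indexing A with the integer array p returns an array with the same shape as p, whose entry [i, j] is A[p[i, j]]. Try it yourself to get a feel for its behavior.
  • The resulting matrix is full of nan, but that is exactly what you asked for: the difference under the square root can be negative, and sqrt of a negative number gives nan. The real Euclidean distance uses + rather than -.
  • p and q:

     p:
     array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

     q:
     array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
            [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
            [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
            [4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
            [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
            [6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
            [7, 7, 7, 7, 7, 7, 7, 7, 7, 7],
            [8, 8, 8, 8, 8, 8, 8, 8, 8, 8],
            [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]])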
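For comparison, here is a sketch of the same meshgrid idea with the corrected formula, sqrt((x_i - x_j)^2 + (y_i - y_j)^2), where user i is the point (X[i], Y[i]). This is an illustrative variant, not the answerer's code; the names n and dist are hypothetical. Because the quantity under the square root is a sum of squares, it can never be negative, so no nan values appear.

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

n = len(X)
# Same index grids as above: p[i, j] == j and q[i, j] == i
p, q = np.meshgrid(np.arange(n), np.arange(n))

# Distance between user j = (X[j], Y[j]) and user i = (X[i], Y[i]);
# both squared terms are >= 0, so sqrt never sees a negative number
dist = np.sqrt((X[p] - X[q])**2 + (Y[p] - Y[q])**2)

print(np.isnan(dist).any())  # False
```

The result is symmetric with zeros on the diagonal, so entry [0, 1] is the distance between the first two users from the question's example.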
    
    Use zip(X, Y) to get the coordinate pairs. If you want the Euclidean distance between points, it should be (|x1-x2|^2 + |y1-y2|^2)^0.5, not (|x1-y1|^2 - |x2-y2|^2)^1/2:

    In [125]: coords = list(zip(X, Y))  # list() is needed in Python 3, where zip returns an iterator
    
    In [126]: from scipy import spatial
         ...: dists=spatial.distance.cdist(coords, coords)
    
    In [127]: dists
    Out[127]: 
    array([[ 0.        ,  0.22248844,  0.09104884,  0.75377329,  0.10685954,
             0.41534165,  0.5109039 ,  0.15149362,  0.19490308,  0.58971785],
           [ 0.22248844,  0.        ,  0.28973034,  0.9737061 ,  0.23197262,
             0.62852005,  0.73270705,  0.09751671,  0.39258852,  0.81219719],
           [ 0.09104884,  0.28973034,  0.        ,  0.68642072,  0.19047682,
             0.33880688,  0.45038919,  0.23539542,  0.1064197 ,  0.53629553],
           [ 0.75377329,  0.9737061 ,  0.68642072,  0.        ,  0.79415038,
             0.35411306,  0.24770988,  0.90290761,  0.59283795,  0.20443561],
           [ 0.10685954,  0.23197262,  0.19047682,  0.79415038,  0.        ,
             0.47665258,  0.54665574,  0.13560014,  0.28381556,  0.61376196],
           [ 0.41534165,  0.62852005,  0.33880688,  0.35411306,  0.47665258,
             0.        ,  0.15477091,  0.56683251,  0.24003205,  0.25201351],
           [ 0.5109039 ,  0.73270705,  0.45038919,  0.24770988,  0.54665574,
             0.15477091,  0.        ,  0.65808357,  0.36700881,  0.09751671],
           [ 0.15149362,  0.09751671,  0.23539542,  0.90290761,  0.13560014,
             0.56683251,  0.65808357,  0.        ,  0.34181257,  0.73270705],
           [ 0.19490308,  0.39258852,  0.1064197 ,  0.59283795,  0.28381556,
             0.24003205,  0.36700881,  0.34181257,  0.        ,  0.45902146],
           [ 0.58971785,  0.81219719,  0.53629553,  0.20443561,  0.61376196,
             0.25201351,  0.09751671,  0.73270705,  0.45902146,  0.        ]])
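If SciPy is not available, the same matrix can be computed with NumPy broadcasting alone. This is a sketch, not part of the original answer: np.column_stack is used here as an array equivalent of list(zip(X, Y)), and the variable name diff is hypothetical.

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

# One row per user, shape (10, 2): the array form of list(zip(X, Y))
coords = np.column_stack((X, Y))

# Broadcasting (10, 1, 2) against (1, 10, 2) gives every pairwise
# coordinate difference, shape (10, 10, 2)
diff = coords[:, np.newaxis, :] - coords[np.newaxis, :, :]

# Euclidean norm over the last axis (the x and y components) -> (10, 10)
dists = np.linalg.norm(diff, axis=-1)
```

This builds an intermediate (n, n, 2) array, which is fine for 10 users; for large n, cdist avoids that memory cost.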
    
    To get the upper triangle of this array, use numpy.triu:

    In [128]: np.triu(dists)
    Out[128]: 
    array([[ 0.        ,  0.22248844,  0.09104884,  0.75377329,  0.10685954,
             0.41534165,  0.5109039 ,  0.15149362,  0.19490308,  0.58971785],
           [ 0.        ,  0.        ,  0.28973034,  0.9737061 ,  0.23197262,
             0.62852005,  0.73270705,  0.09751671,  0.39258852,  0.81219719],
           [ 0.        ,  0.        ,  0.        ,  0.68642072,  0.19047682,
             0.33880688,  0.45038919,  0.23539542,  0.1064197 ,  0.53629553],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.79415038,
             0.35411306,  0.24770988,  0.90290761,  0.59283795,  0.20443561],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
             0.47665258,  0.54665574,  0.13560014,  0.28381556,  0.61376196],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
             0.        ,  0.15477091,  0.56683251,  0.24003205,  0.25201351],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
             0.        ,  0.        ,  0.65808357,  0.36700881,  0.09751671],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
             0.        ,  0.        ,  0.        ,  0.34181257,  0.73270705],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
             0.        ,  0.        ,  0.        ,  0.        ,  0.45902146],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
             0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])
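np.triu zeroes the lower triangle but still returns the full 10 x 10 matrix. If you want each unordered pair of users exactly once, as a flat vector, np.triu_indices with k=1 gives the row and column indices of the strict upper triangle. A minimal sketch, assuming the distance matrix is built as in the broadcasting approach; the names pair_dists and closest are hypothetical.

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

coords = np.column_stack((X, Y))
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# Indices of the strict upper triangle (k=1 skips the zero diagonal)
i, j = np.triu_indices(len(coords), k=1)
pair_dists = dists[i, j]  # 10 * 9 / 2 = 45 distances, one per pair (i < j)

# Example use: the most similar (closest) pair of users
closest = np.argmin(pair_dists)
print(i[closest], j[closest], pair_dists[closest])
```

If SciPy is available, scipy.spatial.distance.pdist(coords) should return these same 45 values in the same order (the "condensed" form), and scipy.spatial.distance.squareform converts between that vector and the square matrix.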
    


    Sounds good. What's the problem?
    @Jonathon Reinhart: I don't know how to start. Any help?
    Sigh, have you considered …? Or, if you prefer, SciPy has a function that handles all distance-related problems.
    Do you mean (|x1-x2|^2 + |y1-y2|^2)^0.5 instead of (|x1-y1|^2 - |x2-y2|^2)^1/2?
    This is great! I haven't checked its accuracy yet. Could you explain it? There are a lot of "nan"s, right?
    Thank you very much for your detailed answer. Yes, it should be +; I have updated the question now. One last thing I don't understand: what do all these "nan"s mean? (Are those users closer, further apart, or something else?)
    The difference can be negative, and sqrt turns a negative number into nan. If you use the correct formula, you won't get these nans.
    Thanks for your help.
    Thank you very much! Finally got it. Thanks again. :)
    @NilaniAlgiriyage Glad to help, np ;)