Python 2.7: Calculating 2D pairwise distances on a large numpy 3D array

python-2.7 numpy

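I have a numpy array of 3 million points of the form [pt_id, x, y, z]. The goal is to return all pairs of points whose Euclidean distance lies between two numbers, min_d and max_d. The Euclidean distance is measured in x and y only, not in z. However, I would like to keep an array with the attributes pt_id_from, pt_id_to, distance.

I am using scipy's pdist to compute the distances: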

import numpy as np
import scipy.spatial.distance
coords_arr = np.array([['pt1', 2452130.000, 7278106.000, 25.000],
                       ['pt2', 2479539.000, 7287455.000, 4.900],
                       ['pt3', 2479626.000, 7287458.000, 10.000],
                       ['pt4', 2484097.000, 7292784.000, 8.800],
                       ['pt5', 2484106.000, 7293079.000, 7.300],
                       ['pt6', 2484095.000, 7292891.000, 11.100]])

dists = scipy.spatial.distance.pdist(coords_arr[:,1:3], 'euclidean')
np.savetxt('test.out', scipy.spatial.distance.squareform(dists), delimiter=',')

How can I return an array of the form [pt_id_from, pt_id_to, distance]?

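You can use np.where to get the coordinates of the distances that fall within a range, then build a new list in your format, filtering out the mirrored pairs. Like this: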

>>> import scipy.spatial.distance
>>> import numpy as np
>>> coords_arr = np.array([['pt1', 2452130.000, 7278106.000, 25.000],
...                        ['pt2', 2479539.000, 7287455.000, 4.900],
...                        ['pt3', 2479626.000, 7287458.000, 10.000],
...                        ['pt4', 2484097.000, 7292784.000, 8.800],
...                        ['pt5', 2484106.000, 7293079.000, 7.300],
...                        ['pt6', 2484095.000, 7292891.000, 11.100]])
>>> 
>>> dists = scipy.spatial.distance.pdist(coords_arr[:,1:3], 'euclidean')
>>> dists = scipy.spatial.distance.squareform(dists)
>>> x, y = np.where((dists >= 8000) & (dists <= 30000))
>>> [(coords_arr[x[i]][0], coords_arr[y[i]][0], dists[y[i]][x[i]]) for i in xrange(len(x)) if x[i] < y[i]]
[('pt1', 'pt2', 28959.576688895162), ('pt1', 'pt3', 29042.897927032005)]

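By looping over all possible combinations, you can simply build a new array from the data. The itertools module is well suited for this.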

import itertools

n = coords_arr.shape[0] # number of points
D = scipy.spatial.distance.squareform(dists) # distance matrix

data = []
for i, j in itertools.combinations(range(n), 2):
    pt_a = coords_arr[i, 0]
    pt_b = coords_arr[j, 0]
    d_ab = D[i,j]
    data.append([pt_a, pt_b, d_ab])

result_arr = np.array(data)

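If memory is a problem, you may want to change the distance lookup from using the large matrix D to using the i and j indices to look the value up directly in dists.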
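As a minimal sketch of that lookup, assuming the row-major upper-triangle ordering that pdist documents for its condensed output, the position of pair (i, j) with i < j can be computed directly. The helper name condensed_index and the small coordinate array are hypothetical, chosen only for illustration; they are not part of SciPy or of the original post.

```python
import itertools

import numpy as np
import scipy.spatial.distance


def condensed_index(i, j, n):
    """Position of pair (i, j), i < j, in the condensed vector from pdist.

    Hypothetical helper (not a SciPy function); n is the number of points.
    """
    return n * i + j - ((i + 2) * (i + 1)) // 2


# Small made-up coordinates, only for illustration.
xy = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]])
n = xy.shape[0]
dists = scipy.spatial.distance.pdist(xy, 'euclidean')

data = []
for i, j in itertools.combinations(range(n), 2):
    # Look the distance up in the condensed vector; no n-by-n matrix is built.
    data.append((i, j, dists[condensed_index(i, j, n)]))
```

This keeps only the n*(n-1)/2 condensed distances in memory rather than the full square matrix, although for millions of points even the condensed vector is very large.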

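Well, ['pt1', 'pt2', distance_as_number] is not exactly possible. The closest you can get with mixed data types is a structured array, but then you cannot do things like result[:2,0]. You have to index the field names and the array positions separately, like: result[['a', 'b']][0]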

Here is my solution:

import numpy as np
import scipy.spatial.distance

coords_arr = np.array([['pt1', 2452130.000, 7278106.000, 25.000],
                       ['pt2', 2479539.000, 7287455.000, 4.900],
                       ['pt3', 2479626.000, 7287458.000, 10.000],
                       ['pt4', 2484097.000, 7292784.000, 8.800],
                       ['pt5', 2484106.000, 7293079.000, 7.300],
                       ['pt6', 2484095.000, 7292891.000, 11.100]])

dists = scipy.spatial.distance.pdist(coords_arr[:,1:3], 'euclidean')

# Create a shortcut for `coords_arr.shape[0]` which is basically
# the total amount of points, hence `n`
n = coords_arr.shape[0]

# `a` and `b` contain the indices of the points which were used to compute the
# distances in dists. In this example:
# a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
# b = [1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]
a = np.arange(n).repeat(np.arange(n-1, -1, -1))
b = np.hstack([range(x, n) for x in xrange(1, n)])

min_d = 1000
max_d = 10000

# Find out which distances are in range.
in_range = np.less_equal(min_d, dists) & np.less_equal(dists, max_d)

# Define the datatype of the structured array which will be the result.
dtype = [('a', '<f8', (3,)), ('b', '<f8', (3,)), ('dist', '<f8')]

# Create an empty array. We fill it later because it makes the code cleaner.
# Its size is given by the sum over `in_range` which is possible
# since True and False are equivalent to 1 and 0.
result = np.empty(np.sum(in_range), dtype=dtype)

# Fill the resulting array.
result['a'] = coords_arr[a[in_range], 1:4]
result['b'] = coords_arr[b[in_range], 1:4]
result['dist'] = dists[in_range]

print(result)

# In case you don't want a structured array at all, this is what you can do:
result = np.hstack([coords_arr[a[in_range],1:],
                    coords_arr[b[in_range],1:],
                    dists[in_range, None]]).astype('<f8')
print(result)

ndarray:

[[2479539.0, 7287455.0, 4.9, 2484097.0, 7292784.0, 8.8, 7012.3893],
 [2479539.0, 7287455.0, 4.9, 2484106.0, 7293079.0, 7.3, 7244.7819],
 [2479539.0, 7287455.0, 4.9, 2484095.0, 7292891.0, 11.1, 7092.7591],
 [2479626.0, 7287458.0, 10.0, 2484097.0, 7292784.0, 8.8, 6953.8562],
 [2479626.0, 7287458.0, 10.0, 2484106.0, 7293079.0, 7.3, 7187.9093],
 [2479626.0, 7287458.0, 10.0, 2484095.0, 7292891.0, 11.1, 7034.8738]]
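The index arrays a and b built above with repeat and hstack can, as far as I can tell, also be obtained from np.triu_indices with k=1, which lists the upper-triangle pairs in the same row-major order as pdist. The sketch below keeps the ids and the coordinates in two separate arrays, which is an assumption that differs from the mixed-type coords_arr in the post, and returns (pt_id_from, pt_id_to, distance) tuples.

```python
import numpy as np
import scipy.spatial.distance

# Ids and coordinates kept in separate arrays (an assumption for this sketch).
ids = np.array(['pt1', 'pt2', 'pt3', 'pt4', 'pt5', 'pt6'])
xyz = np.array([[2452130.0, 7278106.0, 25.0],
                [2479539.0, 7287455.0, 4.9],
                [2479626.0, 7287458.0, 10.0],
                [2484097.0, 7292784.0, 8.8],
                [2484106.0, 7293079.0, 7.3],
                [2484095.0, 7292891.0, 11.1]])

# Distances in x and y only, in condensed form.
dists = scipy.spatial.distance.pdist(xyz[:, :2], 'euclidean')

# Indices of the upper triangle, excluding the diagonal, in pdist order.
a, b = np.triu_indices(xyz.shape[0], k=1)

min_d, max_d = 1000, 10000
in_range = (dists >= min_d) & (dists <= max_d)

# One (pt_id_from, pt_id_to, distance) tuple per pair in range.
pairs = list(zip(ids[a[in_range]], ids[b[in_range]], dists[in_range]))
```

Keeping the coordinates in a purely numeric array also avoids converting strings back to floats before calling pdist.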

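Could you also add the expected output for the sample case?

@Divakar I fixed the output format, so it is ['pt1', 'pt2', distance_as_number].

Thanks for the great answer. After further inspection, I think I would like the answer to be (x_2, y_2, z_2), (x_4, y_4, z_4), (dist_2,4).

@dassouki Done :) Though I'm not sure whether you still need a structured array, so I provided both output formats.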
