Python: fast implementation for finding duplicate coordinates

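I am trying to write a program that finds duplicate coordinates (x, y, z) in an array of 3D points. The script should flag duplicate points within a given tolerance; a single point can have more than one duplicate. I have found many different approaches, including ones based on sorting.

To try out the code, I created the following test data set: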

21.9799629872016 57.4044376777929 0
22.7807110172432 57.6921361034533 0
28.660840151287 61.5676757599822 0
28.6608401512 61.56767575998 0
30.6654296288019 56.2221038199424 0
20.3752036442253 49.1392209993897 0
32.8036584048178 43.927288357851 0
35.8105426210901 51.9456462679106 0
40.8888359641279 58.6944308422108 0
40.88883596412 70.6944308422108 0
41.0892949118794 58.1598736482068 0
39.6860822776189 64.775018924006 0
39.1515250836149 64.8418385732565 0
8.21402748063493 63.5054455882466 0
8.2140275006 63.5074455882 0
8.21404548063493 63.5064455882466 0
8.2143214806 63.5084455882 0

The code I came up with is:

import numpy as np

# nodes is expected to be an (N, 3) NumPy array holding the coordinates above

# given tolerance
tol = 0.01

# initialize empty list for the found duplicates
duplicates = []

# loop over all nodes
for i in range(0,len(nodes)):
    # current node
    curr_node = nodes[i]
    # create difference vector
    diff = nodes - curr_node
    
    # get all duplicate indices (the node itself is found as well)
    condition = np.where((abs(diff[:,0])<tol) & (abs(diff[:,1])<tol) & (abs(diff[:,2])<tol))

    # check if more than one entry is present. If larger than 1, duplicate points exist
    if len(condition[0]) > 1:
        # loop over all found duplicate points
        for j in range(0,len(condition[0])):
            # add duplicate if not already marked as duplicate
            if j>0 and condition[0][j] not in duplicates:
                duplicates.append(condition[0][j] )

However, the code is very slow: for 300,000 points it takes about 10 minutes. I would like to know whether there is a faster way to do this.

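You can place the points in a grid of cubes whose edge length equals the tolerance. Then, for each point, you only need to check the points in the same cube plus the 26 adjacent cubes, rather than all other points.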

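import math
from collections import defaultdict
from itertools import product

# points: sequence of (x, y, z) coordinates; tolerance: the given tolerance

def cube_of(p):
    # index of the grid cube containing p (floor also handles negative values)
    return tuple(math.floor(c / tolerance) for c in p)

def adjacent_cubes(cube):
    # the cube itself plus its 26 neighbours
    return [tuple(c + d for c, d in zip(cube, offset))
            for offset in product((-1, 0, 1), repeat=3)]

# compute the grid

grid = defaultdict(list)
for i, p in enumerate(points):
    grid[cube_of(p)].append(i)

# check

duplicates = set()
for i, p in enumerate(points):
    for adj in adjacent_cubes(cube_of(p)):
        for j in grid.get(adj, ()):
            # same per-axis test as in the question; mark the later point of each pair
            if j > i and all(abs(a - b) < tolerance for a, b in zip(p, points[j])):
                duplicates.add(j)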

You can pre-sort the nodes to reduce the number of loop iterations needed:

import timeit
import random

nodes = [
    [21.9799629872016, 57.4044376777929, 0],
    [22.7807110172432, 57.6921361034533, 0],
    [28.660840151287, 61.5676757599822, 0], [28.6608401512, 61.56767575998, 0],
    [30.6654296288019, 56.2221038199424, 0],
    [20.3752036442253, 49.1392209993897, 0],
    [32.8036584048178, 43.927288357851, 0],
    [35.8105426210901, 51.9456462679106, 0],
    [40.8888359641279, 58.6944308422108, 0],
    [40.88883596412, 70.6944308422108, 0],
    [41.0892949118794, 58.1598736482068, 0],
    [39.6860822776189, 64.775018924006, 0],
    [39.1515250836149, 64.8418385732565, 0],
    [8.21402748063493, 63.5054455882466, 0], [8.2140275006, 63.5074455882, 0],
    [8.21404548063493, 63.5064455882466, 0], [8.2143214806, 63.5084455882, 0]
]

duplicates = [3, 14, 15, 16]
assertList = [n for i, n in enumerate(nodes) if i in duplicates]


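def new(nodes, tol=0.01):
    print(f"Searching duplicates in {len(nodes)} nodes")
    coordinateLen = range(len(nodes[0]))
    # sort in place (lexicographically by x, then y, then z)
    nodes.sort()

    # reference node: the most recent node that was not flagged as a duplicate
    last = nodes[0]
    duplicates = []

    for node in nodes[1:]:
        # a node counts as a duplicate only if every coordinate lies in
        # [last, last + tol); a node whose y or z is smaller than that of
        # `last` is therefore not detected
        if not all(0 <= node[idx] - last[idx] < tol for idx in coordinateLen):
            last = node
        else:
            duplicates.append(node)
    print(f"Found: {len(duplicates)} duplicates")
    return duplicates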


# generate random numbers!
randomNodes = [
    [random.uniform(0, 100),
     random.uniform(0, 100),
     random.uniform(0, 1)] for _ in range(300000)
]

# make sure there are at least the same 4 duplicates!
randomNodes += nodes

for i, lst in enumerate((nodes, randomNodes)):
    for func in ("new", ):
        t1 = timeit.Timer(f"{func}({lst})", f"from __main__ import {func}")

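        # sanity check on the small data set: the expected duplicates are the
        # nodes at indices [3, 14, 15, 16] of the unsorted list. Note that this
        # expression only filters the result by membership in assertList, so it
        # does not fail when other nodes are returned.
        if i == 0:
            print(all(x for x in new(nodes) if x in assertList))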
        print(f"{func} took: {t1.timeit(number=10)} seconds")
        print("")

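Could you round the points to a certain precision and then use a set or dict to find the duplicates? This is not quite the same thing (two very close points may be rounded in different directions), but it might be good enough, depending on what you really need.

Do you want to compare [x1, y1, z1] with [x2, y2, z2] as a whole? It could also be done sequentially, e.g. 1 to 2, 2 to 3, etc.

@tobias_k: Thanks. I have looked at that approach as well. However, the duplicate search should be as accurate as possible; a deviation of 1 or 2 cm, for example, makes a difference in my application.

@SurajS: I want to compare every member of the array with every given point. So [x1, y1, z1] may be a duplicate of [x2, y2, z2], but also of [x10, y10, z10].

If I understood your question correctly.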
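The caveat about rounding raised in the comments can be made concrete. The following minimal sketch shows two points that are closer than the tolerance yet end up with different rounded keys, so a set or dict lookup would not pair them. The sample values and the helper name `rounded_key` are illustrative assumptions, not taken from the thread.

```python
tol = 0.01


def rounded_key(point, ndigits=2):
    # hypothetical helper: round every coordinate so the tuple can be
    # used as a set member or dict key
    return tuple(round(c, ndigits) for c in point)


p = (0.0049, 1.0, 0.0)
q = (0.0051, 1.0, 0.0)

# per-axis tolerance test, as used in the question
within_tol = all(abs(a - b) < tol for a, b in zip(p, q))
# key comparison, as a set/dict lookup would do it
same_key = rounded_key(p) == rounded_key(q)

print(within_tol)  # True: the points differ by only 0.0002 in x
print(same_key)    # False: x rounds to 0.0 for p and to 0.01 for q
```

Whether such misses are acceptable depends on the application; the asker states that the search has to be as accurate as possible.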

Output of the sorting script:

Searching duplicates in 17 nodes
Found: 4 duplicates
True
....
new took: 0.00034904800000001845 seconds

Searching duplicates in 300017 nodes
Found: 4 duplicates
...
new took: 14.316181525000001 seconds
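
The function `new` compares each node only with the last non-duplicate node in sorted order, so which nodes it reports depends on how the remaining coordinates happen to be ordered. One possible variant of the sorting idea is sketched here, assuming that the per-axis tolerance test from the question is the intended criterion: sort the indices by x, then compare every point with the following points until the gap in x reaches the tolerance. The name `find_duplicates_sorted` is hypothetical, and the sketch has not been timed on the 300,000-node data set.

```python
def find_duplicates_sorted(nodes, tol=0.01):
    """Return the sorted original indices of nodes that lie within `tol`
    (per axis) of a node with a smaller index."""
    # sort indices instead of nodes so the original positions are preserved
    order = sorted(range(len(nodes)), key=lambda i: nodes[i][0])
    n = len(order)
    duplicates = set()
    for pos in range(n):
        i = order[pos]
        p = nodes[i]
        nxt = pos + 1
        # only points whose x lies within tol of p's x can be duplicates of p
        while nxt < n and nodes[order[nxt]][0] - p[0] < tol:
            j = order[nxt]
            q = nodes[j]
            if abs(p[1] - q[1]) < tol and abs(p[2] - q[2]) < tol:
                # flag the later point of the pair, as in the question
                duplicates.add(max(i, j))
            nxt += 1
    return sorted(duplicates)


nodes = [
    [21.9799629872016, 57.4044376777929, 0],
    [22.7807110172432, 57.6921361034533, 0],
    [28.660840151287, 61.5676757599822, 0], [28.6608401512, 61.56767575998, 0],
    [30.6654296288019, 56.2221038199424, 0],
    [20.3752036442253, 49.1392209993897, 0],
    [32.8036584048178, 43.927288357851, 0],
    [35.8105426210901, 51.9456462679106, 0],
    [40.8888359641279, 58.6944308422108, 0],
    [40.88883596412, 70.6944308422108, 0],
    [41.0892949118794, 58.1598736482068, 0],
    [39.6860822776189, 64.775018924006, 0],
    [39.1515250836149, 64.8418385732565, 0],
    [8.21402748063493, 63.5054455882466, 0], [8.2140275006, 63.5074455882, 0],
    [8.21404548063493, 63.5064455882466, 0], [8.2143214806, 63.5084455882, 0]
]

print(find_duplicates_sorted(nodes))  # [3, 14, 15, 16]
```

The inner loop is short when few points share nearly the same x; if many points fall within the tolerance in x, the scan degrades towards the quadratic behaviour of the original code.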