C++ STL: Sorting and searching in massive data


c++ algorithm


I have two point sets, source and target. The goal is to find, for each source point, a unique 1:1 nearest neighbor in target.

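My attempt works as intended, but it is ridiculously slow. I have tested it with a few thousand points, but in the real scenario the point counts will be in the millions. I'm not an STL expert. Any suggestions on how I can optimize it?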

std::vector<UT_Vector3> targetPosVector;
for (auto i = 0; i < targetNumPoints; i++)
{
    auto pos = target->getPos3(i);

    targetPosVector.push_back(pos);
}

std::vector<int> uniqueNeighborVector;
for (auto ptoff = 0; ptoff < sourceNumPoints; ptoff++)
{
    std::vector<std::pair<int, fpreal>> nearpointVector; // neighbor vector in form of "(idx, dist)"

    auto pos = source->getPos3(ptoff);
    for (auto j = 0; j < targetNumPoints; j++)
    {
        fpreal dist = pos.distance(targetPosVector[j]);

        std::pair<int, fpreal> neighbor {j, dist};
        nearpointVector.push_back(neighbor);
    }
    std::sort(nearpointVector.begin(), nearpointVector.end(), [](const std::pair<int, fpreal> &left,
                                                                 const std::pair<int, fpreal> &right)
                                                                { return left.second < right.second; });

    std::vector<int> neighborVector;
    for (auto i : nearpointVector)
    {
        neighborVector.push_back(i.first);
    }

    // Imitating this Python logic:
    //   uniqueNeighborList = []
    //   uneighbor = next(i for i in neighborVector if i not in uniqueNeighborList)
    //   uniqueNeighborList.append(uneighbor)
    for (auto i : neighborVector)
    {     
        if (std::find(uniqueNeighborVector.begin(), uniqueNeighborVector.end(), i) == uniqueNeighborVector.end())
        {
            int uneighbor = i; // later on binds to the geometry attribute

            uniqueNeighborVector.push_back(i);

            break;
        }
    }
}

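where:

- source and target are detail geometry data
- distance is a function that computes the distance between two vectors
- getPos3 is a function that returns a point's position as a 3-float vector
- fpreal is a 64-bit float
- UT_Vector3 is a 3-float vector
- sourceNumPoints and targetNumPoints are the number of points in the source and target geometry, respectively.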

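As mentioned in the comments, quadratic complexity is a no-go when you need to handle millions of points. Even if you optimize the code, the complexity stays quadratic as long as the approach doesn't change.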
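To make that concrete, here is a minimal sketch of a leaner brute force, assuming (as the code in the question does) that each target may be used only once and that lower-index source points pick first. `Vec3` is a hypothetical stand-in for `UT_Vector3`, and `greedyUniqueBruteForce` is an illustrative name. It drops the per-source sort and the `std::find` over already-used targets by swapping each used target out of the scanned range, yet the inner loop still touches every remaining target, so the cost stays O(|source| · |target|).

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <utility>
#include <vector>

struct Vec3 { double x, y, z; };  // hypothetical stand-in for UT_Vector3

// Squared distance orders points the same way as the true distance, without a sqrt.
inline double dist2(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Source i (in index order) takes the closest target not taken by an earlier source.
// No temporary sort and no std::find, but still O(|source| * |target|) work.
std::vector<int> greedyUniqueBruteForce(const std::vector<Vec3>& source,
                                        const std::vector<Vec3>& target) {
    assert(target.size() >= source.size());  // a 1:1 match needs enough targets
    std::vector<int> available(target.size());  // [0, numAvailable) = untaken targets
    std::iota(available.begin(), available.end(), 0);
    std::size_t numAvailable = available.size();

    std::vector<int> match(source.size(), -1);
    for (std::size_t s = 0; s < source.size(); ++s) {
        std::size_t best = 0;
        double bestD = std::numeric_limits<double>::max();
        for (std::size_t k = 0; k < numAvailable; ++k) {
            const double d = dist2(source[s], target[available[k]]);
            if (d < bestD) { bestD = d; best = k; }
        }
        match[s] = available[best];
        --numAvailable;
        std::swap(available[best], available[numAvailable]);  // move the used target out of the scanned range
    }
    return match;
}
```

For a few thousand points this is quick, but with a million points on each side it is still on the order of 10^12 distance evaluations.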

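To me this looks like the classic nearest-neighbor (NN) problem in R^3. A well-known approach is to use k-d trees, which allow O(log n) query time (on average) with O(n log n) construction time and linear space. Implementations can be found in various libraries; the ones I found came from a quick search, and I'm sure there are also sophisticated libraries that include k-d trees.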

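In short: I would use a 3-d tree and build it on the target point set. Then take the source points one by one and find each one's nearest neighbor in the 3-d tree in O(log n) time, which gives O(|source| log |target|) time and O(|target|) space.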
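A minimal sketch of that plan follows, with one addition the recap leaves out: the question asks for a unique 1:1 match, so a target must be skipped once an earlier source has claimed it. This version (an assumption about how to handle uniqueness, not necessarily the answerer's intent) marks taken nodes and keeps a per-subtree count of untaken points so that exhausted subtrees are pruned. The tree is built from median splits with `std::nth_element`; `Vec3`, `KdTree`, and `greedyUniqueKdTree` are hypothetical names, with `Vec3` standing in for `UT_Vector3`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

struct Vec3 { double x, y, z; };  // hypothetical stand-in for UT_Vector3

inline double coord(const Vec3& v, int axis) { return axis == 0 ? v.x : axis == 1 ? v.y : v.z; }

inline double dist2(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Implicit balanced 3-d tree: the node for index range [lo, hi) is stored at
// mid = lo + (hi - lo) / 2 and splits on axis depth % 3.
class KdTree {
public:
    explicit KdTree(const std::vector<Vec3>& pts)
        : pts_(pts), idx_(pts.size()), alive_(pts.size(), 0), taken_(pts.size(), false) {
        std::iota(idx_.begin(), idx_.end(), 0);
        build(0, idx_.size(), 0);  // O(n log n) via std::nth_element
    }

    // Index of the nearest target not yet taken (which is then marked taken), or -1.
    int takeNearest(const Vec3& q) {
        std::size_t best = idx_.size();
        double bestD = std::numeric_limits<double>::max();
        nearest(0, idx_.size(), 0, q, best, bestD);
        if (best == idx_.size()) return -1;
        take(best);
        return idx_[best];
    }

private:
    void build(std::size_t lo, std::size_t hi, int depth) {
        if (lo >= hi) return;
        const std::size_t mid = lo + (hi - lo) / 2;
        const int axis = depth % 3;
        std::nth_element(idx_.begin() + lo, idx_.begin() + mid, idx_.begin() + hi,
                         [&](int a, int b) { return coord(pts_[a], axis) < coord(pts_[b], axis); });
        alive_[mid] = hi - lo;  // every point in the subtree starts out untaken
        build(lo, mid, depth + 1);
        build(mid + 1, hi, depth + 1);
    }

    void nearest(std::size_t lo, std::size_t hi, int depth, const Vec3& q,
                 std::size_t& best, double& bestD) const {
        if (lo >= hi) return;
        const std::size_t mid = lo + (hi - lo) / 2;
        if (alive_[mid] == 0) return;  // prune subtrees whose points are all taken
        const Vec3& p = pts_[idx_[mid]];
        if (!taken_[mid]) {
            const double d = dist2(q, p);
            if (d < bestD) { bestD = d; best = mid; }
        }
        const double diff = coord(q, depth % 3) - coord(p, depth % 3);
        // Search the side containing q first; visit the other side only if the
        // splitting plane is closer than the best match found so far.
        if (diff < 0) {
            nearest(lo, mid, depth + 1, q, best, bestD);
            if (diff * diff < bestD) nearest(mid + 1, hi, depth + 1, q, best, bestD);
        } else {
            nearest(mid + 1, hi, depth + 1, q, best, bestD);
            if (diff * diff < bestD) nearest(lo, mid, depth + 1, q, best, bestD);
        }
    }

    // Mark the node at 'pos' as taken and decrement the live counts on its root path.
    void take(std::size_t pos) {
        taken_[pos] = true;
        std::size_t lo = 0, hi = idx_.size();
        for (;;) {
            const std::size_t mid = lo + (hi - lo) / 2;
            --alive_[mid];
            if (mid == pos) return;
            if (pos < mid) hi = mid; else lo = mid + 1;
        }
    }

    std::vector<Vec3> pts_;           // copy of the target positions
    std::vector<int> idx_;            // target indices in tree layout
    std::vector<std::size_t> alive_;  // untaken points in the subtree rooted at each node
    std::vector<bool> taken_;         // node already matched to a source point
};

std::vector<int> greedyUniqueKdTree(const std::vector<Vec3>& source,
                                    const std::vector<Vec3>& target) {
    assert(target.size() >= source.size());  // a 1:1 match needs enough targets
    KdTree tree(target);
    std::vector<int> match;
    match.reserve(source.size());
    for (const Vec3& s : source) match.push_back(tree.takeNearest(s));  // earlier sources win
    return match;
}
```

The O(log n) per-query figure is an average-case bound for well-spread data; taken nodes that still lie on the search path are visited but never returned, so later queries can cost more than early ones. For production use, a maintained library implementation is likely the safer choice.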

A related question.

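The standard library really isn't optimized for this task; have a look at specialized libraries. The problem here is not so much the STL as the chosen algorithm. You are generating a quadratic number of distance pairs, so this approach won't scale to millions of points. You need some form of spatial subdivision to quickly reject impossible pairings instead of blindly trying every combination.

Even so, the current brute-force implementation looks needlessly complicated. I assume a target point may be matched only once, with lower-index sources getting first pick? If so, then rather than building and sorting a temporary vector and then scanning linearly through the whole target set, I'd suggest swapping used targets to the back and selecting the best match on the fly, with no intermediate buffering or sorting. That won't get you to millions, but it may be a start.

The target points are randomly scattered. Say the nearest neighbor of the first point is 1203; the nearest neighbor of the second point could be 719123, or anything else with a huge jump.

@Pradeep Barua: Good, a uniform random distribution should keep things relatively tractable. For example, you could try partitioning the target points into a 3-D grid. During the search, index directly into the nearest grid cell and scan gradually outward, returning and removing the first point found. With the cell size tuned to the point density, I believe this should be roughly linear.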
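The grid idea can be sketched as follows, assuming the targets roughly fill their bounding box, as uniformly random points would. `UniqueGrid`, its `pointsPerCell` parameter, and `greedyUniqueGrid` are hypothetical names, and `Vec3` stands in for `UT_Vector3`. One deliberate difference from "return the first point found": the closest point in the first non-empty shell of cells is not always the true nearest, because a point just inside the next shell can be closer than one in a far corner of the current shell. The sketch therefore keeps scanning until the next shell's minimum possible distance (r · h for shell r + 1, with cell edge h) is no smaller than the best distance found.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <vector>

using Vec3 = std::array<double, 3>;  // hypothetical stand-in for UT_Vector3

inline double dist2(const Vec3& a, const Vec3& b) {
    double s = 0.0;
    for (int i = 0; i < 3; ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// Uniform grid over the targets. takeNearest() scans cubic shells of cells around
// the query's cell and removes the target it returns.
class UniqueGrid {
public:
    // pointsPerCell is a hypothetical knob: refine until about |targets| / pointsPerCell cells.
    explicit UniqueGrid(const std::vector<Vec3>& pts, double pointsPerCell = 2.0)
        : pts_(pts), remaining_(pts.size()) {
        if (!pts.empty()) lo_ = hi_ = pts[0];
        for (const Vec3& p : pts)
            for (int a = 0; a < 3; ++a) { lo_[a] = std::min(lo_[a], p[a]); hi_[a] = std::max(hi_[a], p[a]); }
        const double maxCells = std::max(1.0, pts.size() / pointsPerCell);
        const double maxExt = std::max({hi_[0] - lo_[0], hi_[1] - lo_[1], hi_[2] - lo_[2]});
        h_ = maxExt > 0.0 ? maxExt : 1.0;
        for (int it = 0; it < 64 && cellCount(h_ / 2) <= maxCells; ++it) h_ /= 2;  // cell edge length
        for (int a = 0; a < 3; ++a) n_[a] = static_cast<int>(std::floor((hi_[a] - lo_[a]) / h_)) + 1;
        cells_.resize(static_cast<std::size_t>(n_[0]) * n_[1] * n_[2]);
        for (std::size_t i = 0; i < pts.size(); ++i)
            cells_[cellId(cellOf(pts[i], 0), cellOf(pts[i], 1), cellOf(pts[i], 2))].push_back(static_cast<int>(i));
    }

    // Index of the nearest remaining target (removed before returning), or -1 if none is left.
    int takeNearest(const Vec3& q) {
        if (remaining_ == 0) return -1;
        const int c[3] = {cellOf(q, 0), cellOf(q, 1), cellOf(q, 2)};
        const int maxR = std::max({n_[0], n_[1], n_[2]});
        std::size_t bestCell = 0, bestPos = 0;
        double bestD = std::numeric_limits<double>::max();
        for (int r = 0; r <= maxR; ++r) {
            for (int dx = -r; dx <= r; ++dx)
                for (int dy = -r; dy <= r; ++dy) {
                    // Only the shell at Chebyshev distance r: interior columns need just dz = -r, +r.
                    const int step = (std::abs(dx) == r || std::abs(dy) == r) ? 1 : 2 * r;
                    for (int dz = -r; dz <= r; dz += step)
                        scanCell(c[0] + dx, c[1] + dy, c[2] + dz, q, bestCell, bestPos, bestD);
                }
            // Every point in shell r + 1 is at least r * h_ away; stop once that cannot win.
            if (bestD < std::numeric_limits<double>::max() && (r * h_) * (r * h_) >= bestD) break;
        }
        std::vector<int>& cell = cells_[bestCell];
        const int result = cell[bestPos];
        cell[bestPos] = cell.back();  // swap-and-pop removal
        cell.pop_back();
        --remaining_;
        return result;
    }

private:
    void scanCell(int x, int y, int z, const Vec3& q, std::size_t& bestCell,
                  std::size_t& bestPos, double& bestD) const {
        if (x < 0 || y < 0 || z < 0 || x >= n_[0] || y >= n_[1] || z >= n_[2]) return;
        const std::size_t id = cellId(x, y, z);
        for (std::size_t k = 0; k < cells_[id].size(); ++k) {
            const double d = dist2(q, pts_[cells_[id][k]]);
            if (d < bestD) { bestD = d; bestCell = id; bestPos = k; }
        }
    }
    std::size_t cellId(int x, int y, int z) const { return (static_cast<std::size_t>(z) * n_[1] + y) * n_[0] + x; }
    int cellOf(const Vec3& p, int a) const {  // clamped, so queries outside the box still work
        const double c = std::floor((p[a] - lo_[a]) / h_);
        return static_cast<int>(std::min(std::max(c, 0.0), n_[a] - 1.0));
    }
    double cellCount(double h) const {
        double count = 1.0;
        for (int a = 0; a < 3; ++a) count *= std::floor((hi_[a] - lo_[a]) / h) + 1.0;
        return count;
    }

    std::vector<Vec3> pts_;
    Vec3 lo_{}, hi_{};
    double h_ = 1.0;
    int n_[3] = {1, 1, 1};  // cells per axis
    std::vector<std::vector<int>> cells_;
    std::size_t remaining_;
};

std::vector<int> greedyUniqueGrid(const std::vector<Vec3>& source, const std::vector<Vec3>& target) {
    assert(target.size() >= source.size());  // a 1:1 match needs enough targets
    UniqueGrid grid(target);
    std::vector<int> match;
    match.reserve(source.size());
    for (const Vec3& s : source) match.push_back(grid.takeNearest(s));  // earlier sources win
    return match;
}
```

Removal is a swap-and-pop inside one cell, so it is O(1) once the cell is known. As targets get used up, later queries may have to walk through many empty shells, which is where the "roughly linear" estimate can break down.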