C++: efficiently computing the indices of two vectors' common elements

c++ vector

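I have two vectors (each containing only unique elements) that share a set of integers. I want to compute, as efficiently as possible, the indices of those elements of one vector that are also present in the other vector. Can you do better than my clumsy, inefficient implementation?

Edit: The vectors are not sorted, and we need the indices into the unsorted vectors. In addition, modifying the initial vectors (random_vec_1 and random_vec_2) while solving the problem is not allowed.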

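// The original header names were lost; these are the ones the visible code needs.
#include <chrono>
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>
using namespace std::chrono;
int main() {
    // Setup 1: Construct two vectors with random integers.
    constexpr size_t num = 1000;
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<int> dis(0, num);
    std::vector<int> random_vec_1;
    std::vector<int> random_vec_2;
    random_vec_1.reserve(num);
    random_vec_2.reserve(num);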
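    for (size_t i = 0u; i < …
[the remainder of the question's code is missing from the source]

Vectors are so fast nowadays that I would not use a set:
Copy the first vector into, e.g., new_vec_1.
Sort new_vec_1.
Use std::binary_search to look up each value in new_vec_1.
Code: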
std::vector<int> new_vec_1(random_vec_1);
std::sort(std::begin(new_vec_1), std::end(new_vec_1));
std::vector<size_t> match_index_2;
match_index_2.reserve(random_vec_2.size());

for (size_t i = 0; i < random_vec_2.size(); ++i) {
    if (std::binary_search(std::begin(new_vec_1), 
                           std::end(new_vec_1),
                           random_vec_2[i])) {
        match_index_2.push_back(i);
    }
}
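
If the indices into the first vector are needed as well, the same sorted-copy idea can carry each value's original position along. The following is only a sketch under the question's assumption of unique elements; the function name `common_index_pairs` and the pair layout are illustrative choices, not part of the answer above:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns (index in a, index in b) for every value present in both.
// Neither input is modified; only a sorted copy of (value, index) pairs is built.
std::vector<std::pair<std::size_t, std::size_t>>
common_index_pairs(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<std::pair<int, std::size_t>> sorted_a;
    sorted_a.reserve(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        sorted_a.emplace_back(a[i], i);
    }
    // Pairs compare by value first, so this orders the copy by value.
    std::sort(sorted_a.begin(), sorted_a.end());

    std::vector<std::pair<std::size_t, std::size_t>> result;
    for (std::size_t j = 0; j < b.size(); ++j) {
        // Binary search for the first entry whose value is not less than b[j].
        auto it = std::lower_bound(
            sorted_a.begin(), sorted_a.end(), b[j],
            [](const std::pair<int, std::size_t>& entry, int value) {
                return entry.first < value;
            });
        if (it != sorted_a.end() && it->first == b[j]) {
            result.emplace_back(it->second, j);
        }
    }
    return result;
}
```

The cost is O(n log n) for the sort plus O(m log n) for the lookups, the same as the `std::binary_search` version, at the price of storing one index per element of the copy.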

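Actually, I would expect sorting the vectors to greatly outperform creating a std::set, because the STL set is a tree, whereas a vector of int can be sorted in linear time with counting sort, which gives you a set if no count exceeds 1. Creating the set is O(n log n), for n insertions at cost log n each; sorting is O(n), as described above (strictly O(n + k) for a value range of size k).
On the sorted vectors you can run std::set_intersection, which should also run in time linear in the larger of the two inputs.
So you should be able to do this in linear time.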
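
The linear-time idea can be sketched as follows. This is a rough illustration, assuming all values lie in a known range [0, max_value]; the names `counting_sort_unique` and `common_values` are made up for this example. It yields the common values, not indices:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Counting sort for values assumed to lie in [0, max_value].
// Each present value is emitted once, so the result is a sorted set.
std::vector<int> counting_sort_unique(const std::vector<int>& v, int max_value) {
    std::vector<bool> present(static_cast<std::size_t>(max_value) + 1, false);
    for (int x : v) {
        present[static_cast<std::size_t>(x)] = true;
    }
    std::vector<int> sorted;
    sorted.reserve(v.size());
    for (std::size_t value = 0; value < present.size(); ++value) {
        if (present[value]) {
            sorted.push_back(static_cast<int>(value));
        }
    }
    return sorted;
}

// Sorts copies of both inputs and intersects them; the inputs stay untouched.
std::vector<int> common_values(const std::vector<int>& a,
                               const std::vector<int>& b,
                               int max_value) {
    const std::vector<int> sa = counting_sort_unique(a, max_value);
    const std::vector<int> sb = counting_sort_unique(b, max_value);
    std::vector<int> out;
    std::set_intersection(sa.begin(), sa.end(), sb.begin(), sb.end(),
                          std::back_inserter(out));
    return out;
}
```

The running time is O(n + k) for n elements and a value range of size k, so this only pays off when k is not much larger than n.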
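If you cannot modify the vectors, you can use a hash map (std::unordered_map) to map values to their indices in the original vector. Note that, since you did not mention that the numbers are unique, you would first find the values x_1, …, x_n contained in both sets and then use the hash map to project them back to indices in the original vector.
New answer
The new requirement is that the original vectors must not be modified while computing the solution. The sorted-intersection solution no longer works, because the indices get mixed up.
Here is what I suggest: use an unordered map to map the values of the vector whose indices you want to their indices, then run through the values of the other vector.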
// Not necessary, might increase performance
match_index_2.reserve(std::min(random_vec_1.size(), random_vec_2.size()));

std::unordered_map<int, std::size_t> index_map; // value -> index in random_vec_2
// random_vec_2 is the one from which we want the indices.
index_map.reserve(random_vec_2.size());
for (std::size_t i = 0; i < random_vec_2.size(); ++i) {
    index_map.emplace(random_vec_2[i], i);
}

for (auto& it : random_vec_1) {
    auto found_it = index_map.find(it);
    if (found_it != index_map.end()) {
        match_index_2.push_back(found_it->second);
    }
}

Here is another version. You can build indices into the value sets and operate on those indices:
#include <algorithm>
#include <cstddef>
#include <numeric>  // std::iota
#include <vector>

inline std::vector<std::size_t>  make_unique_sorted_index(const std::vector<int>& v) {
    std::vector<std::size_t> result(v.size());
    std::iota(result.begin(), result.end(), 0);
    std::sort(result.begin(), result.end(),
        [&v] (std::size_t a, std::size_t b) {
            return v[a] < v[b];
    });
    auto obsolete = std::unique(result.begin(), result.end(),
        [&v] (std::size_t a, std::size_t b) {
            return v[a] == v[b];
    });
    result.erase(obsolete, result.end());
    return result;
}

// Constructs an unordered range of indices [i0, i1, i2, ...iN) into the first set
// for elements that are found uniquely in both sets.
// Note: The sequence [set1[i0], set1[i1], set1[i2], ... set1[iN]) will be sorted.
std::vector<std::size_t>  unordered_set_intersection(
    const std::vector<int>& set1,
    const std::vector<int>& set2)
{
    std::vector<std::size_t> result;
    result.reserve(std::min(set1.size(), set2.size()));
    std::vector<std::size_t> index1 = make_unique_sorted_index(set1);
    std::vector<std::size_t> index2 = make_unique_sorted_index(set2);

    auto i1 = index1.begin();
    auto i2 = index2.begin();
    while(i1 != index1.end() && i2 != index2.end()) {
        if(set1[*i1] < set2[*i2]) ++i1;
        else if(set2[*i2] < set1[*i1]) ++i2;
        else {
            result.push_back(*i1);
            ++i1;
            ++i2;
        }
    }
    result.shrink_to_fit();
    return result;
}

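The algorithm produces stable results whether or not the indexed values are unique:

The ordering of the elements (that the resulting indices point to) is as stable as std::sort.
If the values are not unique, the number of identical elements (that the resulting indices point to) is the minimum of the number of identical elements in the first and the second set.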
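
The second point applies when the index keeps duplicates, as `make_sorted_index` does. A small sketch of that case follows; the function names are illustrative, and the use of `std::stable_sort` instead of `std::sort` is an assumption of this example so that equal values keep their original relative order:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Indices of v ordered by the values they point to; duplicates are kept.
// std::stable_sort is this sketch's choice, not the answer's.
std::vector<std::size_t> sorted_index_keep_duplicates(const std::vector<int>& v) {
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
        [&v](std::size_t a, std::size_t b) { return v[a] < v[b]; });
    return idx;
}

// The same merge loop as in the answer, run over index vectors that still contain duplicates.
std::vector<std::size_t> intersection_indices_keep_duplicates(
    const std::vector<int>& set1, const std::vector<int>& set2) {
    const std::vector<std::size_t> index1 = sorted_index_keep_duplicates(set1);
    const std::vector<std::size_t> index2 = sorted_index_keep_duplicates(set2);
    std::vector<std::size_t> result;
    auto i1 = index1.begin();
    auto i2 = index2.begin();
    while (i1 != index1.end() && i2 != index2.end()) {
        if (set1[*i1] < set2[*i2]) {
            ++i1;
        } else if (set2[*i2] < set1[*i1]) {
            ++i2;
        } else {
            result.push_back(*i1);  // index into set1
            ++i1;
            ++i2;
        }
    }
    return result;
}
```

For set1 = {5, 3, 5, 5, 1} and set2 = {5, 9, 5, 3}, the value 5 occurs three times in the first input and twice in the second, so two indices pointing at a 5 appear in the result.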
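Also take a look at: [links lost]. You might consider the examples there.
Those links do not help. std::includes tells you whether the second range is contained in the first, which is unrelated. std::unique removes consecutive duplicates from a range, which is even less relevant.
The vectors are not sorted in my real problem. The relevant algorithm is std::set_intersection (not std::includes). In general, the vectors I deal with are not sorted. I only needed to construct a toy example and used std::set.
@MrX In your edit you say that you need the indices of the unsorted vectors, but your current solution sorts the vectors. So is sorting acceptable, or do you want the original vectors left untouched? I have not measured it, but I doubt that another algorithm would outperform sorting followed by set_intersection.
@MrX Nelxiost's approach is fine; your requirement conflicts with your current solution.
@Nelxiost: set_intersection gives you the common values of two sorted ranges, not the indices into a vector, which also conflicts with the requirement. My solution (as opposed to the setup) also works for unsorted vectors. I just added a random shuffle to the problem setup to make this clearer.
My mistake. I edited the answer so that you get indices instead of values. I removed the part with set_intersection, because you cannot use it for this problem (except perhaps by creating a special output iterator).
@MrX You did not answer. Is it a requirement that the original vectors stay unsorted?
Counting sort only works for small value ranges. MrX uses a very small range (0…1000), but it is worth noting that users may not be free to make such an assumption. In general, sorting a vector is O(n log n).
With a hash map as the counter, it can also be applied to fairly large ranges (2^xxx), as long as they are sparse enough. Of course, the memory cost is linear in the size of the symbols (numbers), but that is usually feasible. For example, for all integers, you can count up to 2^32 occurrences in 16 GB of memory. Since the OP needs a set, he really only needs one bit for presence/absence, i.e. m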
match_index_2.reserve(std::min(random_vec_1.size(), random_vec_2.size()));

constexpr std::size_t unmapped = static_cast<std::size_t>(-1); // sentinel: not a valid index
// Since std::size_t is an unsigned type, -1 converts to the maximum value it can hold.

// dis(0, num) can produce num itself, so the table needs num + 1 slots.
std::vector<std::size_t> index_map(num + 1, unmapped);
for (std::size_t i = 0; i < random_vec_2.size(); ++i) {
    index_map[random_vec_2[i]] = i;
}

for (auto& it : random_vec_1) {
    auto index = index_map[it];
    if (index != unmapped) {
        match_index_2.push_back(index);
    }
}

auto first1 = random_vec_1.begin();
auto last1 = random_vec_1.end();
auto first2 = random_vec_2.begin();
auto last2 = random_vec_2.end();
auto index_offset = first1; // Put first2 if you want the indices of the second vector instead

while (first1 != last1 && first2 != last2) {
    if (*first1 < *first2)
        ++first1;
    else if (*first2 < *first1)
        ++first2;
    else {
        match_index_2.push_back(std::distance(index_offset, first1));
        ++first1;
        ++first2;
    }
}

auto first1 = random_vec_1.begin();
auto last1 = random_vec_1.end();
auto first2 = random_vec_2.begin();
auto last2 = random_vec_2.end();
auto index_offset = first1; // Put first2 if you want the indices of the second vector instead

while (first1 != last1 && first2 != last2) {
    if (*first1 < *first2) {
        ++first1;
    } else  {
        if (!(*first2 < *first1)) {
            match_index_2.push_back(std::distance(index_offset, first1++));
        }
        ++first2;
    }
}

// Setup 2: Make elements unique.
auto first1 = random_vec_1.begin();
auto last1 = random_vec_1.end();
std::sort(first1, last1);
last1 = std::unique(first1, last1);
random_vec_1.erase(last1, random_vec_1.end());

auto first2 = random_vec_2.begin();
auto last2 = random_vec_2.end();
std::sort(first2, last2);
last2 = std::unique(first2, last2);
random_vec_2.erase(last2, random_vec_2.end());

inline std::vector<std::size_t>  make_sorted_index(const std::vector<int>& v) {
    std::vector<std::size_t> result(v.size());
    std::iota(result.begin(), result.end(), 0);
    std::sort(result.begin(), result.end(),
        [&v] (std::size_t a, std::size_t b) {
            return v[a] < v[b];
    });
    return result;
}