C++: What is the fastest way to find out whether two sets overlap?


c++ algorithm


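Obviously, calling std::set_intersection() is a waste of time. Is there a function in the <algorithm> header that does exactly this? As far as I know, std::find_first_of() performs a linear search.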

If the inputs are sorted, you can use the following function template:

template<class InputIt1, class InputIt2>
bool intersect(InputIt1 first1, InputIt1 last1, InputIt2 first2, InputIt2 last2)
{
    while (first1 != last1 && first2 != last2) {
        if (*first1 < *first2) {
            ++first1;
            continue;
        } 
        if (*first2 < *first1) {
            ++first2;
            continue;
        } 
        return true;
    }
    return false;
}

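This function has complexity O(n+m), where n and m are the input sizes. However, if one input is much smaller than the other (n ≪ m, for example), it is better to look up each of the n elements in the other input with a binary search, which takes O(n log m) time.

This is a solution that only works for std::set (or std::multiset). A solution for std::map would only take a little more work.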

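I tried three approaches.

First, if one set is far larger than the other, I simply look up every element of the smaller set in the larger one. The second branch does the same with the roles reversed.

The constant 100 is wrong in theory. For optimal big-O performance the test should be k·n·lg m < m for some k, not 100·n < m. But the constant factor is large, and 100 > lg m, so you really should experiment.

If neither case applies, we walk through both sets looking for a collision, much like set_intersection does. Instead of just using ++, we use .lower_bound to try to skip through each list faster.

Note that if your lists consist of interleaved elements (such as {1,3,7} and {0,2,4,6,8}), this will be slower than plain ++ by a logarithmic factor.

If the two sets "cross" each other less often, this can skip large stretches of each set.

To compare the two behaviors, replace the lower_bound calls with ++.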

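#include <iterator>  // std::begin, std::end

// Lhs and Rhs must provide .find() and .lower_bound(), as std::set does.
template<class Lhs, class Rhs>
bool sorted_has_overlap( Lhs const& lhs, Rhs const& rhs ) {
  if (lhs.empty() || rhs.empty()) return false;
  // lhs is far smaller: look up each of its elements in rhs.
  if (lhs.size() * 100 < rhs.size()) {
    for (auto&& x:lhs)
      if (rhs.find(x)!=rhs.end())
        return true;
    return false;
  }
  // rhs is far smaller: look up each of its elements in lhs.
  if (rhs.size() * 100 < lhs.size()) {
    for(auto&& x:rhs)
      if (lhs.find(x)!=lhs.end())
        return true;
    return false;
  }

  using std::begin; using std::end;

  auto lit = begin(lhs);
  auto lend = end(lhs);

  auto rit = begin(rhs);
  auto rend = end(rhs);

  // Similar sizes: walk both sets, jumping ahead with lower_bound.
  while( lit != lend && rit != rend ) {
    if (*lit < *rit) {
      lit = lhs.lower_bound(*rit);  // first element of lhs not less than *rit
      continue;
    }
    if (*rit < *lit) {
      rit = rhs.lower_bound(*lit);  // first element of rhs not less than *lit
      continue;
    }
    return true;  // neither is less than the other: common element
  }
  return false;
}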
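The remark that a map version needs only a little more work could look roughly like the sketch below. It is an illustration, not the answer's own code: the name maps_share_key is hypothetical, it assumes both maps use the default std::less ordering, it compares keys only, and it leaves out the size-ratio shortcut for brevity.

```cpp
#include <map>

// Hypothetical helper (not from the original answer): reports whether two
// std::map objects have at least one key in common. Mapped values are ignored.
// Assumes both maps use the default std::less<K> ordering.
template<class K, class V1, class V2>
bool maps_share_key(const std::map<K, V1>& lhs, const std::map<K, V2>& rhs) {
    auto lit = lhs.begin();
    auto rit = rhs.begin();
    while (lit != lhs.end() && rit != rhs.end()) {
        if (lit->first < rit->first) {
            // Jump to the first key in lhs that is not less than rit's key.
            lit = lhs.lower_bound(rit->first);
            continue;
        }
        if (rit->first < lit->first) {
            // Jump to the first key in rhs that is not less than lit's key.
            rit = rhs.lower_bound(lit->first);
            continue;
        }
        return true;  // neither key is less than the other: shared key
    }
    return false;
}
```

The only change from the set version is that comparisons and lookups go through the key (->first) instead of the element itself.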


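For sorted arrays you can pick the third algorithm and use std::lower_bound to advance the "other" container quickly. This has the advantage that the search can be restricted to the remaining part of the range (something you cannot do quickly in a set). It also does poorly on "interleaved" elements (by a log n factor) compared to naive ++.

The first two approaches can also be done quickly on sorted arrays by replacing the member-function calls with calls to the algorithms in std. That transformation is basically mechanical.

An asymptotically optimal version on sorted arrays would use a binary search biased towards finding lower bounds near the start of the list: probe at 1, 2, 4, 8 and so on, instead of at one half, one quarter and so on. Note that this has the same lg(n) worst case, but it is O(1) instead of O(lg(n)) when the element searched for is the first one. Since that case (where the search advances less) also means less global progress is made, optimizing the sub-algorithm for it improves the global worst-case speed.

To see why: on "fast alternation" it performs no worse than ++, because the case where the very next element flips the sign of the comparison takes O(1) operations, and for larger gaps it replaces O(k) with O(lg k).

By this point, however, we are deep down an optimization hole: profile, and determine whether it is worth it before going any further in this direction.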
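The "probe at 1, 2, 4, 8" search described above is often called galloping or exponential search. A minimal sketch for sorted std::vector inputs follows. The names gallop_lower_bound and sorted_vectors_overlap are hypothetical, and the code assumes both vectors are sorted ascending with operator<.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: like std::lower_bound, but probes offsets 1, 2, 4, 8...
// from `first` before finishing with a binary search in the bracketed window.
// Costs O(lg k) comparisons, where k is the distance from `first` to the result.
template<class RandomIt, class T>
RandomIt gallop_lower_bound(RandomIt first, RandomIt last, const T& value) {
    auto n = last - first;
    decltype(n) lo = 0;     // every element in [first, first + lo) is < value
    decltype(n) probe = 1;  // next check looks at the element at index probe - 1
    while (probe <= n && first[probe - 1] < value) {
        lo = probe;
        probe *= 2;
    }
    auto hi = std::min(probe, n);  // the answer lies in [first + lo, first + hi]
    return std::lower_bound(first + lo, first + hi, value);
}

// Overlap test on two sorted vectors; each side skips ahead by galloping,
// always searching only the part of the range that has not been passed yet.
template<class T>
bool sorted_vectors_overlap(const std::vector<T>& lhs, const std::vector<T>& rhs) {
    auto lit = lhs.begin(), lend = lhs.end();
    auto rit = rhs.begin(), rend = rhs.end();
    while (lit != lend && rit != rend) {
        if (*lit < *rit) { lit = gallop_lower_bound(lit, lend, *rit); continue; }
        if (*rit < *lit) { rit = gallop_lower_bound(rit, rend, *lit); continue; }
        return true;  // equal elements found
    }
    return false;
}
```

When the target is the very next element, the loop in gallop_lower_bound stops after one or two comparisons, which is what keeps interleaved inputs close to the cost of plain ++.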

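Another approach for sorted arrays is to assume that std::lower_bound is optimally written (on random-access iterators). Use an output iterator that throws an exception when it is written to, for example as the destination of std::set_intersection. Return true if the exception is caught and false otherwise.
(The optimizations above, namely picking one side and binary-searching the other, and exponential-advance searching, may be legal for std::set_intersection to perform.)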
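The throwing-output-iterator idea might be sketched as below. This assumes the algorithm being driven is std::set_intersection on sorted vectors. The names overlap_found, throwing_output_iterator and overlap_via_set_intersection are hypothetical and only serve the illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical tag type, thrown on the first attempted write.
struct overlap_found {};

// Minimal output iterator: dereference and increment do nothing,
// but assigning any value through it throws overlap_found.
struct throwing_output_iterator {
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    throwing_output_iterator& operator*() { return *this; }
    throwing_output_iterator& operator++() { return *this; }
    throwing_output_iterator operator++(int) { return *this; }

    template<class T>
    throwing_output_iterator& operator=(const T&) { throw overlap_found{}; }
};

// Returns true if the two sorted vectors share an element.
template<class T>
bool overlap_via_set_intersection(const std::vector<T>& a, const std::vector<T>& b) {
    try {
        std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                              throwing_output_iterator{});
    } catch (const overlap_found&) {
        return true;  // the algorithm tried to emit a common element
    }
    return false;     // it ran to completion without writing anything
}
```

Whether this beats a hand-written loop is something to measure, since throwing and catching an exception has a cost of its own.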

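I think it matters to use all three algorithms. Set-intersection tests where one side is far smaller than the other are probably common: the extreme case of one element on one side and many on the other is well known (as a search).

A naive "double linear" walk gives linear performance in that common case. By detecting the asymmetry between the two sides, you can switch to "linear in the small one, logarithmic in the large one" at the right point and get much better performance in those cases: O(n+m) versus O(m lg n), where the second wins once m is smaller than roughly n / lg n.

Sometimes you can encode a set of numbers in a single memory word. For example, you can encode the set {0,2,3,6,7} in the memory word ...000000 11001101. The rule is: the bit at position i (reading from right to left) is set if and only if the number i is in the set.

Now, if you have two sets encoded in the memory words a and b, you can compute the intersection with the bitwise AND operator: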
int a = ...;
int b = ...;
int intersection = a & b;  // non-zero if and only if the two sets overlap
int set_union = a | b;     // bonus ("union" itself is a reserved word in C++)
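As a sketch of this encoding, the block below stores a set of non-negative integers in an array of 64-bit words, so values beyond a single word also fit. The BitSet type and the overlaps function are hypothetical names for this illustration, and std::uint64_t as the word type is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: bit i of the word array is 1 exactly when
// the number i is in the set.
struct BitSet {
    std::vector<std::uint64_t> words;

    void insert(std::size_t i) {
        std::size_t w = i / 64;                          // which word holds bit i
        if (w >= words.size()) words.resize(w + 1, 0);   // grow on demand
        words[w] |= std::uint64_t{1} << (i % 64);
    }
};

// Two sets overlap if any pair of corresponding words has a common 1 bit.
bool overlaps(const BitSet& a, const BitSet& b) {
    std::size_t n = std::min(a.words.size(), b.words.size());
    for (std::size_t w = 0; w < n; ++w)
        if (a.words[w] & b.words[w])
            return true;
    return false;
}
```

Inserting 0, 2, 3, 6 and 7 leaves the first word equal to binary 11001101, matching the example in the text. The overlap test costs one AND per 64 possible values, so it pays off mainly when the value range is small and dense.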

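The nice thing about this style is that intersection (and likewise union and complement) is performed in a single CPU instruction (I don't know whether that is the right term).
If you need to handle numbers larger than the number of bits in a memory word, you can use several memory words. I usually use an array of memory words.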
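If you want to handle negative numbers, just use two arrays, one for the negative numbers and one for the positive ones.

When the first sorted input is much smaller than the second, each of its elements can be looked up in the second with std::binary_search:

#include <algorithm>

/**
 *  When input1 is much smaller than input2
 */
template<class InputIt1, class InputIt2>
bool intersect(InputIt1 first1, InputIt1 last1, InputIt2 first2, InputIt2 last2) {
    while (first1 != last1)
        if (std::binary_search(first2, last2, *first1++))
            return true;
    return false;
}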

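A complete example for two std::set<int> objects, using std::binary_search:

#include <set>
#include <iostream>
#include <algorithm>

bool overlap(const std::set<int>& s1, const std::set<int>& s2)
{
    for( const auto& i : s1) {
        // On std::set's bidirectional iterators std::binary_search makes
        // O(log n) comparisons but O(n) iterator steps; s2.find(i) avoids that.
        if(std::binary_search(s2.begin(), s2.end(), i))
            return true;
    }
    return false;
}

int main()
{
    std::set<int> s1 {1, 2, 3};
    std::set<int> s2 {3, 4, 5, 6};

    std::cout << overlap(s1, s2) << '\n';  // prints 1: both sets contain 3
}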