C++: Splitting overlapping segments
Here is a vector of segments:
#include <cassert>
#include <cstddef>
#include <vector>

class Segment
{
public:
    size_t left;
    size_t right;
    char ID;
    Segment(size_t a, size_t b, char c):left(a), right(b), ID(c){assert(left<right);}
};
std::vector<Segment> A = {{3, 10, 'A'}, {7, 22, 'B'}, {14, 17, 'C'} , {16, 19, 'D'}, {25, 31, 'E'}, {28, 32, 'F'}, {34, 37, 'G'}, {34, 37, 'H'}, {46, 49, 'I'}, {52, 59, 'J'}};
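I would like to create an object B that contains small non-overlapping segments derived from the big segments in A. To get to B, we must shrink the segments that overlap each other and create a new segment with ID X for every stretch that is covered by more than one segment. The vector B also needs to be sorted by the left attribute.
For the example above, the expected output is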
std::vector<Segment> B = {{3, 7, 'A'}, {7, 10, 'X'}, {10, 14, 'B'}, {14, 19, 'X'}, {19, 22, 'B'} , {25, 28, 'E'}, {28, 31, 'X'}, {31, 32, 'F'}, {34, 37, 'X'}, {46, 49, 'I'}, {52, 59, 'J'}};
---- 'A'
--- 'X' (overlap between 'A' and 'B')
---- 'B'
-------- 'X' (overlap between 'B', 'C' and 'D')
--- 'B' -> Note that 'B' is now split in two
--- 'E'
--- 'X' (overlap between 'E' and 'F')
- 'F'
--- 'X' (overlap between 'G' and 'H')
--- 'I'
------- 'J'
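An algorithm that does not require the input to be sorted would be even better.
Attempt
To show effort, here is an attempt that 1) is buggy and 2) would be a relatively slow implementation. Let us call a "breakpoint" any left or right value in the vector B above. The idea is to jump from one breakpoint to the next by systematically searching the preceding and following segments for a potential next breakpoint. While doing so, it should keep track of which ID the new segment should be given (if the stretch between the two breakpoints is covered by at least one segment in A).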
std::vector<Segment> foo(std::vector<Segment>& A)
{
    if (A.size() <= 1) return A;
    std::vector<Segment> B;
    B.reserve(A.size());
    size_t A_index = 0;
    size_t currentPos = A[A_index].left;
    while ( A_index < A.size())
    {
        auto nextPos = A[A_index].right;
        //std::cout << "currentPos = " << currentPos << "\n";
        //std::cout << "nextPos before search = " << nextPos << "\n";
        bool isIntersection = false;
        // Search in preceding Segments
        for (size_t i = A_index - 1 ; i < A.size() ; --i)
        {
            if (A[i].right > currentPos && A[i].right < nextPos )
            {
                nextPos = A[i].right;
                isIntersection = true;
                //std::cout << "Found " << nextPos << " in preceding segment\n";
            }
        }
        // Search in following Segments
        for (size_t i = A_index+1 ; i < A.size() ; ++i)
        {
            if ( A[i].left > currentPos && A[i].left < nextPos)
            {
                nextPos = A[i].left;
                //std::cout << "Found left of " << nextPos << " in following segment\n";
                break;
            }
            if ( A[i].right > currentPos && A[i].right < nextPos )
            {
                nextPos = A[i].right;
                isIntersection = true;
                //std::cout << "Found right of " << nextPos << " in following segment\n";
                break;
            }
        }
        // create new Segment
        if (!isIntersection)
        {
            B.push_back({currentPos, nextPos, A[A_index].ID});
        } else
        {
            B.push_back({currentPos, nextPos, 'X'});
        }
        if (nextPos == A[A_index].right)
        {
            ++A_index;
            nextPos = A[A_index].left;
        }
        currentPos = nextPos;
    }
    return B;
}
int main()
{
    std::vector<Segment> A = {{3, 10, 'A'}, {7, 22, 'B'}, {14, 17, 'C'} , {16, 19, 'D'}, {25, 31, 'E'}, {28, 32, 'F'}, {34, 37, 'G'}, {34, 37, 'H'}, {46, 49, 'I'}, {52, 59, 'J'}};
    print(A);
    auto B = foo(A);
    print(B);
}
The following is not the most efficient approach, but it produces the expected output. The strategy is as follows:
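(It does not actually require the input vector to be sorted.)
1. Dissect all segments into pieces that are 1 wide. Store them in a map from each position to the IDs present at that position.
2. Assign new IDs according to the overlaps.
3. Glue the pieces back together.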
For convenience, I use
struct Overlap {
    std::vector<char> IDs;
    Overlap() {}
    void add(char id) {IDs.push_back(id);}
};
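In fact, the only expensive part is step 1, which is O(number of segments x their width). I believe it can be made more efficient by using two containers, one sorted by left and the other sorted by right, which makes overlaps easier to detect. However, I would start with this simple implementation. Also, if the width is rather small compared to the number of segments, then O(number of segments x width) may well be better than the O(number of segments x log(number of segments)) needed for sorting.

Here is a solution that computes all the transition points created by the segments and then reconstructs the new segments from these points.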
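The algorithm is:
1. Every segment generates 2 transition points, one for opening and one for closing the segment.
2. The transition points are sorted.
3. New segments are constructed from every pair of adjacent transition points. Each pair of points represents one of:
   a) an empty stretch (no new segment is added)
   b) a single segment (a segment with that segment's .ID is added)
   c) multiple segments (a segment with ID 'X' is added)
4. The newly constructed segments may contain adjacent X segments, so these need to be merged.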
First, a simple struct to store the transition points:
struct Point
{
    size_t location;
    bool overlap; // true if this point opens a segment, false if it closes one
    char ID;
};
The implementation is as follows:
// needs <algorithm>, <set> and <vector>
std::vector<Segment> foo(std::vector<Segment> const & segments)
{
    std::vector<Segment> res;
    // nothing to split for an empty input (points[0] below would be invalid)
    if (segments.empty()) return res;
    // generate all transition points
    std::vector<Point> points;
    for (auto const & seg : segments)
    {
        points.push_back({seg.left, true, seg.ID});
        points.push_back({seg.right, false, seg.ID});
    }
    // sort transition points
    std::sort(points.begin(), points.end(),
              [](auto a, auto b) { return a.location < b.location; });
    // initialize overlaps; points[0] is always an opening point because left < right
    std::multiset<char> overs{points[0].ID};
    // for every adjacent transition point
    for(auto i = 1u; i < points.size(); ++i)
    {
        auto &a = points[i - 1];
        auto &b = points[i];
        // if there is a jump in between transition points
        if (a.location < b.location)
            switch (overs.size())
            {
                // no segment
                case 0 : break;
                // only one segment
                case 1 : res.push_back({a.location, b.location, *overs.begin()}); break;
                // overlapping segment
                default : res.push_back({a.location, b.location, 'X'}); break;
            }
        // update overlaps
        if (b.overlap)
            overs.insert(b.ID);
        else
            overs.erase(overs.find(b.ID));
    }
    // merge adjacent 'X' overlaps, but only those that actually touch;
    // two 'X' pieces separated by an uncovered gap must stay separate
    for(auto i = 0u; i < res.size(); ++i)
    {
        if (res[i].ID == 'X')
        {
            auto f = res.begin() + i + 1;
            while (f != res.end() && f->ID == 'X' && f->left == (f - 1)->right)
                ++f;
            res[i].right = (f - 1)->right;
            res.erase(res.begin() + i + 1, f);
        }
    }
    return res;
}
This is an O(n log(n)) algorithm.
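The same sweep can also merge touching pieces while it emits them, which removes the separate merge pass. The following is only a sketch under stated assumptions, not the answer's own code: the name splitOverlaps and the local Event struct are hypothetical, Segment is redeclared as a plain aggregate so the block compiles on its own, and right is treated as an exclusive end as in the expected output of the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Plain aggregate standing in for the Segment class from the question.
struct Segment {
    std::size_t left;
    std::size_t right;  // exclusive end
    char ID;
};

// Hypothetical name: sweep over the transition points and merge on the fly.
std::vector<Segment> splitOverlaps(const std::vector<Segment>& segments)
{
    struct Event { std::size_t location; bool opens; char ID; };
    std::vector<Event> events;
    events.reserve(2 * segments.size());
    for (const auto& s : segments) {
        assert(s.left < s.right);  // assumption carried over from the question
        events.push_back({s.left, true, s.ID});
        events.push_back({s.right, false, s.ID});
    }
    std::sort(events.begin(), events.end(),
              [](const Event& a, const Event& b) { return a.location < b.location; });

    std::vector<Segment> res;
    std::multiset<char> active;  // IDs of the segments covering the current stretch
    std::size_t prev = 0;        // location of the previous event
    for (const auto& e : events) {
        if (!active.empty() && prev < e.location) {
            const char id = active.size() == 1 ? *active.begin() : 'X';
            if (!res.empty() && res.back().ID == id && res.back().right == prev)
                res.back().right = e.location;  // touches the last piece: extend it
            else
                res.push_back({prev, e.location, id});
        }
        if (e.opens)
            active.insert(e.ID);
        else
            active.erase(active.find(e.ID));
        prev = e.location;
    }
    return res;
}
```

Called on the vector A from the question, this should yield the eleven segments listed in the expected B. Because a piece is only extended when it touches the previous one, two overlaps separated by an uncovered gap stay separate.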
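Thank you! This is a smart and simple approach. Just as you posted your answer, I made an edit specifying the type of values that left and right take. A segment can be up to 1e9 long, so making 1-wide pieces in RAM would be very demanding.
@Remi.b Oh :). I did not know that. The pieces do not necessarily have to be 1 wide: when there is no overlap over a large region, or the same segments cover a large region, you only need to store the start and end of that region. Anyway, I am not claiming that this is the perfect solution. I found the question interesting, and my advice is to start with something correct and simple.
@Remi.b I would probably just go for two sorted containers. Then you can walk from "global left" to "global right": one container tells you when a segment begins, and the other tells you when a segment ends.
Correct and simple is already far beyond what I could manage! I will try to think about these two containers and how to build them. Thanks, +1.
@Remi.b Maybe I will add something later, I am still curious.
Thanks! At first, I thought you ... The algorithm does not require the input segments to be sorted, which saves time and is very convenient.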
Code for the first answer (the 1-wide dissection strategy, using the Overlap struct):
int main() {
    std::vector<Segment> A = {{3, 10, 'A'}, {7, 22, 'B'}, {14, 17, 'C'} , {16, 19, 'D'}, {25, 31, 'E'}, {28, 32, 'F'}, {34, 37, 'G'}, {34, 37, 'H'}, {46, 49, 'I'}, {52, 59, 'J'}};
    // dissect
    std::map<size_t,Overlap> over;
    for (const auto& s : A) {
        for (size_t i = s.left; i < s.right; ++i) {
            over[i].add(s.ID);
        }
    }
    // assign new segments
    std::map<size_t,char> pieces;
    for (const auto& o : over) {
        if (o.second.IDs.size() == 1) {
            pieces[o.first] = o.second.IDs.front();
        } else {
            pieces[o.first] = 'X';
        }
    }
    // glue them
    std::vector<Segment> result;
    auto it = pieces.begin();
    Segment current(it->first,it->first,it->second); // here left==right ! (trips the constructor's assert unless NDEBUG is defined)
    ++it;
    for ( ; it != pieces.end(); ++it) {
        if (it->second == current.ID) continue;
        current.right = it->first -1;
        result.push_back(current);
        current = Segment{it->first,it->first,it->second};
    }
    print(result);
}
Output:
{3, 6, A} {7, 9, X} {10, 13, B} {14, 18, X} {19, 24, B} {25, 27, E} {28, 30, X} {31, 33, F} {34, 45, X} {46, 51, I}
--- A
-- X
--- B
---- X
----- B
-- E
-- X
-- F
----------- X
----- I
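The output above reports inclusive right ends, lets a run absorb uncovered positions (for example {19, 24, B} although B ends at 22), and never emits the final run for J. A minimal sketch of a glue step that avoids these three issues is given below. It is an assumption-laden illustration rather than the answer's code: gluePieces is a hypothetical name, Segment is redeclared as a plain aggregate without the assert, and right is treated as an exclusive end to match the expected output in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Plain aggregate standing in for the Segment class from the question.
struct Segment {
    std::size_t left;
    std::size_t right;  // exclusive end
    char ID;
};

// Hypothetical helper: turn a position -> ID map of 1-wide pieces into segments.
std::vector<Segment> gluePieces(const std::map<std::size_t, char>& pieces)
{
    std::vector<Segment> result;
    for (const auto& p : pieces) {  // std::map iterates in increasing position
        const std::size_t pos = p.first;
        const char id = p.second;
        if (!result.empty() && result.back().ID == id && result.back().right == pos)
            ++result.back().right;                // same ID and contiguous: extend the run
        else
            result.push_back({pos, pos + 1, id}); // ID change or gap: start a new run
    }
    return result;
}
```

Because each run is pushed as soon as it starts and extended in place, the last run needs no special handling, and a gap in the positions always starts a new segment.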