Ruby: selecting the parts of a string covered by different substrings
I want to select the parts of a string that are covered by a set of substrings with the following properties: they belong to the original string; they may have different lengths and positions; they may overlap; and their order may differ from the order in which they appear in the original string. For example:
string = "MGLSDGEWQQVLNVWGKVEADIAGHGQEVLIHSKHPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG"
substring1 = "HPGDFGADAQGAMTKALELFR"
substring2 = "GEWQQVLNVWGK"
substringn = "ALELFRNDIAAKYK"
I would like to get:
coverage = "MGLSD<b>GEWQQVLNVWGK</b>VEADIAGHGQEVLIHSK<b>HPGDFGADAQGAMTKALELFRNDIAAKYK</b>ELGFQG"
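This way I get the start and end position of each substring. How can I merge them all, especially given that they may overlap and appear in a different order? Is this a good strategy?

This may not be the most elegant approach, but it does work: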
string = "MGLSDGEWQQVLNVWGKVEADIAGHGQEVLIHSKHPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG"
substring1 = "HPGDFGADAQGAMTKALELFR"
substring2 = "GEWQQVLNVWGK"
substring3 = "ALELFRNDIAAKYK"
substrings = [substring1, substring2, substring3]

# Locate each substring in the original string. The range end is the index
# just past the match, which is where the closing tag gets inserted.
overlapping_indexes = substrings.map do |substring|
  start_pos = string.index(substring)
  next if start_pos.nil? # skip substrings that do not occur in the string
  end_pos = start_pos + substring.length
  (start_pos..end_pos)
end.compact

# the following 3 methods are from Wayne Conrad in this question: http://stackoverflow.com/questions/6017523/how-to-combine-overlapping-time-ranges-time-ranges-union
def ranges_overlap?(a, b)
  a.include?(b.begin) || b.include?(a.begin)
end

def merge_ranges(a, b)
  [a.begin, b.begin].min..[a.end, b.end].max
end

def merge_overlapping_ranges(ranges)
  # The accumulator is named merged so it does not shadow the ranges parameter.
  ranges.sort_by(&:begin).inject([]) do |merged, range|
    if !merged.empty? && ranges_overlap?(merged.last, range)
      merged[0...-1] + [merge_ranges(merged.last, range)]
    else
      merged + [range]
    end
  end
end

indexes = merge_overlapping_ranges(overlapping_indexes)

x = "<b>"
y = "</b>"
offset = 0 # number of tag characters inserted so far
indexes.each do |index|
  string.insert(index.begin + offset, x)
  offset += x.length
  string.insert(index.end + offset, y)
  offset += y.length
end

p string
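For comparison, here is a minimal sketch of a different technique, not the range-merging approach used above: mark every covered character in a boolean mask, then wrap each run of covered characters. The method name `mark_coverage` and its keyword arguments are hypothetical names chosen for illustration. Unlike the code above, it marks every occurrence of each substring rather than only the first, and it leaves the original string untouched.

```ruby
# Hypothetical helper: wrap every covered run of characters in open/close tags.
def mark_coverage(string, substrings, open_tag: "<b>", close_tag: "</b>")
  # One flag per character: true if any substring covers that position.
  covered = Array.new(string.length, false)

  substrings.each do |substring|
    next if substring.empty?
    start = 0
    # Find every occurrence, including occurrences that overlap each other.
    while (pos = string.index(substring, start))
      covered.fill(true, pos, substring.length)
      start = pos + 1
    end
  end

  # Group consecutive characters that share the same covered/uncovered state,
  # then wrap only the covered groups.
  string.each_char.zip(covered)
        .chunk_while { |(_, a), (_, b)| a == b }
        .map do |run|
          text = run.map(&:first).join
          run.first.last ? "#{open_tag}#{text}#{close_tag}" : text
        end
        .join
end

protein = "MGLSDGEWQQVLNVWGKVEADIAGHGQEVLIHSKHPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG"
parts = ["HPGDFGADAQGAMTKALELFR", "GEWQQVLNVWGK", "ALELFRNDIAAKYK"]
puts mark_coverage(protein, parts)
```

Because overlapping and adjacent matches simply set the same flags, no explicit merge step is needed; the trade-off is one array entry per character of the input.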
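This looks like a good strategy to start with, especially if you convert your .each to a .map and return a list of the start/end positions that need to be marked. Your problem then becomes how to merge those ranges so that overlapping ranges are combined into single, larger ranges.

I would say you are 80% done. What is left is to insert the markers at the start and end positions, and then the job seems done. You can clone the string, modify the clone, and return the clone as your answer. Say new_string = string; after you get start_pos and end_pos, insert the markers into new_string. See the Ruby documentation for insert.

This will not work. The inserted markers break the original string and prevent further matches against another substring that overlaps the current one.

@sawa - you are right. I updated my answer. Thanks.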
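The objection above can be sketched in code: take every match position from the untouched string first, merge the spans, and only then insert the tags into a copy. This is a minimal sketch under assumptions, not the answerer's updated code; `merge_spans` and `tag_spans` are hypothetical names, spans are `[start, stop]` pairs with an exclusive `stop`, and only the first occurrence of each substring is considered. Inserting from right to left means earlier insert positions never shift, so no offset bookkeeping is needed.

```ruby
# Hypothetical helper: merge [start, stop] spans (stop exclusive) that overlap or touch.
def merge_spans(spans)
  spans.sort_by(&:first).each_with_object([]) do |(start, stop), merged|
    if merged.any? && start <= merged.last[1]
      # Extend the previous span instead of starting a new one.
      merged.last[1] = [merged.last[1], stop].max
    else
      merged << [start, stop]
    end
  end
end

# Hypothetical helper: return a tagged copy of string; the input is not modified.
def tag_spans(string, substrings, open_tag: "<b>", close_tag: "</b>")
  # All searches run against the original string, before any tag is inserted.
  spans = substrings.map do |substring|
    pos = string.index(substring)
    [pos, pos + substring.length] if pos
  end.compact

  result = string.dup
  # Work from the rightmost span to the leftmost so earlier indexes stay valid.
  merge_spans(spans).reverse_each do |start, stop|
    result.insert(stop, close_tag)
    result.insert(start, open_tag)
  end
  result
end

sequence = "MGLSDGEWQQVLNVWGKVEADIAGHGQEVLIHSKHPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG"
fragments = ["HPGDFGADAQGAMTKALELFR", "GEWQQVLNVWGK", "ALELFRNDIAAKYK"]
puts tag_spans(sequence, fragments)
```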