Algorithm: Finding the longest similar subsequence in a string

algorithm data-structures

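Suppose I want to find the longest subsequence such that its first half and its second half are identical.

For example, in the string abkcjadfbck the result is abcabc, because abc is repeated in its first and second halves. In the string aaa, the result is aa.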
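
To make the problem statement concrete, here is a minimal sketch that checks the two conditions from the examples: the candidate must be a subsequence of the input, and its two halves must be equal. The helper names `is_subsequence` and `has_equal_halves` are made up for illustration and do not come from the question.

```python
def is_subsequence(candidate: str, text: str) -> bool:
    """Return True if candidate can be obtained from text by deleting characters."""
    remaining = iter(text)
    # 'ch in remaining' consumes the iterator, so matches are found left to right.
    return all(ch in remaining for ch in candidate)


def has_equal_halves(candidate: str) -> bool:
    """Return True if candidate has even length and its two halves are identical."""
    half, remainder = divmod(len(candidate), 2)
    return remainder == 0 and candidate[:half] == candidate[half:]


if __name__ == "__main__":
    print(is_subsequence("abcabc", "abkcjadfbck"), has_equal_halves("abcabc"))
    print(is_subsequence("aa", "aaa"), has_equal_halves("aa"))
```

Note that aaa itself fails the second check because its length is odd, which is why the expected answer for aaa is aa.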

In a first pass over inputString, we can count how often each character occurs and remove the characters that occur only once.
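
The preprocessing pass described above could look roughly like this. It is only a sketch; the function name `drop_unique_chars` is an assumption made for this example.

```python
from collections import Counter


def drop_unique_chars(input_string: str) -> str:
    """Remove every character that occurs exactly once in input_string."""
    counts = Counter(input_string)
    # A character that appears once cannot be in both halves of the result.
    return "".join(ch for ch in input_string if counts[ch] > 1)


if __name__ == "__main__":
    print(drop_unique_chars("abkcjadfbck"))
```

For abkcjadfbck this drops j, d and f, since none of them can appear in both halves.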

input: inputString
data structures:
Set<Triple<char[], Integer, Integer>> potentialSecondWords;
Map<Character, List<Integer>> lettersList;

for the characters c with increasing index h in inputString do
  if (!lettersList.get(c).isEmpty()) {
    for ((secondWord, currentIndex, maxIndex) in potentialSecondWords) {
       if (there exists a j in lettersList.get(c) between currentIndex and maxIndex) {
         update (secondWord, currentIndex, maxIndex) by adding c to secondWord and replacing currentIndex with j;
       }
    }
    if potentialSecondWords contains a triple whose char[] is equal to c, remove it;  
    put new Triple with value (c,lettersList.get(c).get(0), h-1) into potentialSecondWords;
  }
  lettersList.get(c).add(h);
}
find the largest secondWord in potentialSecondWords and output secondWord twice;

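So the algorithm makes a single pass over the array. For each index it creates a triple that represents a potential second word starting at the current index, and it updates all existing potential second words.

With a suitable list implementation, and with n being the size of inputString, this algorithm has a worst-case running time of O(n²), for example for a^n.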

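This task can be viewed as a combination of two well-known problems.

If the point between the two halves of the subsequence is known in advance, you only need to find the best match between the two resulting strings. This is the longest common subsequence problem. Various dynamic programming approaches solve it in O(N²) time.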
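
As an illustration of the first sub-problem, here is a sketch of the textbook dynamic-programming solution for the longest common subsequence of two strings. The function name is chosen for this example; when several subsequences of maximal length exist, it returns just one of them.

```python
def longest_common_subsequence(first: str, second: str) -> str:
    """Return one longest common subsequence of first and second in O(N*M) time."""
    rows, cols = len(first), len(second)
    # table[i][j] holds the LCS length of the suffixes first[i:] and second[j:].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if first[i] == second[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover one optimal answer.
    result = []
    i = j = 0
    while i < rows and j < cols:
        if first[i] == second[j]:
            result.append(first[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(result)


if __name__ == "__main__":
    # Splitting abkcjadfbck after its fifth character gives these two halves.
    print(longest_common_subsequence("abkcj", "adfbck"))
```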

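To find the point at which the string should be split optimally, you can use golden section search or Fibonacci search. These algorithms have O(log N) time complexity.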
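
Putting the two parts together, a minimal sketch might look like the following. Note the swap: instead of golden section or Fibonacci search, it simply tries every split position, which costs O(N³) overall. Those searches rely on the score being unimodal in the split position, and this sketch does not assume that. All function names are hypothetical.

```python
def lcs(first: str, second: str) -> str:
    """Return one longest common subsequence of first and second."""
    rows, cols = len(first), len(second)
    # table[i][j] holds the LCS length of first[i:] and second[j:].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if first[i] == second[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    result, i, j = [], 0, 0
    while i < rows and j < cols:
        if first[i] == second[j]:
            result.append(first[i])
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(result)


def longest_equal_halves_subsequence(text: str) -> str:
    """Try every split point and keep the best common subsequence, doubled."""
    best_half = ""
    for split in range(1, len(text)):
        common = lcs(text[:split], text[split:])
        if len(common) > len(best_half):
            best_half = common
    return best_half + best_half


if __name__ == "__main__":
    print(longest_equal_halves_subsequence("abkcjadfbck"))
    print(longest_equal_halves_subsequence("aaa"))
```

For abkcjadfbck both abcabc and abkabk have the maximal length 6, so which one is returned depends on tie-breaking inside the table walk.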

I don't understand. Where is abc in the first string? And why isn't the result for the second string aaa? That is obviously longer.

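I think "subsequence" does not mean the indices have to be contiguous. The result aa is [index 0, index 1], [index 1, index 2] or [index 0, index 2]. aaa has the result aa because the first half of aa is the same as its second half.

Can you explain the algorithm?

So, here is what I understand from your approach: for i = 1..n, split the input into two strings and run longest common subsequence on them. Finding the longest subsequence with identical halves would therefore be of order n*(n*n). But can we generate all such strings, not just the longest? For example, for aaa we would have 3 such strings: aa, aa, aa (the first a with the second a, the first a with the third a, the second a with the third a).

Searching for the longest subsequence with these algorithms is O(N^2 log N), because with golden section search you do not need to split the string at every possible position. But this does not let you obtain all subsequences. Generating all subsequences is a completely different task and should be solved by some other method.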
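
For very small inputs, the "all subsequences" variant raised in the comments can be illustrated by brute force. This is only a sketch with exponential running time, not a method proposed in the thread, and the function name is an assumption.

```python
from itertools import combinations


def all_equal_half_subsequences(text: str) -> list:
    """List (indices, subsequence) pairs whose two halves are identical.

    Exponential in len(text); intended only for tiny examples.
    """
    found = []
    for size in range(2, len(text) + 1, 2):
        half = size // 2
        for indices in combinations(range(len(text)), size):
            candidate = "".join(text[i] for i in indices)
            if candidate[:half] == candidate[half:]:
                found.append((indices, candidate))
    return found


if __name__ == "__main__":
    for indices, candidate in all_equal_half_subsequences("aaa"):
        print(indices, candidate)
```

For aaa this yields the three index pairs mentioned in the comments, each spelling aa.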