Java: longest common substring with a list of excluded strings
I am using an algorithm to find the longest common substring of two strings. Please help me do the same thing, but with an array of substrings that the function should ignore.
My Java code:
public static String longestSubstring(String str1, String str2) {
    StringBuilder sb = new StringBuilder();
    if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
        return "";
    }
    // java initializes them already with 0
    int[][] num = new int[str1.length()][str2.length()];
    int maxlen = 0;
    int lastSubsBegin = 0;
    for (int i = 0; i < str1.length(); i++) {
        for (int j = 0; j < str2.length(); j++) {
            if (str1.charAt(i) == str2.charAt(j)) {
                if ((i == 0) || (j == 0)) {
                    num[i][j] = 1;
                } else {
                    num[i][j] = 1 + num[i - 1][j - 1];
                }
                if (num[i][j] > maxlen) {
                    maxlen = num[i][j];
                    // generate substring from str1 => i
                    int thisSubsBegin = i - num[i][j] + 1;
                    if (lastSubsBegin == thisSubsBegin) {
                        // if the current LCS is the same as the last time this block ran
                        sb.append(str1.charAt(i));
                    } else {
                        // this block resets the string builder if a different LCS is found
                        lastSubsBegin = thisSubsBegin;
                        sb = new StringBuilder();
                        sb.append(str1.substring(lastSubsBegin, i + 1));
                    }
                }
            }
        }
    }
    return sb.toString();
}
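As far as I understand, you must ignore every substring that contains at least one of the strings in ignore. The block inside the two loops then becomes: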
if (str1.charAt(i) == str2.charAt(j)) {
    if ((i == 0) || (j == 0)) {
        num[i][j] = 1;
    } else {
        num[i][j] = 1 + num[i - 1][j - 1];
    }
    // rebuild the current common substring on every step so that we can
    // compare it with `ignore`
    String current = str1.substring(i - num[i][j] + 1, i + 1);
    // find the rightmost position in `current` at which a string from `ignore` starts
    int biggestIndex = -1;
    for (String s : ignore) {
        if (s.isEmpty()) {
            continue; // lastIndexOf("") matches at the very end and would discard everything
        }
        int startIndex = current.lastIndexOf(s);
        if (startIndex > biggestIndex) {
            biggestIndex = startIndex;
        }
    }
    // current.substring(biggestIndex + 1) contains no string to be ignored
    current = current.substring(biggestIndex + 1);
    num[i][j] -= (biggestIndex + 1);
    if (num[i][j] > maxlen) {
        maxlen = num[i][j];
        // remember the best result so far; sb is returned at the end of the method
        sb = new StringBuilder(current);
    }
}
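For reference, here is a minimal self-contained sketch of the "ignore any substring that contains an excluded string" idea. The class name `IgnoreAwareLcs`, the `List<String> ignore` parameter, and the `bestEnd` variable are assumptions made for illustration and are not part of the original question; the sketch tracks only the end index of the best match instead of maintaining a `StringBuilder`.

```java
import java.util.Arrays;
import java.util.List;

class IgnoreAwareLcs {
    // Longest common substring of str1 and str2 that contains none of the
    // strings in `ignore` (hypothetical parameter, not in the original code).
    static String longestSubstring(String str1, String str2, List<String> ignore) {
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }
        // num[i][j] = length of the longest "clean" common suffix ending at str1[i], str2[j]
        int[][] num = new int[str1.length()][str2.length()];
        int maxlen = 0;
        int bestEnd = -1; // index in str1 of the last character of the best match
        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) != str2.charAt(j)) {
                    continue;
                }
                num[i][j] = (i == 0 || j == 0) ? 1 : 1 + num[i - 1][j - 1];
                String current = str1.substring(i - num[i][j] + 1, i + 1);
                // rightmost start position of any excluded string inside `current`
                int biggestIndex = -1;
                for (String s : ignore) {
                    if (s == null || s.isEmpty()) {
                        continue; // an empty string would match everywhere
                    }
                    biggestIndex = Math.max(biggestIndex, current.lastIndexOf(s));
                }
                // drop everything up to and including that start position
                num[i][j] -= biggestIndex + 1;
                if (num[i][j] > maxlen) {
                    maxlen = num[i][j];
                    bestEnd = i;
                }
            }
        }
        return maxlen == 0 ? "" : str1.substring(bestEnd - maxlen + 1, bestEnd + 1);
    }

    public static void main(String[] args) {
        System.out.println(longestSubstring("abcdefgh", "xxabcdefghyy", Arrays.asList("cd")));
    }
}
```

With the inputs in `main`, the unrestricted longest common substring is "abcdefgh"; excluding "cd" leaves "abc" and "defgh" as clean candidates, and the method returns "defgh".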
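If instead you must ignore only those substrings that are exactly equal to one of the strings in ignore, then whenever a candidate for the longest common substring is found, iterate over ignore and check whether the current substring is among its entries.

Create a suffix tree of one string, then run the second string through it to see which of its substrings can be found in the suffix tree.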
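The suffix-based idea can be sketched as follows. Note that this is an uncompressed suffix trie built naively in quadratic time and space, not a true suffix tree (which would need something like Ukkonen's construction); it is only meant to illustrate the lookup step. The class and method names are hypothetical, and the ignore list is left out to keep the sketch short.

```java
import java.util.HashMap;
import java.util.Map;

class SuffixTrieLcs {
    // One trie node; each edge is labelled with a single character.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
    }

    static String longestCommonSubstring(String str1, String str2) {
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }
        // Insert every suffix of str1, so every substring of str1 is a path from the root.
        Node root = new Node();
        for (int start = 0; start < str1.length(); start++) {
            Node node = root;
            for (int k = start; k < str1.length(); k++) {
                node = node.children.computeIfAbsent(str1.charAt(k), c -> new Node());
            }
        }
        // From each position of str2, walk down the trie as far as possible.
        String best = "";
        for (int start = 0; start < str2.length(); start++) {
            Node node = root;
            int k = start;
            while (k < str2.length()) {
                Node next = node.children.get(str2.charAt(k));
                if (next == null) {
                    break;
                }
                node = next;
                k++;
            }
            if (k - start > best.length()) {
                best = str2.substring(start, k);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstring("abcdef", "xabcdy"));
    }
}
```

For "abcdef" and "xabcdy" the walk starting at the second character of the second string matches four characters, so the method returns "abcd".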
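Information about suffix trees:

What problem are you facing with your current solution? It looks like good code to me.

There is nothing wrong with the code. Read the last sentence of the question.

As a side note: in many cases the set of ignored strings (stop words) is best stored in a hash map/dictionary data structure. A large number of ignored words will cripple the algorithm if it has to iterate over all of them every time. My suggestion is to build this HashMap first and then, while generating substrings, look the word up in the ignored-words hash in the innermost loop and only add it if it is not present.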
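A minimal sketch of the hash-based lookup, assuming the exact-match interpretation (a candidate is rejected only if it is equal to an ignored string). It uses a `HashSet` rather than a `HashMap`, since only membership is needed; the class name, method name, and `ignored` parameter are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ExactIgnoreLcs {
    // Longest common substring of str1 and str2 that is not itself a member of `ignored`.
    static String longestSubstringNotIn(String str1, String str2, Set<String> ignored) {
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }
        int[][] num = new int[str1.length()][str2.length()];
        String best = "";
        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) != str2.charAt(j)) {
                    continue;
                }
                num[i][j] = (i == 0 || j == 0) ? 1 : 1 + num[i - 1][j - 1];
                // Try the common suffixes ending here, longest first, but only
                // those that would beat the best result found so far.
                for (int len = num[i][j]; len > best.length(); len--) {
                    String candidate = str1.substring(i - len + 1, i + 1);
                    if (!ignored.contains(candidate)) { // O(1) expected lookup
                        best = candidate;
                        break;
                    }
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> ignored = new HashSet<>(Arrays.asList("abcd"));
        System.out.println(longestSubstringNotIn("abcdef", "xabcdy", ignored));
    }
}
```

Here the longest common substring "abcd" is in the ignored set, so the method falls back to the longest remaining candidate and returns "abc". Each membership check is a single hash lookup, independent of how many strings are ignored.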