Java: longest common substring with a list of excluded strings

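I am using an algorithm to find the longest common substring of two strings. Please help me extend it so that the function also takes an array of common substrings of these strings and ignores them.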

My Java code:

    public static String longestSubstring(String str1, String str2) {
        StringBuilder sb = new StringBuilder();
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }

        // java initializes them already with 0
        int[][] num = new int[str1.length()][str2.length()];
        int maxlen = 0;
        int lastSubsBegin = 0;

        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) == str2.charAt(j)) {
                    if ((i == 0) || (j == 0)) {
                        num[i][j] = 1;
                    } else {
                        num[i][j] = 1 + num[i - 1][j - 1];
                    }

                    if (num[i][j] > maxlen) {
                        maxlen = num[i][j];
                        // generate substring from str1 => i
                        int thisSubsBegin = i - num[i][j] + 1;
                        if (lastSubsBegin == thisSubsBegin) {
                            //if the current LCS is the same as the last time this block ran
                            sb.append(str1.charAt(i));
                        } else {
                            //this block resets the string builder if a different LCS is found
                            lastSubsBegin = thisSubsBegin;
                            sb = new StringBuilder();
                            sb.append(str1.substring(lastSubsBegin, i + 1));
                        }
                    }
                }
            }
        }

        return sb.toString();
    } 

As I understand it, you have to ignore substrings that contain at least one of the strings in `ignore`.

if (str1.charAt(i) == str2.charAt(j)) {
    if ((i == 0) || (j == 0)) {
        num[i][j] = 1;
    } else {
        num[i][j] = 1 + num[i - 1][j - 1];
    }

    // Rebuild the common substring ending at i on every step so that we can
    // compare it with `ignore` (assumed here to be a collection of strings
    // passed to the method).
    int thisSubsBegin = i - num[i][j] + 1;
    String candidate = str1.substring(thisSubsBegin, i + 1);

    // Find the rightmost position at which any string from `ignore` starts
    // inside the candidate; -1 means that none of them occurs.
    int biggestIndex = -1;
    for (String s : ignore) {
        if (s.isEmpty()) {
            continue; // an empty string would match everywhere
        }
        int startIndex = candidate.lastIndexOf(s);
        if (startIndex > biggestIndex) {
            biggestIndex = startIndex;
        }
    }

    // candidate.substring(biggestIndex + 1) contains no string to be ignored
    candidate = candidate.substring(biggestIndex + 1);
    num[i][j] -= (biggestIndex + 1);

    if (num[i][j] > maxlen) {
        maxlen = num[i][j];
        // `sb` is what the method returns, so keep the best result there
        sb = new StringBuilder(candidate);
    }
}
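
For reference, here is a minimal self-contained sketch of this first interpretation (a result must not contain any ignored string). The class name `LongestSubstringExcluding`, the `List<String>` type of the `ignore` parameter, and the `best` variable are assumptions made for illustration; they do not come from the question. For simplicity the sketch rebuilds the candidate with `substring` at every matching cell, which is easy to verify but not the fastest option.

```java
import java.util.Arrays;
import java.util.List;

public class LongestSubstringExcluding {

    // Longest common substring of str1 and str2 that contains no string from `ignore`.
    public static String longestSubstring(String str1, String str2, List<String> ignore) {
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }

        // num[i][j] = length of the longest ignore-free common substring
        // ending at str1[i] and str2[j]
        int[][] num = new int[str1.length()][str2.length()];
        int maxlen = 0;
        String best = "";

        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) != str2.charAt(j)) {
                    continue;
                }
                num[i][j] = (i == 0 || j == 0) ? 1 : 1 + num[i - 1][j - 1];
                String candidate = str1.substring(i - num[i][j] + 1, i + 1);

                // Rightmost start of any ignored string inside the candidate.
                int biggestIndex = -1;
                for (String s : ignore) {
                    if (s.isEmpty()) {
                        continue; // an empty string would match everywhere
                    }
                    biggestIndex = Math.max(biggestIndex, candidate.lastIndexOf(s));
                }

                // Drop everything up to and including that start position.
                candidate = candidate.substring(biggestIndex + 1);
                num[i][j] = candidate.length();

                if (num[i][j] > maxlen) {
                    maxlen = num[i][j];
                    best = candidate;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // "abcdef" is common to both strings but contains "cd", so it is cut.
        System.out.println(longestSubstring("abcdefg", "xabcdefy", Arrays.asList("cd"))); // prints "abc"
    }
}
```

When several results have the same length, this sketch keeps the one found first.
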
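
If instead you only have to ignore substrings that are exactly equal to one of the strings in `ignore`, then whenever a candidate for the longest common substring is found, iterate over `ignore` and check whether the current substring is among them.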
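
A minimal sketch of this second interpretation (a result must not be equal to an ignored string) could look as follows. The class name `LongestSubstringNotIgnored` and the use of a `Set<String>` for `ignore` are assumptions for illustration; a hash-based set makes each membership check a lookup rather than a scan over the whole list. Because every suffix of a common substring is also common, the sketch falls back to shorter suffixes when the full candidate is ignored.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class LongestSubstringNotIgnored {

    // Longest common substring of str1 and str2 that is not equal to any string in `ignore`.
    public static String longestSubstring(String str1, String str2, Set<String> ignore) {
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }

        // num[i][j] = length of the longest common substring ending at str1[i] and str2[j]
        int[][] num = new int[str1.length()][str2.length()];
        String best = "";

        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) != str2.charAt(j)) {
                    continue;
                }
                num[i][j] = (i == 0 || j == 0) ? 1 : 1 + num[i - 1][j - 1];

                // Try the suffixes ending at i from longest to shortest,
                // but only those that would beat the best result so far.
                for (int len = num[i][j]; len > best.length(); len--) {
                    String candidate = str1.substring(i - len + 1, i + 1);
                    if (!ignore.contains(candidate)) {
                        best = candidate;
                        break;
                    }
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> ignore = new HashSet<>(Arrays.asList("abcd"));
        // "abcd" is the longest common substring, but it is ignored.
        System.out.println(longestSubstring("abcdx", "yabcd", ignore)); // prints "abc"
    }
}
```
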

Build a suffix tree for one of the strings, then run the second string through it to see which of its substrings can be found in the tree.
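
To illustrate the idea, here is a rough sketch that uses an uncompressed suffix trie instead of a real suffix tree. It takes quadratic time and space to build, so it only shows the lookup principle; a proper suffix tree (for example one built with Ukkonen's algorithm) would be needed for long inputs. The class and method names are hypothetical, and the ignore list is not handled here, so one of the filters described above would still have to be applied on top.

```java
import java.util.HashMap;
import java.util.Map;

public class SuffixTrieLcs {

    // One node of an uncompressed suffix trie: children keyed by the next character.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
    }

    // Insert every suffix of `text` into a trie (quadratic time and space).
    private static Node buildSuffixTrie(String text) {
        Node root = new Node();
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int k = start; k < text.length(); k++) {
                node = node.children.computeIfAbsent(text.charAt(k), c -> new Node());
            }
        }
        return root;
    }

    // Walk the trie of str1 from every start position of str2 and keep the longest match.
    public static String longestCommonSubstring(String str1, String str2) {
        if (str1 == null || str2 == null) {
            return "";
        }
        Node root = buildSuffixTrie(str1);
        String best = "";
        for (int start = 0; start < str2.length(); start++) {
            Node node = root;
            int end = start;
            while (end < str2.length()) {
                Node next = node.children.get(str2.charAt(end));
                if (next == null) {
                    break; // str2[start..end] is no longer a substring of str1
                }
                node = next;
                end++;
            }
            if (end - start > best.length()) {
                best = str2.substring(start, end);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstring("abcdefg", "xabcdefy")); // prints "abcdef"
    }
}
```
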


More information about suffix trees:

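What problem are you facing with your current solution? It looks like good code to me.

There is nothing wrong with the code. Read the last sentence.

As a side note: in many cases the set of ignored strings (stop words) is best stored in a hash map/dictionary data structure. If the algorithm has to iterate over the whole list every time, a large number of ignored words will cripple it. My suggestion is to build this HashMap and then, while generating substrings, look each one up in the ignored-words hash inside the inner loop and add it only if it is not there.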