Java: longest common substring excluding a list of strings


java string algorithm


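I am using an algorithm to find the longest common substring of two strings. Please help me do the same, but with an array of substrings that the function should ignore.

My Java code: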

    public static String longestSubstring(String str1, String str2) {
        StringBuilder sb = new StringBuilder();
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }

        // java initializes them already with 0
        int[][] num = new int[str1.length()][str2.length()];
        int maxlen = 0;
        int lastSubsBegin = 0;

        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) == str2.charAt(j)) {
                    if ((i == 0) || (j == 0)) {
                        num[i][j] = 1;
                    } else {
                        num[i][j] = 1 + num[i - 1][j - 1];
                    }

                    if (num[i][j] > maxlen) {
                        maxlen = num[i][j];
                        // generate substring from str1 => i
                        int thisSubsBegin = i - num[i][j] + 1;
                        if (lastSubsBegin == thisSubsBegin) {
                            //if the current LCS is the same as the last time this block ran
                            sb.append(str1.charAt(i));
                        } else {
                            //this block resets the string builder if a different LCS is found
                            lastSubsBegin = thisSubsBegin;
                            sb = new StringBuilder();
                            sb.append(str1.substring(lastSubsBegin, i + 1));
                        }
                    }
                }
            }
        }

        return sb.toString();
    }

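As I understand it, you have to ignore every substring that contains at least one of the strings in `ignore`. Replace the match branch inside your two loops with: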

if (str1.charAt(i) == str2.charAt(j)) {
    if ((i == 0) || (j == 0)) {
        num[i][j] = 1;
    } else {
        num[i][j] = 1 + num[i - 1][j - 1];
    }

    // rebuild the current common substring from num[i][j] on every step
    // (a single shared StringBuilder gets out of sync across different j)
    int thisSubsBegin = i - num[i][j] + 1;
    String current = str1.substring(thisSubsBegin, i + 1);

    // find the right-most start index of any string from `ignore`
    // (the String[] of excluded strings) inside the current substring
    int biggestIndex = -1;
    for (String s : ignore) {
        int startIndex = current.lastIndexOf(s);
        if (!s.isEmpty() && startIndex > biggestIndex) {
            biggestIndex = startIndex;
        }
    }

    // cutting everything up to and including biggestIndex breaks every
    // occurrence, so the remaining suffix contains no ignored string
    num[i][j] -= (biggestIndex + 1);

    if (num[i][j] > maxlen) {
        maxlen = num[i][j];
        lastSubsBegin = i - num[i][j] + 1;
    }
}
// after both loops:
// return str1.substring(lastSubsBegin, lastSubsBegin + maxlen);
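For reference, here is a minimal sketch of the whole method with this change applied, assuming `ignore` is passed as an extra `String[]` parameter; the class name `LongestSubstringExcluding` and the sample strings in `main` are hypothetical. It keeps the question's `num` table, but instead of a shared `StringBuilder` it recomputes the current substring from `num[i][j]` and only records where the best one starts.

```java
public class LongestSubstringExcluding {

    // Longest common substring of str1 and str2 that does not contain any
    // string from `ignore` (hypothetical extra parameter). Empty strings
    // in `ignore` are skipped.
    public static String longestSubstring(String str1, String str2, String[] ignore) {
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }
        // num[i][j] = length of the longest common suffix of str1[0..i] and
        // str2[0..j] that contains no ignored string (0 if chars differ)
        int[][] num = new int[str1.length()][str2.length()];
        int maxlen = 0;
        int bestBegin = 0;

        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) != str2.charAt(j)) {
                    continue; // num[i][j] stays 0
                }
                num[i][j] = (i == 0 || j == 0) ? 1 : 1 + num[i - 1][j - 1];

                String current = str1.substring(i - num[i][j] + 1, i + 1);
                // right-most start of any ignored string inside `current`
                int biggestIndex = -1;
                for (String s : ignore) {
                    if (s.isEmpty()) {
                        continue;
                    }
                    biggestIndex = Math.max(biggestIndex, current.lastIndexOf(s));
                }
                // keep only the part after that start
                num[i][j] -= biggestIndex + 1;

                if (num[i][j] > maxlen) {
                    maxlen = num[i][j];
                    bestBegin = i - maxlen + 1;
                }
            }
        }
        return str1.substring(bestBegin, bestBegin + maxlen);
    }

    public static void main(String[] args) {
        String[] ignore = {"cd"};
        // "bcdef" is common to both, but it contains "cd"
        System.out.println(longestSubstring("abcdef", "xbcdefy", ignore)); // def
    }
}
```

Trimming is enough because any substring of a valid string is also valid: an ignored string can only newly appear at the end of `current`, so cutting just past its right-most start removes every occurrence.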

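If instead you must ignore only substrings that are exactly equal to one of the strings in `ignore`, then whenever you find a candidate for the longest common substring, iterate over `ignore` and check whether the current substring is among them.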
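A minimal sketch of this exact-match variant, under the same assumptions (hypothetical class name, `ignore` as an extra `String[]` parameter). Every common substring that ends at `str1[i]`/`str2[j]` is a suffix of the run of length `num[i][j]`, so the sketch tries those suffixes from longest to shortest and keeps the first one that is not in `ignore`; `isIgnored` is the plain linear scan described above.

```java
public class LongestSubstringNotIgnored {

    // Linear scan over `ignore`: true if `candidate` equals one of them.
    static boolean isIgnored(String candidate, String[] ignore) {
        for (String s : ignore) {
            if (s.equals(candidate)) {
                return true;
            }
        }
        return false;
    }

    // Longest common substring of str1 and str2 that is not exactly equal
    // to any string in `ignore`.
    public static String longestSubstring(String str1, String str2, String[] ignore) {
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }
        int[][] num = new int[str1.length()][str2.length()];
        String best = "";

        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) != str2.charAt(j)) {
                    continue;
                }
                num[i][j] = (i == 0 || j == 0) ? 1 : 1 + num[i - 1][j - 1];
                // only lengths that would beat the current best are checked
                for (int len = num[i][j]; len > best.length(); len--) {
                    String candidate = str1.substring(i - len + 1, i + 1);
                    if (!isIgnored(candidate, ignore)) {
                        best = candidate;
                        break;
                    }
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] ignore = {"bcdef"};
        System.out.println(longestSubstring("abcdef", "xbcdefy", ignore)); // bcde
    }
}
```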
Build a suffix tree of one string, then run the second string through it to see which of its substrings can be found in the tree.
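A real suffix tree (for example one built with Ukkonen's algorithm) is too long for a short example, so the sketch below swaps in a naive suffix trie: an uncompressed suffix tree that stores every suffix character by character, using O(n²) time and space. It only illustrates the "build from one string, walk the other" idea and does not handle the ignore list; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class SuffixTrieLcs {

    // One trie node; children keyed by character.
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
    }

    // Insert every suffix of `text` into an uncompressed suffix trie.
    static Node buildSuffixTrie(String text) {
        Node root = new Node();
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int k = start; k < text.length(); k++) {
                node = node.children.computeIfAbsent(text.charAt(k), c -> new Node());
            }
        }
        return root;
    }

    // Walk `other` through the trie from every start position; each path
    // from the root spells a substring of `text`, so the deepest walk is
    // the longest common substring.
    static String longestCommonSubstring(String text, String other) {
        Node root = buildSuffixTrie(text);
        String best = "";
        for (int start = 0; start < other.length(); start++) {
            Node node = root;
            int k = start;
            while (k < other.length()) {
                Node next = node.children.get(other.charAt(k));
                if (next == null) {
                    break;
                }
                node = next;
                k++;
            }
            if (k - start > best.length()) {
                best = other.substring(start, k);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstring("abcdef", "xbcdefy")); // bcdef
    }
}
```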
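More information on suffix trees:

What problem are you facing with your current solution? It looks like good code to me.

There is no problem with the code. Read the last sentence.

As a side note: in many cases the set of ignored strings (stop words) is better stored in a hash map / dictionary data structure. If the algorithm has to iterate over the whole list every time, a large number of ignored words will slow it down considerably. My suggestion is to build this HashMap first and then, while generating substrings in the innermost loop, look each one up in the ignored-words hash and only add it if it is not there.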
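A minimal brute-force sketch of that suggestion, assuming the exact-match interpretation: the ignored strings are loaded into a `HashSet` once, and every substring generated in the inner loop is checked against it (O(1) average lookups) before the more expensive `str2.contains` call. A `HashSet` is used rather than a `HashMap` because only membership matters; the class name and sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class IgnoreSetLookup {

    // Generate substrings of str1 from longest to shortest for each begin
    // index, skip those in the ignored set, and keep the longest one that
    // also occurs in str2.
    static String longestCommonNotIgnored(String str1, String str2, Set<String> ignored) {
        String best = "";
        for (int begin = 0; begin < str1.length(); begin++) {
            for (int end = str1.length(); end - begin > best.length(); end--) {
                String candidate = str1.substring(begin, end);
                if (!ignored.contains(candidate) && str2.contains(candidate)) {
                    best = candidate;
                    break; // longest valid candidate for this begin index
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] ignore = {"bcdef", "bcde"};
        // build the set once instead of scanning the array for every candidate
        Set<String> ignored = new HashSet<>(Arrays.asList(ignore));
        System.out.println(longestCommonNotIgnored("abcdef", "xbcdefy", ignored)); // cdef
    }
}
```

The brute force itself checks O(n²) candidates, each with a `contains` scan of `str2`, so it only suits short inputs; the set lookup just keeps the size of the ignore list from multiplying that cost.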