Java method to find the shortest substring containing the given words: needs optimization


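I have a program that has to find the shortest subsegment of a given string that contains a list of words. Even though my program is correct, it does not finish within the execution time limit (5 seconds). I think the problem is the complex (naive) algorithm I am using: it consists of nested loops and has to scan the word-list array many times. The array a[] contains the original string stored word by word, b[] contains the list of words the subsegment must contain, and String g stores the temporary subsegment built from words of the original string (including the words from the list).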

private static void search() // Finds the shortest subsegment of a[] that contains every word in b[]
{
    int tail, x; // counters
    String c[] = new String[b.length]; // temporary array holding a copy of the word list

    for (int i = 0; i < a.length; i++) // loop through the words of the original string
    {
        System.arraycopy(b, 0, c, 0, b.length); // copy the word list into the temporary array

        for (int j = 0; j < b.length; j++) // loop through the word list
        {
            x = 0; // number of listed words found so far

            if (a[i].equalsIgnoreCase(b[j])) // check whether a[i] is one of the listed words
            {
                tail = i;
                // append words starting at the first found word (tail) until every listed word has been found
                while ((x < b.length) && (tail < a.length))
                {
                    g = g + " " + a[tail]; // append the word to g

                    for (int k = 0; k < c.length; k++) // find the word in the temporary array and blank it out with ""
                    {
                        if (c[k].equalsIgnoreCase(a[tail]))
                        {
                            c[k] = ""; x++;
                        }
                    }
                    tail++;
                }
                if ((tail == a.length) && (x != b.length)) // g does not contain all the listed words
                {
                    g = "";
                }
                else
                {
                    count(g); g = ""; // check whether this is the shortest string so far
                }
            }
        }
    }
    print();
}
String[] a; // the larger string
String[] b; // the list of words to search for
int index = -1;
for(int i=0;i

It gets a lot easier if you can put up with multiple instances of a given word. This assumes that every word in b is unique.

Another approach could be to map each word in b[] to the indices where it occurs in a[]:

// a[] (the text) and b[] (the words to find) as in the question; needs java.util.*
Map<Integer, List<Integer>> occurrence = new HashMap<>();
List<String> bList = Arrays.asList(b);
for(int idx = 0; idx < a.length; idx++)
{
  int bIdx = bList.indexOf(a[idx]); // index of the word a[idx] in b, or -1 if it doesn't exist

  if(bIdx >= 0)
  {
    // create the list the first time this word is seen, then record the position
    occurrence.computeIfAbsent(bIdx, key -> new ArrayList<>()).add(idx);
  }
}
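Then find the combination of indices, one per word in the map, that lie closest together. The simplest way is to generate every combination and compare the distance between its smallest and largest index, but there is probably a faster way. I'll have to think about it.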


Finally, extract every word of a[] between the smallest and the largest index of the shortest sequence.
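One faster way to do the "closest combination" step, offered here as an assumption rather than as the answerer's method, is the standard technique for the smallest range covering k sorted lists: keep one pointer into each word's ascending occurrence list, repeatedly advance the pointer that currently holds the smallest index (found with a min-heap), and remember the narrowest [min, max] span seen. The class and method names (ClosestOccurrences, shortestRun) are hypothetical; like the answer, the sketch assumes the words in b[] are unique, and it matches words exactly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class ClosestOccurrences {

    // Hypothetical helper: returns the shortest run of a[] containing every word of b[],
    // or null if some word of b[] never occurs. Assumes the words in b[] are unique.
    static String[] shortestRun(String[] a, String[] b) {
        if (b.length == 0) {
            return new String[0];
        }
        Map<String, Integer> wordIndex = new HashMap<>();    // word -> its index in b[]
        List<List<Integer>> occurrences = new ArrayList<>(); // per word: ascending positions in a[]
        for (int w = 0; w < b.length; w++) {
            wordIndex.put(b[w], w);
            occurrences.add(new ArrayList<>());
        }
        for (int idx = 0; idx < a.length; idx++) {
            Integer w = wordIndex.get(a[idx]);
            if (w != null) {
                occurrences.get(w).add(idx);
            }
        }
        // Heap entries: {position in a[], word index w, pointer into occurrences.get(w)}
        PriorityQueue<int[]> heap = new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        int max = -1; // largest position currently held by any pointer
        for (int w = 0; w < b.length; w++) {
            if (occurrences.get(w).isEmpty()) {
                return null; // this word never occurs in a[]
            }
            int pos = occurrences.get(w).get(0);
            heap.add(new int[] {pos, w, 0});
            max = Math.max(max, pos);
        }
        int bestLo = 0;
        int bestHi = Integer.MAX_VALUE;
        while (true) {
            int[] top = heap.poll(); // pointer with the smallest position
            if (max - top[0] < bestHi - bestLo) {
                bestLo = top[0];
                bestHi = max;
            }
            List<Integer> list = occurrences.get(top[1]);
            int next = top[2] + 1;
            if (next == list.size()) {
                break; // this word has no later occurrence, so no narrower span remains
            }
            int pos = list.get(next);
            heap.add(new int[] {pos, top[1], next});
            max = Math.max(max, pos);
        }
        // Extract the words of a[] between the smallest and largest index
        return Arrays.copyOfRange(a, bestLo, bestHi + 1);
    }
}
```

Building the occurrence lists is one pass over a[], and each heap step costs O(log k) for k listed words, so this avoids enumerating every combination.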

Here is the implementation I came up with:

//Implementing here with two List<String>
//Should be easy enough to use arrays, or streams, or whatever.
//Needs java.util.HashMap, java.util.List and java.util.Map.
//Returns the distance between the first and last matched word (window length - 1),
//or Integer.MAX_VALUE if some word never occurs.
public static int getShortestSubseqWith(List<String> text, List<String> words) {
    int minDistance = Integer.MAX_VALUE;
    //Create a map of the last known position of each word
    Map<String, Integer> map = new HashMap<>();
    for (String word : words) {
        map.put(word, -1);
    }
    String word;
    //One loop through the main search string
    for (int position = 0; position < text.size(); position++){
        word = text.get(position);
        //If the current word found is in the list we're looking for
        if (map.containsKey(word)) {
            //Update the map
            map.put(word, position);
            //And if the current positions are the closest seen so far, update the min value.
            int curDistance = getCurDistance(map);
            if (curDistance < minDistance)
                minDistance = curDistance;
        }
    }
    return minDistance;
}

//Get the current distance between the last known position of each value in the map
private static int getCurDistance(Map<String, Integer> map) {
    int min = Integer.MAX_VALUE;
    int max = 0;
    for (Integer value : map.values()) {
        if (value == -1)
            return Integer.MAX_VALUE; //some word has not been seen yet
        else {
            max = Math.max(max,value);
            min = Math.min(min,value);
        }
    }
    return max - min;
}

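If hits are relatively rare and the list of terms to search for is relatively short, the main performance cost here should be the loop over the text being searched. If hits are very frequent, performance may suffer because getCurDistance runs more often.

I think you could do this by having head and tail pointers that keep moving inward until the window no longer matches, then doing the same with the other pointer, and repeating the whole process until neither pointer moves inward any more. I may try to code it later.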

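Dynamic programming solution:

Keep a last-position variable for each word you are looking for.

Keep a total count of the distinct words you are looking for that have been seen so far (it never decreases; max = the number of words you are looking for).

For each word position in the input:

  • If the word is in the list of words you are looking for, update that word's last position.
  • If the updated last position was previously uninitialized, increment the total count.
  • If the total count equals the max, loop through the last positions and find the smallest one. The current position minus that value, plus one, is the length of the substring. Record these values and find the best one over all positions.
An optimization is to keep the last positions in a heap, which reduces the time needed to find the smallest one (this should be combined with some structure, perhaps a hash map or tree map, that allows fast lookup of a given word's pointer into the heap).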
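The steps above translate fairly directly into Java. This sketch swaps the suggested heap-plus-lookup-map for a java.util.TreeSet of the current last positions: an ordered set, so first() gives the smallest last position and each update costs O(log k). That substitution and the names LastPositionWindow and shortestWindow are assumptions for illustration. Matching is exact, so case and punctuation must already be normalized.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class LastPositionWindow {

    // Hypothetical helper: length (in words) of the shortest run of text containing
    // every word in targets, or -1 if there is none. Duplicate targets count once.
    static int shortestWindow(List<String> text, List<String> targets) {
        Map<String, Integer> last = new HashMap<>();  // word -> last position, -1 = not seen yet
        for (String t : targets) {
            last.put(t, -1);
        }
        TreeSet<Integer> positions = new TreeSet<>(); // current last positions, kept ordered
        int seen = 0;                                 // total count of distinct words seen
        int best = -1;
        for (int pos = 0; pos < text.size(); pos++) {
            String word = text.get(pos);
            Integer prev = last.get(word);
            if (prev == null) {
                continue;                             // not a word we are looking for
            }
            if (prev == -1) {
                seen++;                               // last position was uninitialized
            } else {
                positions.remove(prev);               // drop the word's stale position
            }
            last.put(word, pos);
            positions.add(pos);
            if (seen == last.size()) {
                // from the smallest last position to the current one covers every word
                int len = pos - positions.first() + 1;
                if (best == -1 || len < best) {
                    best = len;
                }
            }
        }
        return best;
    }
}
```

On the example input used in the table (lowercased, without the periods) it returns 4, matching the final value of the Shortest len row.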

Example:

Input:
This is a test. This is a programming test. a programming test this is

Looking for:
this,test,a,programming

Position        1    2  3  4     5    6  7  8           9     10 11          12   13   14
Word            This is a  test. This is a  programming test. a  programming test this is
this         -1 1    1  1  1     5    5  5  5           5     5  5           5    13   13
test         -1 -1   -1 -1 4     4    4  4  4           9     9  9           12   12   12
a            -1 -1   -1 3  3     3    3  7  7           7     10 10          10   10   10
programming  -1 -1   -1 -1 -1    -1   -1 -1 8           8     8  11          11   11   11
Count        0  1    1  2  3     3    3  3  4           4     4  4           4    4    4
Substr len   NA NA   NA NA NA    NA   NA NA 5           5     6  7           8    4    5
Shortest len NA NA   NA NA NA    NA   NA NA 5           5     5  5           5    4    4
Best result:
a programming test this, length = 4

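Complexity analysis:

Let n be the number of words in the input and k the number of words we are looking for.

The algorithm makes only one pass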
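A rough cost accounting for the last-position approach, assuming hash lookups take expected constant time (a sketch, not the answerer's own analysis): each of the n input words is processed once, and what varies is how the smallest last position is found once all k words have been seen.

```latex
\begin{aligned}
T_{\text{linear scan}}(n, k) &= \underbrace{O(n)}_{\text{one pass, hash lookups}} + \underbrace{O(n k)}_{\text{scan the $k$ last positions per step}} = O(n k) \\
T_{\text{heap or tree}}(n, k) &= O(n) + \underbrace{O(n \log k)}_{\text{one $O(\log k)$ update per step}} = O(n \log k)
\end{aligned}
```

In both variants the extra memory is O(k), one last position per listed word.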

import collections
def minsubstring(sentence, words):
    sentence = sentence.split(' ')
    mintillnow = sentence
    words = set(words.split(' '))
    found = collections.defaultdict(lambda : [-1,-1])#position of word in the sentence and order of the word
    linked = [] # this together with 'found' provides the functionality of LinkedHashMap
    for i, word in enumerate(sentence):
        if word in words:
            found[word][0] = i
            if found[word][1] != -1:#if this word is already seen, remove it from linked list
                del(linked[found[word][1]])
            linked.append(word)#append the newly found word to the tail of the linked list
            # probably the worst part in this code, updating the indexes back to the map
            for pos, wword in enumerate(linked):  # use a separate name so the outer loop's i is not shadowed
                found[wword][1] = pos
            # if found all the words, then check if the substring is smaller than the one till now and update
            if len(linked) == len(words):
                startPos = found[linked[0]][0]
                endPos = found[linked[-1]][0]
                if (endPos - startPos + 1) < len(mintillnow):
                    mintillnow = sentence[startPos:endPos + 1]
    return ' '.join(mintillnow)


>>> minsubstring('This is a test. This is a programming test. a programming test this is. ','this test a programming')
'a programming test this'
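The comment in the Python code notes that found plus linked emulate Java's LinkedHashMap. In Java the real class can be used directly; here is a minimal sketch, assuming exact (case- and punctuation-sensitive) matching like the Python version, with LinkedWindow and minSubstring as hypothetical names. Removing a key and putting it back moves it to the end of the iteration order (re-putting an existing key alone would not), so the first entry always holds the oldest, i.e. smallest, last position. That removes the index-renumbering loop the Python comment calls the worst part, so each step is O(1) expected.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Set;

public class LinkedWindow {

    // Hypothetical helper: shortest run of sentence words containing every word in
    // targets, joined by spaces, or null if no such run exists.
    static String minSubstring(String sentence, String targets) {
        String[] words = sentence.split(" ");
        Set<String> wanted = new HashSet<>(Arrays.asList(targets.split(" ")));
        // Iteration order = order of each word's most recent occurrence,
        // so the first entry always holds the smallest last position.
        LinkedHashMap<String, Integer> lastSeen = new LinkedHashMap<>();
        int bestStart = -1;
        int bestEnd = -1;
        for (int i = 0; i < words.length; i++) {
            if (!wanted.contains(words[i])) {
                continue;
            }
            lastSeen.remove(words[i]); // remove + put moves the word to the tail
            lastSeen.put(words[i], i);
            if (lastSeen.size() == wanted.size()) {
                int start = lastSeen.values().iterator().next(); // head = oldest position
                if (bestStart == -1 || i - start < bestEnd - bestStart) {
                    bestStart = start;
                    bestEnd = i;
                }
            }
        }
        if (bestStart == -1) {
            return null; // some word never occurs
        }
        return String.join(" ", Arrays.copyOfRange(words, bestStart, bestEnd + 1));
    }
}
```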
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class MaxStringWindow {

    private MaxStringWindow() {}

    private static void addStringCount(Map<String, Integer> map, String str) {
        map.merge(str, 1, Integer::sum); // insert 1, or add 1 to the existing count
    }

    private static Map<String, Integer> toFindMap(List<String> strList) {
        final Map<String, Integer> toFind = new HashMap<String, Integer>();
        for (String stri : strList) {
            addStringCount(toFind, stri);
        }
        return toFind;
    }


    public static int minWindowSize(String sentence, List<String> strList) {
        final Map<String, Integer> toFind = toFindMap(strList);
        final Map<String, Integer> hasFound = new HashMap<String, Integer>();

        int matchCtr = 0; // number of required occurrences currently covered by the window
        int j = 0;        // the trailing position of the sliding window

        int min = Integer.MAX_VALUE;

        String[] words = sentence.split(" ");

        for (int i = 0; i < words.length; i++) { // i is the leading position of the sliding window

            if (!toFind.containsKey(words[i])) {
                continue;
            }

            addStringCount(hasFound, words[i]);

            // only count occurrences that are still needed; surplus duplicates must not complete a match
            if (hasFound.get(words[i]) <= toFind.get(words[i])) {
                matchCtr++;
            }

            // check if match has been completed.
            if (matchCtr == strList.size()) {
                // advance the left pointer past unwanted words and surplus duplicates,
                // so that the window (i - j) is as small as possible
                while (!toFind.containsKey(words[j]) || hasFound.get(words[j]) > toFind.get(words[j])) {
                    if (hasFound.containsKey(words[j])) {
                        hasFound.put(words[j], hasFound.get(words[j]) - 1);
                    }
                    j++;
                }
                // record the window only after shrinking it
                min = Math.min(min, i - j + 1);
            }
        }


        if (matchCtr < strList.size()) {
            throw new IllegalArgumentException("The subset is not found in the input string.");
        }

        return min;
    }

}