String: How do I find the smallest substring that contains all the characters of a given string?


I recently came across an interesting problem about strings. Suppose you are given the following:

Input string1: "this is a test string"
Input string2: "tist"
Output string: "t stri"
So, given the above, how do I find the smallest substring of string1 that contains all the characters of string2?

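Here is an O(n) solution. The basic idea is simple: for each starting index, find the smallest ending index such that the substring contains all the necessary letters. The trick is that the smallest ending index never decreases over the course of the function, so with a little data structure support we consider each character at most twice. In Python: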

from collections import defaultdict

def smallest(s1, s2):
    assert s2 != ''
    d = defaultdict(int)
    nneg = [0]  # number of negative entries in d
    def incr(c):
        d[c] += 1
        if d[c] == 0:
            nneg[0] -= 1
    def decr(c):
        if d[c] == 0:
            nneg[0] += 1
        d[c] -= 1
    for c in s2:
        decr(c)
    minlen = len(s1) + 1
    j = 0
    for i in range(len(s1)):  # range instead of the Python 2-only xrange
        while nneg[0] > 0:
            if j >= len(s1):
                return minlen
            incr(s1[j])
            j += 1
        minlen = min(minlen, j - i)
        decr(s1[i])
    return minlen
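
The function above returns only the length of the window, while the question asks for the substring itself. The following is a minimal sketch of the same two-index idea that also remembers where the best window starts. The name `smallest_substring` and the use of `collections.Counter` are my own choices for illustration, not part of the original answer.

```python
from collections import Counter

def smallest_substring(s1, s2):
    """Return the shortest substring of s1 covering every character of s2
    (with multiplicity), or None if no such substring exists."""
    if not s2:
        return ""
    need = Counter(s2)          # how many of each character are still owed
    missing = len(s2)           # total number of characters still owed
    best = None                 # (start, end) of the best window, end exclusive
    j = 0
    for i in range(len(s1)):
        # Extend the right edge until the window s1[i:j] covers s2.
        while missing > 0 and j < len(s1):
            if need[s1[j]] > 0:
                missing -= 1
            need[s1[j]] -= 1
            j += 1
        if missing > 0:
            break               # no window starting at i or later can work
        if best is None or j - i < best[1] - best[0]:
            best = (i, j)
        # Drop s1[i] from the window before moving the left edge.
        need[s1[i]] += 1
        if need[s1[i]] > 0:
            missing += 1
    return None if best is None else s1[best[0]:best[1]]
```

With the strings from the question, `smallest_substring("this is a test string", "tist")` returns `"t stri"`.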

You can do a histogram sweep in O(N+M) time and O(1) space, where N is the number of characters in the first string and M is the number of characters in the second.

It works like this:

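  • Make a histogram of the second string's characters (the key operation is hist2[s2[i]]++).
  • Build a cumulative histogram of the first string's characters until that histogram contains every character that the second string's histogram contains (I will call this the "histogram condition").
  • Then move forward along the first string, subtracting from the histogram, until it no longer meets the histogram condition. Mark that stretch of the first string (as it was before the final move) as your tentative substring.
  • Move the front of the substring forward again until the histogram condition is met again. Move the end forward until it fails again. If this gives a shorter substring than the first, mark it as your tentative substring.
  • Repeat until you have passed through the entire first string.
  • The marked substring is your answer.
Note that by varying the check used for the histogram condition, you can choose either to require the same set of characters as the second string, or at least as many characters of each type. (It is just the difference between a[i]>0 and a[i]>=b[i].)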
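
The steps above can be sketched roughly as follows. This is only an illustration under my own assumptions: the function name `histogram_sweep`, the `multiset` flag that switches between the two checks, and the use of `collections.Counter` as the histogram are not from the answer. The condition is rechecked over the whole second histogram on every step, so this is the plain version without any speed-up.

```python
from collections import Counter

def histogram_sweep(s1, s2, multiset=True):
    """Shortest window of s1 meeting the histogram condition for s2, or None.

    multiset=True  -> at least as many of each character (a[i] >= b[i])
    multiset=False -> each character merely present      (a[i] > 0)
    """
    if not s2:
        return ""
    hist2 = Counter(s2)
    hist1 = Counter()

    def condition():
        if multiset:
            return all(hist1[c] >= n for c, n in hist2.items())
        return all(hist1[c] > 0 for c in hist2)

    best = None                  # (front, end) of the best window, inclusive
    front = 0
    for end, ch in enumerate(s1):
        hist1[ch] += 1           # move the end of the window forward
        while condition():       # shrink from the front while it still holds
            if best is None or end - front < best[1] - best[0]:
                best = (front, end)
            hist1[s1[front]] -= 1
            front += 1
    return None if best is None else s1[best[0]:best[1] + 1]
```

For the question's input, the multiset check gives `"t stri"`, while the set-only check (`multiset=False`) is satisfied earlier by `"this"`, which shows how the two conditions differ.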


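You can speed up the histogram checks if you keep track of which conditions are not yet satisfied while you are trying to satisfy them, and check only the entry you decrement while you are trying to break the condition. (During the initial buildup, you count how many items are satisfied, and increment that count every time you add a new character that takes a condition from false to true.)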
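
A minimal sketch of that bookkeeping, assuming the "at least as many of each character" check: a single counter tracks how many distinct characters currently meet their quota, so each step touches one histogram entry instead of rescanning all of them. The names `histogram_sweep_fast` and `satisfied` are hypothetical.

```python
from collections import Counter

def histogram_sweep_fast(s1, s2):
    """Shortest window of s1 with at least the character counts of s2, or None."""
    if not s2:
        return ""
    hist2 = Counter(s2)
    hist1 = Counter()
    satisfied = 0                # distinct characters whose quota is currently met
    best = None                  # (front, end) of the best window, inclusive
    front = 0
    for end, ch in enumerate(s1):
        hist1[ch] += 1
        if ch in hist2 and hist1[ch] == hist2[ch]:
            satisfied += 1       # this character's condition went false -> true
        while satisfied == len(hist2):
            if best is None or end - front < best[1] - best[0]:
                best = (front, end)
            left = s1[front]
            # Only the character being removed can break the condition.
            if left in hist2 and hist1[left] == hist2[left]:
                satisfied -= 1
            hist1[left] -= 1
            front += 1
    return None if best is None else s1[best[0]:best[1] + 1]
```

Each character of the first string is added once and removed at most once, which is where the linear running time comes from.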

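Edit: apparently there is an O(n) algorithm (cf. algorithmist's answer). Obviously it will beat the [naive] baseline described below!

Too bad I have to go... I'm a bit doubtful that we can get O(n). I'll check in tomorrow to see the winner ;-) Have fun!

Tentative algorithm:
The general idea is to sequentially try using a character of str2 found in str1 as the start of a search (in either or both directions) for all the other letters of str2. By keeping a "length of best match so far" value, we can abort a search as soon as it exceeds this value. Other heuristics can probably be used to abort suboptimal (so far) solutions even earlier. The order in which starting letters are chosen in str1 matters a lot; it is suggested to start with the letter(s) that have the lowest count in str1 and to try the other letters, in order of increasing count, in subsequent attempts.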

  [loose pseudo-code]
  - get count for each letter/character in str1  (number of As, Bs etc.)
  - get count for each letter in str2
  - minLen = length(str1) + 1  (the +1 indicates you're not sure all chars of 
                                str2 are in str1)
  - Starting with the letter from string2 which is found the least in string1,
    look for other letters of Str2, in either direction of str1, until you've 
    found them all (or not, at which case response = impossible => done!). 
    set x = length(corresponding substring of str1).
 - if (x < minLen), 
         set minlen = x, 
         also memorize the start/len of the str1 substring.
 - continue trying with other letters of str1 (going up the frequency
   list in str1), but abort search as soon as length(substring of str1) 
   reaches or exceeds minLen.  
   We can find a few other heuristics that would allow aborting a 
   particular search, based on [pre-calculated ?] distance between a given
   letter in str1 and some (all?) of the letters in str2.
 - the overall search terminates when minLen = length(str2) or when 
   we've used all letters of str1 (which match one letter of str2)
   as a starting point for the search
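
As a rough illustration of the pseudo-code above, here is a simplified sketch. It is my own reduction, not the answer's exact procedure: it scans only to the right from each candidate start (a shortest window must begin on a letter of str2, so this is sufficient), tries rarer letters first, and aborts a scan once it can no longer beat the best length. The function name `tentative_search` is hypothetical, and the worst case remains quadratic.

```python
from collections import Counter

def tentative_search(str1, str2):
    """Naive baseline: try each position of str1 that holds a str2 letter as a
    window start, rarest letters first, scanning rightwards only."""
    if not str2:
        return ""
    count1 = Counter(str1)
    need = Counter(str2)
    if any(count1[c] < n for c, n in need.items()):
        return None                       # str1 cannot cover str2 at all
    min_len = len(str1) + 1
    best_start = None
    # Candidate starts, ordered by how rare their letter is in str1.
    starts = sorted((i for i, c in enumerate(str1) if c in need),
                    key=lambda i: (count1[str1[i]], i))
    for start in starts:
        if min_len == len(str2):
            break                         # cannot do better than len(str2)
        missing = need.copy()
        remaining = len(str2)
        for end in range(start, len(str1)):
            if end - start + 1 >= min_len:
                break                     # abort: no shorter than the best match
            c = str1[end]
            if missing[c] > 0:
                missing[c] -= 1
                remaining -= 1
                if remaining == 0:
                    min_len = end - start + 1
                    best_start = start
                    break
    return None if best_start is None else str1[best_start:best_start + min_len]
```

The ordering of `starts` only affects how early scans get cut off, not the result, since every valid start is eventually tried unless the theoretical minimum `len(str2)` has already been reached.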
For more details, including working code, see my blog post:

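To help illustrate this approach, I use an example: string1 = "acbbaca" and string2 = "aba". Here, we also use the term "window", which means a contiguous block of characters from string1 (interchangeable with the term substring).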

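i) string1 = "acbbaca" and string2 = "aba".

ii) The first minimum window is found. Notice that we cannot advance the begin pointer, because hasFound['a'] == needToFind['a'] == 2. Advancing would mean breaking the constraint.

iii) The second window is found. The begin pointer still points to the first element 'a'. hasFound['a'] (3) is greater than needToFind['a'] (2). We decrement hasFound['a'] by one and advance the begin pointer to the right.

iv) We skip 'c' since it is not found in string2. The begin pointer now points to 'b'. hasFound['b'] (2) is greater than needToFind['b'] (1). We decrement hasFound['b'] by one and advance the begin pointer to the right.

v) The begin pointer now points to the next 'b'. hasFound['b'] (1) is equal to needToFind['b'] (1). We stop immediately, and this is our newly found minimum window.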

The idea is mainly based on the help of two pointers…
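
To make the walkthrough concrete, the following sketch replays it and records every window as it is found. The names `need_to_find` and `has_found` mirror the two tables from the walkthrough; the function `trace_windows` and the running `count` variable are assumptions of mine for illustration.

```python
from collections import Counter

def trace_windows(string1, string2):
    """Yield (begin, end, window) each time a valid window has been shrunk."""
    need_to_find = Counter(string2)
    has_found = Counter()
    count = 0                         # characters of string2 matched so far
    begin = 0
    for end, ch in enumerate(string1):
        if need_to_find[ch] == 0:
            continue                  # character not in string2: skip it
        has_found[ch] += 1
        if has_found[ch] <= need_to_find[ch]:
            count += 1
        if count == len(string2):
            # Advance begin past characters that are unneeded or in surplus.
            while (need_to_find[string1[begin]] == 0
                   or has_found[string1[begin]] > need_to_find[string1[begin]]):
                if has_found[string1[begin]] > need_to_find[string1[begin]]:
                    has_found[string1[begin]] -= 1
                begin += 1
            yield begin, end, string1[begin:end + 1]

windows = list(trace_windows("acbbaca", "aba"))
shortest = min(windows, key=lambda w: len(w[2]))[2]
```

For the example strings, `windows` holds the first window `"acbba"` (step ii) followed by `"baca"` (steps iii to v), and `shortest` is `"baca"`.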
//-----------------------------------------------------------------------
#include <cstring>   // strlen, memcpy, memset
#include <vector>    // std::vector

bool IsInSet(char ch, char* cSet)
{
    char* cSetptr = cSet;
    int index = 0;
    while (*(cSet+ index) != '\0')
    {
        if(ch == *(cSet+ index))
        {
            return true;            
        }
        ++index;
    }
    return false;
}

void removeChar(char ch, char* cSet)
{
    bool bShift = false;
    int index = 0;
    while (*(cSet + index) != '\0')
    {
        if( (ch == *(cSet + index)) || bShift)
        {
            *(cSet + index) = *(cSet + index + 1);
            bShift = true;
        }
        ++index;
    }
}
typedef struct subStr
{
    short iStart;
    short iEnd;
    short szStr;
}ss;

char* subStringSmallest(char* testStr, char* cSet)
{
    char* subString = NULL;
    int iSzSet = strlen(cSet) + 1;
    int iSzString = strlen(testStr)+ 1;
    char* cSetBackUp = new char[iSzSet];
    memcpy((void*)cSetBackUp, (void*)cSet, iSzSet);

    int iStartIndx = -1;    
    int iEndIndx = -1;
    int iIndexStartNext = -1;

    std::vector<ss> subStrVec;
    int index = 0;

    while( *(testStr+index) != '\0' )
    {
        if (IsInSet(*(testStr+index), cSetBackUp))
        {
            removeChar(*(testStr+index), cSetBackUp);

            if(iStartIndx < 0)
            {
                iStartIndx = index;
            }
            else if( iIndexStartNext < 0)
                iIndexStartNext = index;
            else
                ;

            if (strlen(cSetBackUp) == 0 )
            {
                iEndIndx = index;
                if( iIndexStartNext == -1)
                    break;
                else
                {
                    index = iIndexStartNext;
                    ss stemp = {iStartIndx, iEndIndx, (iEndIndx-iStartIndx + 1)};
                    subStrVec.push_back(stemp);
                    iStartIndx = iEndIndx = iIndexStartNext = -1;
                    memcpy((void*)cSetBackUp, (void*)cSet, iSzSet);
                    continue;
                }
            }
        }
        else
        {
            if (IsInSet(*(testStr+index), cSet))
            {
                if(iIndexStartNext < 0)
                    iIndexStartNext = index;
            }
        }

        ++index;
    }


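    // No complete window was recorded: indexing subStrVec below would be invalid.
    if (subStrVec.empty())
    {
        delete[] cSetBackUp;
        return NULL;
    }

    int indexSmallest = 0;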
    for(int indexVec = 0; indexVec < subStrVec.size(); ++indexVec)
    {
        if(subStrVec[indexSmallest].szStr > subStrVec[indexVec].szStr)
            indexSmallest = indexVec;       
    }

    subString = new char[(subStrVec[indexSmallest].szStr) + 1];
    memcpy((void*)subString, (void*)(testStr+ subStrVec[indexSmallest].iStart), subStrVec[indexSmallest].szStr);
    memset((void*)(subString + subStrVec[indexSmallest].szStr), 0, 1);

    delete[] cSetBackUp;
    return subString;
}
//--------------------------------------------------------------------
private static Map<Character, Integer> frequency;
private static Set<Character> charsCovered;
private static Map<Character, Integer> encountered;
/**
 * To set the first match index as an intial start point
 */
private static boolean hasStarted = false;
private static int currentStartIndex = 0;
private static int finalStartIndex = 0;
private static int finalEndIndex = 0;
private static int minLen = Integer.MAX_VALUE;
private static int currentLen = 0;
/**
 * Whether we have already found the match and now looking for other
 * alternatives.
 */
private static boolean isFound = false;
private static char currentChar;

public static String findSmallestSubStringWithAllChars(String big, String small) {

    if (null == big || null == small || big.isEmpty() || small.isEmpty()) {
        return null;
    }

    frequency = new HashMap<Character, Integer>();
    instantiateFrequencyMap(small);
    charsCovered = new HashSet<Character>();
    int charsToBeCovered = frequency.size();
    encountered = new HashMap<Character, Integer>();

    for (int i = 0; i < big.length(); i++) {
        currentChar = big.charAt(i);
        if (frequency.containsKey(currentChar) && !isFound) {
            if (!hasStarted && !isFound) {
                hasStarted = true;
                currentStartIndex = i;
            }
            updateEncounteredMapAndCharsCoveredSet(currentChar);
            if (charsCovered.size() == charsToBeCovered) {
                currentLen = i - currentStartIndex;
                isFound = true;
                updateMinLength(i);
            }
        } else if (frequency.containsKey(currentChar) && isFound) {
            updateEncounteredMapAndCharsCoveredSet(currentChar);
            if (currentChar == big.charAt(currentStartIndex)) {
                encountered.put(currentChar, encountered.get(currentChar) - 1);
                currentStartIndex++;
                while (currentStartIndex < i) {
                    if (encountered.containsKey(big.charAt(currentStartIndex))
                            && encountered.get(big.charAt(currentStartIndex)) > frequency.get(big
                                    .charAt(currentStartIndex))) {
                        encountered.put(big.charAt(currentStartIndex),
                                encountered.get(big.charAt(currentStartIndex)) - 1);
                    } else if (encountered.containsKey(big.charAt(currentStartIndex))) {
                        break;
                    }
                    currentStartIndex++;
                }
            }
            currentLen = i - currentStartIndex;
            updateMinLength(i);
        }
    }
    System.out.println("start: " + finalStartIndex + " finalEnd : " + finalEndIndex);
    return big.substring(finalStartIndex, finalEndIndex + 1);
}

private static void updateMinLength(int index) {
    if (minLen > currentLen) {
        minLen = currentLen;
        finalStartIndex = currentStartIndex;
        finalEndIndex = index;
    }

}

private static void updateEncounteredMapAndCharsCoveredSet(Character currentChar) {
    if (encountered.containsKey(currentChar)) {
        encountered.put(currentChar, encountered.get(currentChar) + 1);
    } else {
        encountered.put(currentChar, 1);
    }

    if (encountered.get(currentChar) >= frequency.get(currentChar)) {
        charsCovered.add(currentChar);
    }
}

private static void instantiateFrequencyMap(String str) {

    for (char c : str.toCharArray()) {
        if (frequency.containsKey(c)) {
            frequency.put(c, frequency.get(c) + 1);
        } else {
            frequency.put(c, 1);
        }
    }

}

public static void main(String[] args) {

    String big = "this is a test string";
    String small = "tist";
    System.out.println("len: " + big.length());
    System.out.println(findSmallestSubStringWithAllChars(big, small));
}
public static String shortestSubstrContainingAllChars(String input, String target) {
    int needToFind[] = new int[256];
    int hasFound[] = new int[256];
    int totalCharCount = 0;
    String result = null;

    char[] targetCharArray = target.toCharArray();
    for (int i = 0; i < targetCharArray.length; i++) {
        needToFind[targetCharArray[i]]++;           
    }

    char[] inputCharArray = input.toCharArray();
    for (int begin = 0, end = 0; end < inputCharArray.length; end++) {

        if (needToFind[inputCharArray[end]] == 0) {
            continue;
        }

        hasFound[inputCharArray[end]]++;
        if (hasFound[inputCharArray[end]] <= needToFind[inputCharArray[end]]) {
            totalCharCount ++;
        }
        if (totalCharCount == target.length()) {
            while (needToFind[inputCharArray[begin]] == 0 
                    || hasFound[inputCharArray[begin]] > needToFind[inputCharArray[begin]]) {

                if (hasFound[inputCharArray[begin]] > needToFind[inputCharArray[begin]]) {
                    hasFound[inputCharArray[begin]]--;
                }
                begin++;
            }

            String substring = input.substring(begin, end + 1);
            if (result == null || result.length() > substring.length()) {
                result = substring;
            }
        }
    }
    return result;
}
@Test
public void shortestSubstringContainingAllCharsTest() {
    String result = StringUtil.shortestSubstrContainingAllChars("acbbaca", "aba");
    assertThat(result, equalTo("baca"));

    result = StringUtil.shortestSubstrContainingAllChars("acbbADOBECODEBANCaca", "ABC");
    assertThat(result, equalTo("BANC"));

    result = StringUtil.shortestSubstrContainingAllChars("this is a test string", "tist");
    assertThat(result, equalTo("t stri"));
}
import java.io.*;
import  java.util.*;

class UserMainCode
{


    public String GetSubString(String input1,String input2){
        // Write code here...
        return find(input1, input2);
    }
  private static boolean containsPatternChar(int[] sCount, int[] pCount) {
        for(int i=0;i<256;i++) {
            if(pCount[i]>sCount[i])
                return false;
        }
        return true;
    }
  public static String find(String s, String p) {
        if (p.length() > s.length())
            return null;
        int[] pCount = new int[256];
        int[] sCount = new int[256];
        // Time: O(p.length)
        for(int i=0;i<p.length();i++) {
            pCount[(int)(p.charAt(i))]++;
            sCount[(int)(s.charAt(i))]++;
        }
        int i = 0, j = p.length(), min = Integer.MAX_VALUE;
        String res = null;
        // Time: O(s.length)
        while (true) {
            if (containsPatternChar(sCount, pCount)) {
                if ((j - i) < min) {
                    min = j - i;
                    res = s.substring(i, j);
                    // This is the smallest possible substring.
                    if(min==p.length())
                        break;
                }
                // Reduce the window size. Always shrink here, otherwise the
                // loop never advances once the window stops improving.
                sCount[(int)(s.charAt(i))]--;
                i++;
            } else {
                // The window already reaches the end of s: nothing left to add.
                if (j >= s.length())
                    break;
                sCount[(int)(s.charAt(j))]++;
                // Increase the window size.
                j++;
            }
        }
        System.out.println(res);
        return res;
    }
}
#include <iostream>
#include <vector>
#include <string>
#include <climits>
using namespace std;
string find_minimum_window(string s, string t) {
    if(s.empty() || t.empty()) return "";  // a string-returning function must return a value

    int ns = s.size(), nt = t.size();
    vector<int> total(256, 0);
    vector<int> sofar(256, 0);
    for(int i=0; i<nt; i++) 
        total[t[i]]++;

    int L = 0, R; 
    int minL = 0;                           //gist2
    int count = 0;
    int min_win_len = INT_MAX;

    for(R=0; R<ns; R++) {                   // gist0, a big for loop
        if(total[s[R]] == 0) continue;
        else sofar[s[R]]++;

        if(sofar[s[R]] <= total[s[R]])      // gist1, <= not <
            count++;

        if(count == nt) {                   // POS1
            while(true) {
                char c = s[L]; 
                if(total[c] == 0) { L++; }
                else if(sofar[c] > total[c]) {
                    sofar[c]--;
                    L++;
                }
                else break;
            }  
            if(R - L + 1 < min_win_len) {   // this judge should be inside POS1
                min_win_len = R - L + 1;
                minL = L;
            }
        }
    }
    string res;
    if(count == nt)                         // gist3, cannot forget this. 
        res = s.substr(minL, min_win_len);  // gist4, start from "minL" not "L"
    return res;
}
int main() {
    string s = "abdccdedca";
    cout << find_minimum_window(s, "acd");
}
-module(leetcode).

-export([min_window/0]).

%% Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).

%% For example,
%% S = "ADOBECODEBANC"
%% T = "ABC"
%% Minimum window is "BANC".

%% Note:
%% If there is no such window in S that covers all characters in T, return the empty string "".
%% If there are multiple such windows, you are guaranteed that there will always be only one unique minimum window in S.



min_window() ->
    "eca" = min_window("cabeca", "cae"),
    "eca" = min_window("cfabeca", "cae"),
    "aec" = min_window("cabefgecdaecf", "cae"),
    "cwae" = min_window("cabwefgewcwaefcf", "cae"),
    "BANC" = min_window("ADOBECODEBANC", "ABC"),
    ok.

min_window(T, S) ->
    min_window(T, S, []).

min_window([], _T, MinWindow) ->
    MinWindow;
min_window([H | Rest], T, MinWindow) ->
    NewMinWindow = case lists:member(H, T) of
                       true ->
                           MinWindowFound = fullfill_window(Rest, lists:delete(H, T), [H]),
                           case length(MinWindow) == 0 orelse (length(MinWindow) > length(MinWindowFound)
                               andalso length(MinWindowFound) > 0) of
                               true ->
                                   MinWindowFound;
                               false ->
                                   MinWindow
                           end;
                       false ->
                           MinWindow
                   end,
    min_window(Rest, T, NewMinWindow).

fullfill_window(_, [], Acc) ->
    %% window completed
    Acc;
fullfill_window([], _T, _Acc) ->
    %% no window found
    "";
fullfill_window([H | Rest], T, Acc) ->
    %% completing window
    case lists:member(H, T) of
        true ->
            fullfill_window(Rest, lists:delete(H, T), Acc ++ [H]);
        false ->
            fullfill_window(Rest, T, Acc ++ [H])
    end.
charcount = { 'a': 3, 'b' : 1 };
str = "kjhdfsbabasdadaaaaasdkaaajbajerhhayeom"

def find (c, s):
  Ns = len (s)

  C = list (c.keys ())
  D = list (c.values ())

  # prime numbers assigned to the first 25 chars
  prmsi = [ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89 , 97]

  # primes used in the key, all other set to 1
  prms = []
  Cord = [ord(c) - ord('a') for c in C]

  for e,p in enumerate(prmsi):
    if e in Cord:
      prms.append (p)
    else:
      prms.append (1)

  # Product of match
  T = 1
  for c,d in zip(C,D):
    p = prms[ord (c) - ord('a')]
    T *= p**d

  print ("T=", T)

  t = 1 # product of current string
  f = 0
  i = 0

  matches = []
  mi = 0
  mn = Ns
  mm = 0

  while i < Ns:
    k = prms[ord(s[i]) - ord ('a')]
    t *= k

    print ("testing:", s[f:i+1])

    if (t > T):
      # included too many chars: move start
      # (integer division keeps t an exact int under Python 3)
      t //= prms[ord(s[f]) - ord('a')] # remove first char, usually division by 1
      f += 1 # increment start position
      t //= k # will be retested, could be replaced with bool

    elif t == T:
      # found match
      print ("FOUND match:", s[f:i+1])
      matches.append (s[f:i+1])

      if (i - f) < mn:
        mm = mi
        mn = i - f

      mi += 1

      t //= prms[ord(s[f]) - ord('a')] # remove first matching char

      # look for next match
      i += 1
      f += 1

    else:
      # no match yet, keep searching
      i += 1

  return (mm, matches)


print (find (charcount, str))
def minimum_window(s, t, min_length = 100000):
    d = {}
    for x in t:
        if x in d:
            d[x]+= 1
        else:
            d[x] = 1

    tot = sum(d.values())
    l = []
    ind = 0 
    for i,x in enumerate(s):
        if ind == 1:
            l = l + [x]
        if x in d:
            tot-=1
            if not l:
                ind = 1
                l = [x]

        if tot == 0:
            if len(l)<min_length:
                min_length = len(l)
                min_length = minimum_window(s[i+1:], t, min_length)

    return min_length

l_s = "ADOBECODEBANC"
t_s = "ABC"

min_length = minimum_window(l_s, t_s)

if min_length == 100000:
    print("Not found")
else:
    print(min_length)
public static Tuple<int, int> FindMinSubstringWindow(string input, string pattern)
{
    Tuple<int, int> windowCoords = new Tuple<int, int>(0, input.Length - 1);
    int[] patternHist = new int[256];
    for (int i = 0; i < pattern.Length; i++)
    {
        patternHist[pattern[i]]++;
    }
    int[] inputHist = new int[256];
    int minWindowLength = int.MaxValue;
    int count = 0;
    for (int begin = 0, end = 0; end < input.Length; end++)
    {
        // Skip what's not in pattern.
        if (patternHist[input[end]] == 0)
        {
            continue;
        }
        inputHist[input[end]]++;
        // Count letters that are in pattern.
        if (inputHist[input[end]] <= patternHist[input[end]])
        {
            count++;
        }
        // Window found.
        if (count == pattern.Length)
        {
            // Remove extra instances of letters from pattern
            // or just letters that aren't part of the pattern
            // from the beginning.
            while (patternHist[input[begin]] == 0 ||
                   inputHist[input[begin]] > patternHist[input[begin]])
            {
                if (inputHist[input[begin]] > patternHist[input[begin]])
                {
                    inputHist[input[begin]]--;
                }
                begin++;
            }
            // Current window found.
            int windowLength = end - begin + 1;
            if (windowLength < minWindowLength)
            {
                windowCoords = new Tuple<int, int>(begin, end);
                minWindowLength = windowLength;
            }
        }
    }
    if (count == pattern.Length)
    {
        return windowCoords;
    }
    return null;
}
def get(s, alphabet="abc"):
    seen = {}
    for c in alphabet:
        seen[c] = 0
    seen[s[0]] = 1
    start = 0
    end = 0
    shortest_s = 0
    shortest_e = 99999
    while end + 1 < len(s):
        while seen[s[start]] > 1:
            seen[s[start]] -= 1
            start += 1
        # Constant time check:
        if sum(seen.values()) == len(alphabet) and all(v == 1 for v in seen.values()) and \
                shortest_e - shortest_s > end - start:
            shortest_s = start
            shortest_e = end
        end += 1
        seen[s[end]] += 1
    return s[shortest_s: shortest_e + 1]


print(get("abbcac")) # Expected to return "bca"
    String s = "xyyzyzyx";
    String s1 = "xyz";
    String finalString ="";
    Map<Character,Integer> hm = new HashMap<>();
    if(s1!=null && s!=null && s.length()>s1.length()){
        for(int i =0;i<s1.length();i++){
            if(hm.get(s1.charAt(i))!=null){
                int k = hm.get(s1.charAt(i))+1;
                hm.put(s1.charAt(i), k);
            }else
                hm.put(s1.charAt(i), 1);
        }
        Map<Character,Integer> t = new HashMap<>();
        int start =-1;
         for(int j=0;j<s.length();j++){
             if(hm.get(s.charAt(j))!=null){
                 if(t.get(s.charAt(j))!=null){
                     if(!t.get(s.charAt(j)).equals(hm.get(s.charAt(j)))){ // compare Integer values, not references
                     int k = t.get(s.charAt(j))+1;
                        t.put(s.charAt(j), k);
                     }
                 }else{
                     t.put(s.charAt(j), 1);
                     if(start==-1){
                         if(j+s1.length()>s.length()){
                             break;
                         }
                         start = j;
                     }
                 }
                 if(hm.equals(t)){
                    t = new HashMap<>();
                    // Keep the shortest window found so far.
                    String candidate = s.substring(start,j+1);
                    if(finalString.isEmpty() || candidate.length()<finalString.length())
                    {
                        finalString=candidate;
                    }
                    j=start;
                    start=-1;                       
                 }
             }
         }
    }