Java: find the substrings of a string that contain all the words in an array

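I have a string and an array of words, and I have to write code that finds all substrings of the string that contain every word in the array, in any order. The string contains no special characters or digits, and each word is separated by a space.

For example:

Given string: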

aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc

Words in the array:

aaaa
bbbb
cccc

Sample output:

aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb    

aaaa aaaa aaaa aaaa cccc bbbb    

aaaa cccc bbbb bbbb bbbb bbbb    

cccc bbbb bbbb bbbb bbbb aaaa  

aaaa cccc bbbb

I have implemented this with for loops, but it is very inefficient.

How can I do this more efficiently?

My code:
    for(int i=0;i<str_arr.length;i++)
    {
        if( (str_arr.length - i) >= words.length)
        {
            String res = check(i);
            if(!res.equals(""))
            {
                System.out.println(res);
                System.out.println("");
            }
            reset_all();
        }
        else
        {
            break;
        }
    }

public static String check(int i)
{
    String res = "";
    num_words = 0;

    for(int j=i;j<str_arr.length;j++)
    {
        if(has_word(str_arr[j]))
        {
            t.put(str_arr[j].toLowerCase(), 1);
            h.put(str_arr[j].toLowerCase(), 1);

            res = res + str_arr[j]; //+ " ";

            if(all_complete())
            {
                return res;
            }

            res = res + " ";
        }
        else
        {
            res = res + str_arr[j] + " ";
        }

    }
    res = "";
    return res;
}

My first approach would be something like the following pseudocode:

  for word : string {
    if word in array {
      for each stored potential substring {
        if word wasn't already found {
          remove word from notAlreadyFoundList
          if notAlreadyFoundList is empty {
            use starting pos and ending pos to save our substring
          }
        }
      }
      store position and array-word as potential substring
    }
  }
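This should perform reasonably well, because you only traverse the string once.

[Edit]

Here is an implementation of my pseudocode; try it and see whether it performs better or worse. It works on the assumption that a matching substring is found as soon as the last missing word is found. If you really need all matches, change the lines marked //ALLMATCHES: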

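import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class SubStringFinder {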
    String textString = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
    Set<String> words = new HashSet<String>(Arrays.asList("aaaa", "bbbb", "cccc"));

    public static void main(String[] args) {
        new SubStringFinder();
    }

    public SubStringFinder() {
        List<PotentialMatch> matches = new ArrayList<PotentialMatch>();
        for (String textPart : textString.split(" ")) {
            if (words.contains(textPart)) {
                for (Iterator<PotentialMatch> matchIterator = matches.iterator(); matchIterator.hasNext();) {
                    PotentialMatch match = matchIterator.next();
                    String result = match.tryMatch(textPart);
                    if (result != null) {
                        System.out.println("Match found: \"" + result + "\"");
                        matchIterator.remove(); //ALLMATCHES - remove this line
                    }
                }
                Set<String> unfound = new HashSet<String>(words);
                unfound.remove(textPart);
                matches.add(new PotentialMatch(unfound, textPart));
            }// ALLMATCHES add these lines 
             // else {
             // matches.add(new PotentialMatch(new HashSet<String>(words), textPart));
             // }
        }
    }

    class PotentialMatch {
        Set<String> unfoundWords;
        StringBuilder stringPart;
        public PotentialMatch(Set<String> unfoundWords, String part) {
            this.unfoundWords = unfoundWords;
            this.stringPart = new StringBuilder(part);
        }
        public String tryMatch(String part) {
            this.stringPart.append(' ').append(part);
            unfoundWords.remove(part);                
            if (unfoundWords.isEmpty()) {
                return this.stringPart.toString();
            }
            return null;
        }
    }
}

Here is another approach:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSubstringFinder {
    public static void main(String[] args) {
        // init
        List<String> result = new ArrayList<String>();
        String string = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
        String[] words = { "aaaa", "bbbb", "cccc" };
        // build one regexp per ordering of the words, e.g. "(aaaa ?)+(bbbb ?)+(cccc )*cccc"
        // (shown here without the quoting that Pattern.quote adds around each word)
        List<String> regexps = findCombs(Arrays.asList(words));
        // compile each regexp and collect its matches
        for (String regexp : regexps) {
            Pattern p = Pattern.compile(regexp);
            Matcher m = p.matcher(string);
            while (m.find()) {
                result.add(m.group());
            }
        }
        System.out.println(result);
    }

    // Recursively builds the regexps; note that the base case rewrites the list it is given.
    private static List<String> findCombs(List<String> words) {
        if (words.size() == 1) {
            words.set(0, "(" + Pattern.quote(words.get(0)) + " )*" + Pattern.quote(words.get(0)));
            return words;
        }
        List<String> list = new ArrayList<String>();
        for (String word : words) {
            // copy the list and drop the current word, then recurse on the rest
            List<String> tail = new LinkedList<String>(words);
            tail.remove(word);
            for (String s : findCombs(tail)) {
                list.add("(" + Pattern.quote(word) + " ?)+" + s);
            }
        }
        return list;
    }
}

I know the result is incomplete: you only get the available combinations, fully extended, but you do get all of the combinations.
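To make the regex approach easier to inspect, here is a minimal sketch that rebuilds the same kind of patterns without modifying the input list and prints them for two words. The names `CombPatterns` and `buildPatterns` are illustrative and do not come from the answer above. One pattern is produced per ordering of the words, so the number of patterns grows as n! with the number of words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class CombPatterns {
    // Hypothetical helper: builds one regex per ordering of the words,
    // in the same shape as findCombs, but leaves its argument untouched.
    static List<String> buildPatterns(List<String> words) {
        List<String> patterns = new ArrayList<>();
        if (words.size() == 1) {
            // Last word of an ordering: zero or more "word " repeats, then the word itself.
            String quoted = Pattern.quote(words.get(0));
            patterns.add("(" + quoted + " )*" + quoted);
            return patterns;
        }
        for (String word : words) {
            // Copy the list, drop the current word, and recurse on the remainder.
            List<String> tail = new ArrayList<>(words);
            tail.remove(word);
            for (String rest : buildPatterns(tail)) {
                patterns.add("(" + Pattern.quote(word) + " ?)+" + rest);
            }
        }
        return patterns;
    }

    public static void main(String[] args) {
        // Print the patterns for two words, then the pattern count for four words.
        for (String pattern : buildPatterns(Arrays.asList("aaaa", "bbbb"))) {
            System.out.println(pattern);
        }
        System.out.println(buildPatterns(Arrays.asList("a", "b", "c", "d")).size());
    }
}
```

Each generated pattern matches a run of the first word, then a run of the second, and so on, in one fixed order. Text where the words interleave (for example `aaaa bbbb aaaa cccc`) is therefore not matched as a whole by any single pattern.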

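It would be better if you gave an example. Why not show what you have so far?
What are the constraints? The number of characters in the string, the number of words?
I don't see how you arrive at your results. Why does
aaaa aaaa aaaaaa cccc bbbbbbbbbbbbbbbbbbbbbb
match, but not
aaaa aaaa aaaa cccc bbbbbbbbbbbb
or
aaaa aaaa aaaa cccc bbbbbbbbbbbbbb
?
I did the same thing in the code above and got O(log(n)) time complexity by using a TreeMap for the lookups…
For every word in the string you seem to traverse the string once more, which gives you O(n^2) complexity. Where is this TreeMap in your code? How would a map optimize the nested for loops?
This part of the code uses the TreeMap to look up words: t.put(str_arr[j].toLowerCase(), 1);
Then you still have a nested for loop, which gives O(n^2 * log n).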
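The nested-loop cost discussed in these comments can be avoided with a sliding window over the tokens. This is a different technique from the code in the question and from both answers, shown only as a sketch. For each start position it reports the shortest run of tokens that contains every word, and the end of the window never moves backwards, so each token enters and leaves the window once. The names `WindowFinder` and `shortestPerStart` are illustrative. The sketch assumes exact, case-sensitive matching and single-space separation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WindowFinder {
    // For every start index, returns the shortest token run that contains all words.
    static List<String> shortestPerStart(String text, String[] wordArray) {
        List<String> result = new ArrayList<>();
        Set<String> words = new HashSet<>(Arrays.asList(wordArray));
        if (words.isEmpty()) {
            return result; // nothing to look for
        }
        String[] tokens = text.split(" ");
        Map<String, Integer> counts = new HashMap<>(); // occurrences of each required word in the window
        int distinct = 0; // number of required words currently present in the window
        int end = 0;      // exclusive end of the window
        for (int start = 0; start < tokens.length; start++) {
            // Grow the window to the right until it covers every required word.
            while (end < tokens.length && distinct < words.size()) {
                String entering = tokens[end++];
                if (words.contains(entering) && counts.merge(entering, 1, Integer::sum) == 1) {
                    distinct++;
                }
            }
            if (distinct < words.size()) {
                break; // the rest of the text cannot complete the set either
            }
            result.add(String.join(" ", Arrays.asList(tokens).subList(start, end)));
            // Drop the leftmost token before moving the start forward.
            String leaving = tokens[start];
            if (words.contains(leaving) && counts.merge(leaving, -1, Integer::sum) == 0) {
                distinct--;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
        for (String match : shortestPerStart(text, new String[] {"aaaa", "bbbb", "cccc"})) {
            System.out.println(match);
        }
    }
}
```

Apart from the cost of building the output strings, the work is linear in the number of tokens, because both window boundaries only move forward. Longer substrings that also contain all the words can be derived by extending any reported window to the right.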