Fastest way to return a list of strings matching a wildcard from a collection in Java

java search collections

I have a set of 100,000 strings. For example, I want to get all the strings in this set that start with "JO". What would be the best solution?

I don't think wildcards are supported.

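If you want all the strings that start with a given sequence, you can add all the strings to a NavigableSet such as a TreeSet. subSet(text, text + '\uFFFF') then gives you all the entries that start with text. Locating the range is O(log n); iterating over the result costs time proportional to the number of matches.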
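A minimal sketch of the prefix lookup described above. The class name `PrefixSearch`, the method `startingWith`, and the sample names are made up for illustration; only `TreeSet` and `subSet` come from the answer.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of the original answer.
class PrefixSearch {
    // Returns a view of all entries that start with the given prefix.
    static SortedSet<String> startingWith(NavigableSet<String> sorted, String prefix) {
        // '\uFFFF' is the largest char value, so prefix + '\uFFFF' sorts after
        // practically every string that begins with prefix (a string that has
        // '\uFFFF' right after the prefix would be missed).
        return sorted.subSet(prefix, prefix + '\uFFFF');
    }

    public static void main(String[] args) {
        NavigableSet<String> names = new TreeSet<>(
                Arrays.asList("JOHN", "JOAN", "JO", "JIM", "JUDY", "ANNA"));
        System.out.println(startingWith(names, "JO")); // prints [JO, JOAN, JOHN]
    }
}
```

The returned set is a view backed by the TreeSet, so no strings are copied until you iterate over it.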

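If you want all the strings that end with a given sequence, you can do something similar, but you have to reverse the strings. In this case a TreeMap from reversed string to forward string is a better structure.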
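The reversed-string idea could look roughly like this. `SuffixSearch`, `index`, and `endingWith` are illustrative names, and the sample data is invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper, not part of the original answer.
class SuffixSearch {
    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Maps each reversed string to its original (forward) form.
    static NavigableMap<String, String> index(Iterable<String> strings) {
        NavigableMap<String, String> reversedToForward = new TreeMap<>();
        for (String s : strings) {
            reversedToForward.put(reverse(s), s);
        }
        return reversedToForward;
    }

    // A string ends with suffix exactly when its reversal starts with the
    // reversed suffix, so this is a prefix range query on the reversed keys.
    static List<String> endingWith(NavigableMap<String, String> reversedToForward,
                                   String suffix) {
        String key = reverse(suffix);
        return new ArrayList<>(reversedToForward.subMap(key, key + '\uFFFF').values());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> idx = index(Arrays.asList("JOHN", "JOAN", "DAN", "JO"));
        System.out.println(endingWith(idx, "AN")); // prints [DAN, JOAN]
    }
}
```

Results come back in the order of the reversed keys, not in the natural order of the forward strings.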

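If you need "x*z", you can search the first set for x and intersect the result with the values from the map lookup for z, since a match has to appear in both results.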
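A string matching "x*z" has to show up in both the starts-with lookup and the ends-with lookup, so the sketch below keeps only the strings common to the two results. The class name `StartsAndEndsWith` and its method are assumptions made for illustration. The length check is an addition of this sketch: it rejects strings in which the prefix and suffix would overlap.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of the original answer.
class StartsAndEndsWith {
    private final NavigableSet<String> forward = new TreeSet<>();
    private final NavigableMap<String, String> reversedToForward = new TreeMap<>();

    StartsAndEndsWith(Collection<String> strings) {
        for (String s : strings) {
            forward.add(s);
            reversedToForward.put(reverse(s), s);
        }
    }

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Returns the strings matching prefix*suffix.
    Set<String> matching(String prefix, String suffix) {
        // Candidates that start with the prefix (copied, so the index is untouched).
        Set<String> result = new TreeSet<>(forward.subSet(prefix, prefix + '\uFFFF'));
        // Keep only those that also end with the suffix.
        String key = reverse(suffix);
        result.retainAll(new HashSet<>(reversedToForward.subMap(key, key + '\uFFFF').values()));
        // Prefix and suffix must not overlap: "JO*ON" should not match "JON".
        result.removeIf(s -> s.length() < prefix.length() + suffix.length());
        return result;
    }

    public static void main(String[] args) {
        StartsAndEndsWith idx = new StartsAndEndsWith(
                Arrays.asList("JOHN", "JOAN", "JASON", "DAN", "JON"));
        System.out.println(idx.matching("JO", "N"));  // prints [JOAN, JOHN, JON]
        System.out.println(idx.matching("JO", "ON")); // prints []
    }
}
```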

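If you want "contains x", you can use a NavigableMap whose keys are every suffix of each string, that is, the string starting from its first, second, third character, and so on. The value is a Set, because the same suffix can come from more than one string. You can then search it the same way as the starts-with structure.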
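One way to read the "contains" structure is as a map from every suffix to the strings it came from. The following is a sketch under that reading; `ContainsSearch` and `containing` are illustrative names. Storing every suffix can take a lot of memory for long strings, so treat it as a demonstration of the idea rather than a tuned index.

```java
import java.util.Arrays;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of the original answer.
class ContainsSearch {
    // Key: a suffix of some string. Value: every string that has this suffix.
    private final NavigableMap<String, Set<String>> suffixToStrings = new TreeMap<>();

    ContainsSearch(Iterable<String> strings) {
        for (String s : strings) {
            for (int i = 0; i < s.length(); i++) {
                suffixToStrings.computeIfAbsent(s.substring(i), k -> new TreeSet<>()).add(s);
            }
        }
    }

    // A string contains text exactly when one of its suffixes starts with text,
    // so this is again a prefix range query, now over the suffix keys.
    Set<String> containing(String text) {
        Set<String> result = new TreeSet<>();
        for (Set<String> hits : suffixToStrings.subMap(text, text + '\uFFFF').values()) {
            result.addAll(hits);
        }
        return result;
    }

    public static void main(String[] args) {
        ContainsSearch idx = new ContainsSearch(Arrays.asList("JOHN", "JOAN", "DAN"));
        System.out.println(idx.containing("AN")); // prints [DAN, JOAN]
        System.out.println(idx.containing("O"));  // prints [JOAN, JOHN]
    }
}
```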

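Here is a custom matcher class that does its matching without regular expressions (to be more precise, it only uses a regex in the constructor). It supports wildcard matching: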

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WildCardMatcher {
    private final List<String> patternParts;
    private final boolean openStart;
    private final boolean openEnd;

    public WildCardMatcher(final String pattern) {
        // Split on '*' and keep only the literal parts between the wildcards.
        final List<String> tmpList = new ArrayList<String>(
                                     Arrays.asList(pattern.split("\\*")));
        while (tmpList.remove("")) { /* remove empty Strings */ }
        // the lines above can be made a lot simpler using a Guava Splitter
        if (tmpList.isEmpty())
            throw new IllegalArgumentException("Invalid pattern");
        patternParts = tmpList;
        openStart = pattern.startsWith("*");
        openEnd = pattern.endsWith("*");
    }

    public boolean matches(final String item) {
        int nextIndex = 0; // first position at which the next part may start
        for (int i = 0; i < patternParts.size(); i++) {
            final String part = patternParts.get(i);
            final int index;
            if (i == patternParts.size() - 1 && !openEnd) {
                // No trailing '*': the last part must sit at the very end,
                // without overlapping the parts already matched.
                index = item.length() - part.length();
                if (index < nextIndex || !item.endsWith(part))
                    return false;
            } else {
                index = item.indexOf(part, nextIndex);
                if (index < 0)
                    return false;
            }
            // No leading '*': the first part must sit at the very start.
            if (i == 0 && !openStart && index != 0)
                return false;
            nextIndex = index + part.length();
        }
        return true;
    }

}

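Test code:

public static void main(final String[] args) throws Exception {
    testMatch("foo*bar", "foobar", "foo123bar", "foo*bar", "foobarandsomethingelse");
    testMatch("*.*", "somefile.doc", "somefile", ".doc", "somefile.");
    testMatch("pe*", "peter", "antipeter");
}

private static void testMatch(final String pattern, final String... words) {
    final WildCardMatcher matcher = new WildCardMatcher(pattern);
    for (final String word : words) {
        System.out.println("Pattern " + pattern + " matches word '"
                          + word + "': " + matcher.matches(word));
    }
}

Output:

Pattern foo*bar matches word 'foobar': true
Pattern foo*bar matches word 'foo123bar': true
Pattern foo*bar matches word 'foo*bar': true
Pattern foo*bar matches word 'foobarandsomethingelse': false
Pattern *.* matches word 'somefile.doc': true
Pattern *.* matches word 'somefile': false
Pattern *.* matches word '.doc': true
Pattern *.* matches word 'somefile.': true
Pattern pe* matches word 'peter': true
Pattern pe* matches word 'antipeter': false

While this is far from production-ready, it should be fast enough, and it supports multiple wildcards, including leading and trailing ones. Of course, if your wildcards are only at the end, use Peter's answer (+1).

I may be missing something, but what's wrong with looping and calling str.matches(somePattern)?

Are you always going to look for strings of the form xyz*, or also for the form x*y, etc.?

@aioobe, your suggestion is O(n) for each lookup, and it can use memory unnecessarily to return the results.

@Mat You don't want to go there. Java strings are Unicode-based. The bytes to match would be huge.

If you compare them many times, sorting them once and then doing a binary search is probably faster.

That's elegant! However, if you don't already have a TreeSet, populating it costs O(n log n), whereas a single search through an ArrayList is only O(n).

The topic says he has a collection. Hopefully he is able to choose this collection himself.

@aioobe, I'm hoping he already has a collection, and that it can easily be changed to a TreeSet, or to a ConcurrentSkipListSet if concurrency is needed.

Is it possible to mix \W into the solution without being forced to use a regex?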
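For comparison, the plain looping approach raised in the comments can be sketched by translating the wildcard pattern into a regex once and then scanning the collection. `WildcardRegex`, `compile`, and `filter` are illustrative names. Every lookup here is a full O(n) scan, which is the cost the sorted structures above avoid for prefix queries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answers.
class WildcardRegex {
    // Turns a pattern in which '*' means "any sequence of characters" into a
    // regex, quoting everything else so that it is matched literally.
    static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        // limit -1 keeps trailing empty parts, so a trailing '*' is not lost
        for (String part : wildcard.split("\\*", -1)) {
            regex.append(Pattern.quote(part)).append(".*");
        }
        regex.setLength(regex.length() - 2); // drop the ".*" after the last part
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Linear scan: tests every item against the compiled pattern.
    static List<String> filter(Iterable<String> items, String wildcard) {
        Pattern pattern = compile(wildcard);
        List<String> matches = new ArrayList<>();
        for (String item : items) {
            if (pattern.matcher(item).matches()) {
                matches.add(item);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("peter", "antipeter", "pe");
        System.out.println(filter(words, "pe*")); // prints [peter, pe]
    }
}
```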