How does Guava's Splitter.onPattern(..).split() differ from String.split(..)?

java regex

I recently used the power of a look-ahead regular expression to split a String:

"abc8".split("(?=\\d)|\\W")

Printed to the console, this expression returns:

[abc, 8]

Very pleased with this result, I wanted to move it over to Guava for further development, which looked like this:

Splitter.onPattern("(?=\\d)|\\W").split("abc8")

To my surprise, the output changed to:

[abc]

Why?

Guava's Splitter appears to have a bug when a pattern produces an empty match. If you create a Matcher and print what it matches:
Pattern pattern = Pattern.compile("(?=\\d)|\\W");
Matcher matcher = pattern.matcher("abc8");
while (matcher.find()) {
    System.out.println(matcher.start() + "," + matcher.end());
}

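You get the output 3,3, a zero-length match just before the 8, which Splitter appears to treat as if it had matched the 8. It therefore splits there, and the result is only abc.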

You can use, for example, Pattern#split(String), which gives the correct output:
Pattern.compile("(?=\\d)|\\W").split("abc8")
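The behaviour described above can be checked with the JDK alone, without Guava on the classpath. The following is a minimal sketch: it records the match positions that the Matcher reports and compares String#split with Pattern#split for the pattern from the question. The class name EmptyMatchDemo and the helper matchSpans are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {

    // Collects "start,end" for every match of regex in input.
    static List<String> matchSpans(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            spans.add(matcher.start() + "," + matcher.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        String regex = "(?=\\d)|\\W";

        // The only match is zero-length and sits just before the digit.
        System.out.println(matchSpans(regex, "abc8")); // [3,3]

        // Both JDK split variants keep the character after the empty match.
        System.out.println(Arrays.toString("abc8".split(regex)));                  // [abc, 8]
        System.out.println(Arrays.toString(Pattern.compile(regex).split("abc8"))); // [abc, 8]
    }
}
```

The match has start equal to end, so no character is consumed by the separator; the 8 still belongs to the following piece.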

You have found a bug!
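Splitter s = Splitter.onPattern("(?=\\d)|\\W"); // assumed definition of s, taken from the question
System.out.println(s.split("abc82")); // [abc, 8]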
System.out.println(s.split("abc8"));  // [abc]

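This is the method Splitter uses to actually split Strings; it is the computeNext() listing that follows the comments below. The area of interest is the if (offset == nextStart) block, which is repeated on its own after the full method.
This logic works well unless the empty match occurs at the end of the String. In that case offset++ makes offset equal to toSplit.length(), the >= check sets offset to -1, and the final character is skipped. What this part should look like is shown in the last listing (note the change from >= to >).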
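To me it looks like an off-by-one error for the combination of single-character parts and zero-length separators.
For me it does not seem to work at the start of the string, but it works fine in the middle.
That would help; then you could simply match whatever you want and keep the delimiters…
"abc8".split("(?=\\d)|\\W") is shorthand for Pattern.compile("(?=\\d)|\\W").split("abc8")
@Jeffrey Yes, that is why the check fails. This looks like a bug in Splitter.
@squezymo There is some more code in String#split, but pretty much, yes.
I thought patterns that use lookahead to match the empty string were supposed to be disallowed, but it seems you have found the right location of the bug.
@Bubletan (?=\d) does not match the empty string. It matches an empty string that is followed by a digit, which is not the same thing as an empty string.
Could you file a bug for this? I am sure they will want to fix it.
@Jeffrey Great; I was investigating the problem, not making a request.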

The computeNext() method that Splitter uses:
@Override
protected String computeNext() {
  /*
   * The returned string will be from the end of the last match to the
   * beginning of the next one. nextStart is the start position of the
   * returned substring, while offset is the place to start looking for a
   * separator.
   */
  int nextStart = offset;
  while (offset != -1) {
    int start = nextStart;
    int end;

    int separatorPosition = separatorStart(offset);

    if (separatorPosition == -1) {
      end = toSplit.length();
      offset = -1;
    } else {
      end = separatorPosition;
      offset = separatorEnd(separatorPosition);
    }

    if (offset == nextStart) {
      /*
       * This occurs when some pattern has an empty match, even if it
       * doesn't match the empty string -- for example, if it requires
       * lookahead or the like. The offset must be increased to look for
       * separators beyond this point, without changing the start position
       * of the next returned substring -- so nextStart stays the same.
       */
      offset++;
      if (offset >= toSplit.length()) {
        offset = -1;
      }
      continue;
    }

    while (start < end && trimmer.matches(toSplit.charAt(start))) {
      start++;
    }
    while (end > start && trimmer.matches(toSplit.charAt(end - 1))) {
      end--;
    }

    if (omitEmptyStrings && start == end) {
      // Don't include the (unused) separator in next split string.
      nextStart = offset;
      continue;
    }

    if (limit == 1) {
      // The limit has been reached, return the rest of the string as the
      // final item.  This is tested after empty string removal so that
      // empty strings do not count towards the limit.
      end = toSplit.length();
      offset = -1;
      // Since we may have changed the end, we need to trim it again.
      while (end > start && trimmer.matches(toSplit.charAt(end - 1))) {
        end--;
      }
    } else {
      limit--;
    }

    return toSplit.subSequence(start, end).toString();
  }
  return endOfData();
}

The area of interest:

if (offset == nextStart) {
  /*
   * This occurs when some pattern has an empty match, even if it
   * doesn't match the empty string -- for example, if it requires
   * lookahead or the like. The offset must be increased to look for
   * separators beyond this point, without changing the start position
   * of the next returned substring -- so nextStart stays the same.
   */
  offset++;
  if (offset >= toSplit.length()) {
    offset = -1;
  }
  continue;
}

What it should look like (>= changed to >):

if (offset == nextStart) {
  /*
   * This occurs when some pattern has an empty match, even if it
   * doesn't match the empty string -- for example, if it requires
   * lookahead or the like. The offset must be increased to look for
   * separators beyond this point, without changing the start position
   * of the next returned substring -- so nextStart stays the same.
   */
  offset++;
  if (offset > toSplit.length()) {
    offset = -1;
  }
  continue;
}
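
The effect of the two bounds checks can be tried out with a simplified model of the loop. This is only a sketch under stated assumptions, not Guava's code: it uses the JDK Matcher directly in place of separatorStart and separatorEnd, and it leaves out trimming, the limit, and omitEmptyStrings. The class name OffsetCheckSketch and the strictCheck flag are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetCheckSketch {

    // Simplified model of the splitting loop.
    // strictCheck == false uses "offset >= length" (as in the listing),
    // strictCheck == true uses "offset > length" (the proposed change).
    static List<String> split(String regex, String toSplit, boolean strictCheck) {
        Matcher matcher = Pattern.compile(regex).matcher(toSplit);
        List<String> parts = new ArrayList<>();
        int offset = 0;    // where to start looking for the next separator
        int nextStart = 0; // where the next returned piece begins
        while (offset != -1) {
            int start = nextStart;
            int end;
            if (matcher.find(offset)) {
                end = matcher.start();
                offset = matcher.end();
            } else {
                end = toSplit.length();
                offset = -1;
            }
            if (offset == nextStart) {
                // Empty match at the current position: search further right,
                // but keep nextStart where it is.
                offset++;
                boolean pastEnd = strictCheck
                        ? offset > toSplit.length()
                        : offset >= toSplit.length();
                if (pastEnd) {
                    offset = -1;
                }
                continue;
            }
            parts.add(toSplit.substring(start, end));
            nextStart = offset;
        }
        return parts;
    }

    public static void main(String[] args) {
        String regex = "(?=\\d)|\\W";
        System.out.println(split(regex, "abc8", false));  // [abc]
        System.out.println(split(regex, "abc8", true));   // [abc, 8]
        System.out.println(split(regex, "abc82", false)); // [abc, 8]
        System.out.println(split(regex, "abc82", true));  // [abc, 8, 2]
    }
}
```

With the >= check, the search gives up one position early, so the piece that starts at the last empty match is never emitted. With the > check, one more search runs from offset equal to the string length, finds no separator, and the remaining character is returned as the last piece.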