Java: How does Guava's Splitter.onPattern(..).split() differ from String.split(..)?
I recently harnessed the power of a look-ahead regular expression to split a String:
"abc8".split("(?=\\d)|\\W")
Printed to the console, this expression returns:
[abc, 8]
Pleased with this result, I wanted to move it over to Guava for further development, which looked like this:
Splitter.onPattern("(?=\\d)|\\W").split("abc8")
To my surprise, the output changed to:
[abc]
Why?
The Guava Splitter seems to have a bug when a pattern produces an empty match. If you create a Matcher and print out what it matches:
Pattern pattern = Pattern.compile("(?=\\d)|\\W");
Matcher matcher = pattern.matcher("abc8");
while (matcher.find()) {
    System.out.println(matcher.start() + "," + matcher.end());
}
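You will get the output 3,3, which makes it look as though it would match the 8. Splitter therefore just splits there, which results in only abc.
You can use, for example, Pattern#split(String), which gives the correct output: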
Pattern.compile("(?=\\d)|\\W").split("abc8")
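The JDK side of the comparison can be checked without Guava on the classpath. The following is a minimal sketch using only java.util.regex; the class name JdkSplitDemo and the helper matchSpans are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the JDK or Guava.
class JdkSplitDemo {
    static final Pattern PATTERN = Pattern.compile("(?=\\d)|\\W");

    // Collects "start,end" for every match the pattern finds in the input.
    static List<String> matchSpans(String input) {
        List<String> spans = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            spans.add(matcher.start() + "," + matcher.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        // The only match is the zero-length one just before the digit.
        System.out.println(matchSpans("abc8"));                            // [3,3]
        // String#split and Pattern#split both keep the trailing "8".
        System.out.println(Arrays.toString("abc8".split("(?=\\d)|\\W"))); // [abc, 8]
        System.out.println(Arrays.toString(PATTERN.split("abc8")));       // [abc, 8]
    }
}
```

The match at 3,3 has zero length, so the 8 itself is never consumed by the separator; it is the text that follows the separator.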
You've found a bug!
Splitter s = Splitter.onPattern("(?=\\d)|\\W"); // the Splitter from the question
System.out.println(s.split("abc82")); // [abc, 8]
System.out.println(s.split("abc8")); // [abc]
This is the method that Splitter uses to actually split Strings (computeNext(), reproduced in full further down).
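This logic works well unless the empty match is the last separator before the end of the String. In that case the offset++ step pushes offset up to toSplit.length(), the check offset >= toSplit.length() ends the iteration, and the final character is skipped. In the corrected version of that part, shown in the last excerpt below, the check changes from >= to >.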
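To me it looks like an off-by-one error for the combination of single-character parts and zero-length separators.
For me it seems not to work at the start of the string, but it works fine in the middle. That would help; then you could simply match whatever you want and keep the delimiters...
"abc8".split("(?=\\d)|\\W") is short for Pattern.compile("(?=\\d)|\\W").split("abc8").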
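@Jeffrey Yes, that is why the check fails. This looks like a bug in Splitter.
@squezymo There is a bit more code in String#split, but almost certainly yes.
I thought patterns that use lookahead or lookbehind to match an empty string ought to be prohibited, but it seems you found the right place for the bug.
@Bubletan (?=\d) does not match the empty string. It matches an empty string that is followed by a digit, which is not the same thing as an empty string.
Can you file a bug for this? I am sure they would want to fix it.
@Jeffrey Awesome; I was investigating the problem, not making a request.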
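The distinction drawn in the comments, between matching the empty string and producing a zero-length match in front of a digit, can be checked directly with java.util.regex. This is only an illustrative sketch; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the JDK or Guava.
class LookaheadMatchDemo {
    static final Pattern LOOKAHEAD = Pattern.compile("(?=\\d)");

    // True if the look-ahead finds a match anywhere in the input.
    static boolean finds(String input) {
        return LOOKAHEAD.matcher(input).find();
    }

    // Length of the first match, or -1 if there is none.
    static int firstMatchLength(String input) {
        Matcher matcher = LOOKAHEAD.matcher(input);
        return matcher.find() ? matcher.end() - matcher.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(finds(""));               // false: no digit follows
        System.out.println(finds("8"));              // true
        System.out.println(firstMatchLength("8"));   // 0: the match consumes nothing
    }
}
```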
@Override
protected String computeNext() {
    /*
     * The returned string will be from the end of the last match to the
     * beginning of the next one. nextStart is the start position of the
     * returned substring, while offset is the place to start looking for a
     * separator.
     */
    int nextStart = offset;
    while (offset != -1) {
        int start = nextStart;
        int end;
        int separatorPosition = separatorStart(offset);
        if (separatorPosition == -1) {
            end = toSplit.length();
            offset = -1;
        } else {
            end = separatorPosition;
            offset = separatorEnd(separatorPosition);
        }
        if (offset == nextStart) {
            /*
             * This occurs when some pattern has an empty match, even if it
             * doesn't match the empty string -- for example, if it requires
             * lookahead or the like. The offset must be increased to look for
             * separators beyond this point, without changing the start position
             * of the next returned substring -- so nextStart stays the same.
             */
            offset++;
            if (offset >= toSplit.length()) {
                offset = -1;
            }
            continue;
        }
        while (start < end && trimmer.matches(toSplit.charAt(start))) {
            start++;
        }
        while (end > start && trimmer.matches(toSplit.charAt(end - 1))) {
            end--;
        }
        if (omitEmptyStrings && start == end) {
            // Don't include the (unused) separator in next split string.
            nextStart = offset;
            continue;
        }
        if (limit == 1) {
            // The limit has been reached, return the rest of the string as the
            // final item. This is tested after empty string removal so that
            // empty strings do not count towards the limit.
            end = toSplit.length();
            offset = -1;
            // Since we may have changed the end, we need to trim it again.
            while (end > start && trimmer.matches(toSplit.charAt(end - 1))) {
                end--;
            }
        } else {
            limit--;
        }
        return toSplit.subSequence(start, end).toString();
    }
    return endOfData();
}
The relevant part of the method above:
if (offset == nextStart) {
    /*
     * This occurs when some pattern has an empty match, even if it
     * doesn't match the empty string -- for example, if it requires
     * lookahead or the like. The offset must be increased to look for
     * separators beyond this point, without changing the start position
     * of the next returned substring -- so nextStart stays the same.
     */
    offset++;
    if (offset >= toSplit.length()) {
        offset = -1;
    }
    continue;
}
The corrected version of that part (offset > toSplit.length() instead of offset >= toSplit.length()):
if (offset == nextStart) {
    /*
     * This occurs when some pattern has an empty match, even if it
     * doesn't match the empty string -- for example, if it requires
     * lookahead or the like. The offset must be increased to look for
     * separators beyond this point, without changing the start position
     * of the next returned substring -- so nextStart stays the same.
     */
    offset++;
    if (offset > toSplit.length()) {
        offset = -1;
    }
    continue;
}
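The effect of that one-character change can be tried out with a stripped-down version of the loop. What follows is a sketch, not Guava's code: it assumes separatorStart and separatorEnd behave like Matcher.find(offset), start() and end(), and it leaves out trimming, limit, and omitEmptyStrings. The class name EmptyMatchSplitSketch and the patched flag are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of the splitting loop; not Guava's implementation.
class EmptyMatchSplitSketch {

    // patched == false uses the original check (>=), patched == true uses the proposed one (>).
    static List<String> split(String toSplit, String regex, boolean patched) {
        Matcher matcher = Pattern.compile(regex).matcher(toSplit);
        List<String> parts = new ArrayList<>();
        int nextStart = 0; // start of the piece that will be emitted next
        int offset = 0;    // where to start looking for the next separator
        while (offset != -1) {
            int end;
            if (matcher.find(offset)) {
                end = matcher.start();
                offset = matcher.end();
            } else {
                end = toSplit.length();
                offset = -1;
            }
            if (offset == nextStart) {
                // Empty match at the start of the pending piece: look one character further.
                offset++;
                boolean pastEnd = patched
                        ? offset > toSplit.length()
                        : offset >= toSplit.length();
                if (pastEnd) {
                    offset = -1;
                }
                continue;
            }
            parts.add(toSplit.substring(nextStart, end));
            nextStart = offset;
        }
        return parts;
    }

    public static void main(String[] args) {
        String regex = "(?=\\d)|\\W";
        System.out.println(split("abc8", regex, false));  // [abc]
        System.out.println(split("abc8", regex, true));   // [abc, 8]
        System.out.println(split("abc82", regex, false)); // [abc, 8]
        System.out.println(split("abc82", regex, true));  // [abc, 8, 2]
    }
}
```

With the original check, the loop gives up as soon as offset reaches the length of the input, so the last one-character piece is never emitted. With the relaxed check, one more search runs at the end of the input, finds no separator, and the remaining piece is returned.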