Java not handling the * quantifier correctly in regex?

java regex

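I'm new to regex and am working my way through it. I have a question about the `*` quantifier. Here are the definitions I'm working from:

`X*` - finds zero or more occurrences of the letter X

`.*` - any character sequence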

Based on the above definitions, I wrote a small program:

public static void testQuantifier() {
    String testStr = "axbx";
    System.out.println(testStr.replaceAll("x*", "M"));
    //my expected output is MMMM but actual output is MaMMbMM
    /*
    Logic behind my expected output is:
    1. it encounters a which means 0 x is found. It should replace a with M.
    2. it encounters x which means 1 x is found. It should replace x with M.
    3. it encounters b which means 0 x is found. It should replace b with M.
    4. it encounters x which means 1 x is found. It should replace x with M.
    so output should be MMMM but why it is MaMMbMM?
    */

    System.out.println(testStr.replaceAll(".*", "M"));
    //my expected output is M but actual output is MM

    /*
    Logic behind my expected output is:
    It encounters axbx, which is any character sequence, it should 
    replace complete sequence with M.
    So output should be M but why it is MM?
    */
}

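Update:

With my revised understanding, I would expect the output to be MaMMbM, not MaMMbMM. So I don't understand why I get an extra M at the end.

My revised understanding of the first regex is: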

1. it encounters a which means 0 x is found. It should replace a with Ma.
2. it encounters x which means 1 x is found. It should replace x with M.
3. it encounters b which means 0 x is found. It should replace b with Mb.
4. it encounters x which means 1 x is found. It should replace x with M.
5. Lastly it encounters end of string at index 4. So it replaces 0x at end of String with M.

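(Though it does seem odd to me that the end of the string counts as an index.)

So the first part is clear now.

Also, if someone could clarify the second regex, that would be helpful.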

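This is where you're going wrong:

"First it encounters a, which means 0 x is found. So it should replace a with M."

No. It means that 0 `x`s are found, and then an `a` is found. You haven't said that `a` should be replaced with `M`. You've said that any number of `x`s (including 0) should be replaced with `M`.

If you want every character to be replaced with `M`, just use `.`: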

System.out.println(testStr.replaceAll(".", "M"));

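(Personally I would have expected a result of MaMbM. I'm looking into why it gives MaMMbMM instead. I suspect it's because there is a sequence of 0 `x`s between `x` and `b`, but it still seems a bit odd to me.)

EDIT: It becomes somewhat clearer if you look at where the pattern matches. The code below shows this: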

Pattern pattern = Pattern.compile("x*");
Matcher matcher = pattern.matcher("axbx");
while (matcher.find()) {
    System.out.println(matcher.start() + "-" + matcher.end());
}

Results (bear in mind that the end index is exclusive), with a bit of explanation:

0-0 (index 0 = 'a', doesn't match)
1-2 (index 1 = 'x', matches)
2-2 (index 2 = 'b', doesn't match)
3-4 (index 3 = 'x', matches)
4-4 (index 4 is the end of the string)

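If you replace each of those matches with "M", you end up with the output you're actually getting.

I think the fundamental problem is that if you have a pattern that can match (in its entirety) the empty string, you could argue that the pattern occurs an infinite number of times between any two characters of the input. I would probably try to avoid such patterns where possible, and make sure that any match has to include at least one character.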
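One way to follow that advice is to use `+` instead of `*`, so that every match has to consume at least one character. Here is a minimal sketch; the class and method names are made up for illustration, and the second method assumes the input is non-empty and contains no line breaks (since `.` does not match line terminators by default):

```java
class NonEmptyMatchDemo {
    // Replaces each run of one or more 'x' characters with "M".
    static String replaceRunsOfX(String input) {
        return input.replaceAll("x+", "M");
    }

    // Replaces the whole input with a single "M"
    // (assuming it is non-empty and has no line breaks).
    static String replaceWholeInput(String input) {
        return input.replaceAll(".+", "M");
    }

    public static void main(String[] args) {
        System.out.println(replaceRunsOfX("axbx"));    // aMbM
        System.out.println(replaceWholeInput("axbx")); // M
    }
}
```

Because neither pattern can match the empty string, there are no zero-length matches before `a`, before `b`, or at the end of the input.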

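`a` and `b` aren't replaced because they don't match your regex. What gets replaced are the `x`es and the empty strings before a non-matching letter or before the end of the string.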

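Let's see what happens:

1. We're at the start of the string. The regex engine tries to match an `x` but fails, because there is an `a` here.
2. The regex engine backtracks, because `x*` also allows zero repetitions of `x`. We have a match and replace it with `M`.
3. The regex engine advances past the `a` and successfully matches `x`. Replace with `M`.
4. The regex engine now tries to match `x` at the current position (after the previous match), which is right before the `b`. It can't.
5. But it can backtrack again and match zero `x`es here. Replace with `M`.
6. The regex engine advances past the `b` and successfully matches `x`. Replace with `M`.
7. The regex engine now tries to match `x` at the current position (after the previous match), which is at the end of the string. It can't.
8. But it can backtrack again and match zero `x`es here. Replace with `M`.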
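The steps above can be reproduced by doing the replacement by hand with `Matcher.find()`, `appendReplacement` and `appendTail`. This is only a rough sketch of what `replaceAll` does, not its actual implementation; the class and method names are invented for the example, and the replacement is treated literally via `Matcher.quoteReplacement`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManualReplaceDemo {
    // Finds each match in turn, copies the unmatched text before it,
    // then appends the replacement in place of the match.
    static String replaceEachMatch(String regex, String input, String replacement) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            // Zero-length matches (start == end) are replaced as well.
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result); // copy whatever follows the last match
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceEachMatch("x*", "axbx", "M")); // MaMMbMM
    }
}
```

Each pass through the loop corresponds to one "replace with `M`" step: two for the real `x`es and three for the empty matches before `a`, before `b`, and at the end of the string.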

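By the way, this depends on the implementation. In Python before version 3.7, for example, it was:

>>> re.sub("x*", "M", "axbx")
'MaMbM'

(Since Python 3.7, empty matches adjacent to a previous non-empty match are replaced as well, so the result there is now 'MaMMbMM', the same as in Java.)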

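There is a better resource for learning regex, namely: ...

Python implements the replacement the way you expected. You can argue for either behavior; I also find Python's way more intuitive.

By your logic, I would expect the output to be MaMMbM, not MaMMbMM. So I don't understand why there is an extra M at the end?

@emily: No, my original logic would only give MaMbM. But see my edit for how I think it actually happens.

So I've got the first part, Jon. As I said in my update, it would be helpful if you could shed some light on System.out.println(testStr.replaceAll(".*", "M"));

@emilly: You're nearly there, but after finding `x`, it finds the empty string, which is why there is the extra `M` at the end. My edited answer shows the match at the end, at index 4.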
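The same reasoning seems to explain the second regex from the question. A small sketch (class and method names are hypothetical) that lists where `.*` matches in "axbx":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotStarDemo {
    // Collects "start-end" (end exclusive) for every match of regex in input.
    static List<String> matchSpans(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            spans.add(matcher.start() + "-" + matcher.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(matchSpans(".*", "axbx"));     // [0-4, 4-4]
        System.out.println("axbx".replaceAll(".*", "M")); // MM
    }
}
```

The first match (0-4) is the whole string; the second (4-4) is the empty string at the end, which `.*` is also allowed to match. Each is replaced with `M`, giving MM rather than M.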