C#: What's the difference between "groups" and "captures" in .NET regular expressions?


c# .net regex


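I'm a little fuzzy on the difference between a "group" and a "capture" in .NET's regular expression language. Consider the following C# code:

I expected this to result in a single capture for the letter "Q", but if I print the properties of the returned MatchCollection, I see: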

matches.Count: 1
matches[0].Value: {Q}
        matches[0].Captures.Count: 1
                matches[0].Captures[0].Value: {Q}
        matches[0].Groups.Count: 2
                matches[0].Groups[0].Value: {Q}
                matches[0].Groups[0].Captures.Count: 1
                        matches[0].Groups[0].Captures[0].Value: {Q}
                matches[0].Groups[1].Value: Q
                matches[0].Groups[1].Captures.Count: 1
                        matches[0].Groups[1].Captures[0].Value: Q
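
The snippet that produced this output did not survive in this copy of the question. As a rough sketch only, the output is consistent with a pattern such as ^\{([A-Z])\}$ applied to the input {Q}. Both that pattern and the class name GroupsVsCaptures are assumptions made for illustration, not the asker's confirmed code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupsVsCaptures
{
    // Assumed pattern: one capturing group around the letter between the braces.
    public const string Pattern = @"^\{([A-Z])\}$";

    public static void Main()
    {
        MatchCollection matches = Regex.Matches("{Q}", Pattern);
        Console.WriteLine($"matches.Count: {matches.Count}");

        Match m = matches[0];
        Console.WriteLine($"matches[0].Value: {m.Value}");
        Console.WriteLine($"matches[0].Captures.Count: {m.Captures.Count}");
        Console.WriteLine($"matches[0].Groups.Count: {m.Groups.Count}");

        // Walk every group and every capture recorded for that group.
        for (int g = 0; g < m.Groups.Count; g++)
        {
            Group group = m.Groups[g];
            Console.WriteLine($"Groups[{g}].Value: {group.Value}");
            for (int c = 0; c < group.Captures.Count; c++)
            {
                Console.WriteLine($"  Captures[{c}].Value: {group.Captures[c].Value}");
            }
        }
    }
}
```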

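So what exactly is going on here? I understand that there is also a capture for the entire match, but how do the groups come into it? And why doesn't matches[0].Captures include the capture for the letter "Q"?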

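From MSDN:

The real utility of the Captures property occurs when a quantifier is applied to a capturing group so that the group captures multiple substrings in a single regular expression. In this case, the Group object contains information about the last captured substring, whereas the Captures property contains information about all the substrings captured by the group. In the following example, the regular expression \b(\w+\s*)+\. matches an entire sentence that ends in a period. The group (\w+\s*)+ captures the individual words in the collection. Because the Group collection contains information only about the last captured substring, it captures the last word in the sentence, "sentence". However, each word captured by the group is available from the collection returned by the Captures property.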
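
The behaviour described in that passage can be checked with a short sketch. The sample sentence "This is a sentence." and the names SentenceWords and CapturedWords are assumptions for illustration; only the pattern comes from the quoted text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SentenceWords
{
    // Returns every substring captured by group 1, in the order it was captured.
    public static string[] CapturedWords(string sentence)
    {
        Match m = Regex.Match(sentence, @"\b(\w+\s*)+\.");
        return m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
    }

    public static void Main()
    {
        string input = "This is a sentence."; // assumed sample input
        Match m = Regex.Match(input, @"\b(\w+\s*)+\.");

        // The Group itself only remembers the last iteration of the quantifier.
        Console.WriteLine($"Groups[1].Value = '{m.Groups[1].Value}'");

        // The Captures collection remembers every iteration.
        foreach (string word in CapturedWords(input))
        {
            Console.WriteLine($"capture = '{word}'");
        }
    }
}
```

Note that each capture except the last includes its trailing space, because \s* sits inside the group.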

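You won't be the first who's fuzzy about it. Here's what the famous Jeffrey Friedl has to say about it (pages 437+):

Depending on your view, it either adds an interesting new dimension to the match results, or adds confusion and bloat.

And further on:

The main difference between a Group object and a Capture object is that each Group object contains a collection of Captures representing all the intermediary matches by the group during the match, as well as the final text matched by the group.

And a few pages later, this is his conclusion:

After getting past the .NET documentation and actually understanding what these objects add, I've got mixed feelings about them. On one hand, it's an interesting innovation [...] on the other hand, it seems to add an efficiency burden [...] of a functionality that won't be used in the majority of cases

In other words: they are very similar, but occasionally you'll find a use for them. Before you grow another grey beard, you may even get fond of Captures.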

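Since neither the above nor what's said in the other post really seems to answer your question, consider the following. Think of Captures as a kind of history tracker. When the regex makes its match, it goes through the string from left to right (ignoring backtracking for a moment), and when it encounters a matching pair of capturing parentheses, it stores the matched text in $x (x being any digit), let's say $1. When the capturing parentheses are repeated, a normal regex engine throws away the current $1 and replaces it with the new value. Not .NET: it keeps this history and places it in the group's Captures collection.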

If we change your regex to look as follows:

MatchCollection matches = Regex.Matches("{Q}{R}{S}", @"(\{[A-Z]\})+");

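You will notice that the first Group has one Capture (the first group always being the entire match, i.e. equal to $0), and the second group holds {S}, i.e. only the last text the group matched. However, and here's the catch, if you want to find the other two captures, they are in that group's Captures, which contains all the intermediary captures: {Q}, {R} and {S}.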
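
As a minimal sketch of that history tracking, the following runs the modified regex and prints what the group keeps versus what its Captures collection keeps. The class and method names (RepeatedGroupDemo, MatchBraces) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RepeatedGroupDemo
{
    // Same input style and pattern as the modified regex above.
    public static Match MatchBraces(string input)
    {
        return Regex.Match(input, @"(\{[A-Z]\})+");
    }

    public static void Main()
    {
        Match m = MatchBraces("{Q}{R}{S}");

        // Group 0 is the whole match and always has exactly one capture.
        Console.WriteLine($"Groups[0].Value = {m.Groups[0].Value}");
        Console.WriteLine($"Groups[0].Captures.Count = {m.Groups[0].Captures.Count}");

        // Group 1 reports only the last iteration of the quantifier...
        Console.WriteLine($"Groups[1].Value = {m.Groups[1].Value}");

        // ...while its Captures collection keeps every iteration.
        foreach (Capture c in m.Groups[1].Captures)
        {
            Console.WriteLine($"Groups[1] capture at index {c.Index}: {c.Value}");
        }
    }
}
```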

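If you ever wondered how to get from the repeated group, which only shows its last match, to the individual captures that are clearly there in the string, you must use Captures.

A final word on your final question: the total match always has one total Capture; don't mix that up with the individual Groups. Captures are only interesting inside groups.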

A Group is what we normally associate with groups in regular expressions:

"(a[zx](b?))"

Applied to "axb" returns an array of 3 groups:

group 0: axb, the entire match.
group 1: axb, the first group matched.
group 2: b, the second group matched.

The caveat is that these are only the "captured" groups. Non-capturing groups (using the "(?:" syntax) are not represented here:

"(a[zx](?:b?))"

Applied to "axb" returns an array of 2 groups:

group 0: axb, the entire match.
group 1: axb, the first group matched.
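
A small sketch to confirm the two group counts above. The helper name GroupValues is an assumption for illustration; it simply lists the value of every entry in Match.Groups.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupCountDemo
{
    // Returns the value of every group in the first match, group 0 included.
    public static string[] GroupValues(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Groups.Cast<Group>().Select(g => g.Value).ToArray();
    }

    public static void Main()
    {
        // Two capturing groups plus group 0: three entries.
        string[] capturing = GroupValues("axb", "(a[zx](b?))");
        Console.WriteLine($"{capturing.Length} groups: {string.Join(", ", capturing)}");

        // The inner (?:...) group does not capture: only two entries remain.
        string[] nonCapturing = GroupValues("axb", "(a[zx](?:b?))");
        Console.WriteLine($"{nonCapturing.Length} groups: {string.Join(", ", nonCapturing)}");
    }
}
```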

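A Capture is also what we associate with "captured groups". But when a group is applied multiple times by a quantifier, only the last match is kept as the group's value. The Captures array stores all of these matches: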

"(a[zx]\s+)+"

Applied to "ax az ax" returns an array of 2 captures for group 1, the second entry in Groups (the final "ax" is not followed by whitespace, so it is not part of the match):

group 1, capture 0 "ax "
group 1, capture 1 "az "

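As for your last question: before looking into this, I would have thought that match.Captures was an array of all the captures, ordered by the group they belong to. Instead it is just an alias for match.Groups[0].Captures. Pretty useless.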
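
The following sketch illustrates that point: Match inherits its Captures property from Group, so it only ever describes group 0. The helper AllGroupCaptures is a hypothetical name; it shows one way to build the flattened list the answer expected, ordered by position in the input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchCapturesAlias
{
    // Values of a capture collection, in order.
    public static string[] Values(CaptureCollection captures)
    {
        return captures.Cast<Capture>().Select(c => c.Value).ToArray();
    }

    // Hypothetical helper: every capture of every group except group 0,
    // ordered by where it starts in the input.
    public static string[] AllGroupCaptures(Match m)
    {
        return m.Groups.Cast<Group>()
            .Skip(1)
            .SelectMany(g => g.Captures.Cast<Capture>())
            .OrderBy(c => c.Index)
            .Select(c => c.Value)
            .ToArray();
    }

    public static void Main()
    {
        Match m = Regex.Match("ax az ax", @"(a[zx]\s+)+");

        // Match.Captures holds a single entry: the whole match.
        Console.WriteLine($"match.Captures: '{string.Join("', '", Values(m.Captures))}'");

        // The per-iteration captures live on the group itself.
        Console.WriteLine($"group captures: '{string.Join("', '", AllGroupCaptures(m))}'");
    }
}
```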

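Imagine you have the following text input

dogcatcatcat

and a pattern like

dog(cat(catcat))

In this case you have 3 groups. The first one (the major group) corresponds to the match:

Match == dogcatcatcat

and Group0 == dogcatcatcat

Group1 == catcatcat

Group2 == catcat

So what is it all about?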

Let's consider a little example written in C# (.NET) using the Regex class.

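using System;
using System.Text.RegularExpressions;

// Running counters used only to label the printed lines.
int matchIndex = 0;
int groupIndex = 0;
int captureIndex = 0;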

foreach (Match match in Regex.Matches(
        "dogcatabcdefghidogcatkjlmnopqr", // input
        @"(dog(cat(...)(...)(...)))") // pattern
)
{
    Console.Out.WriteLine($"match{matchIndex++} = {match}");

    foreach (Group @group in match.Groups)
    {
        Console.Out.WriteLine($"\tgroup{groupIndex++} = {@group}");

        foreach (Capture capture in @group.Captures)
        {
            Console.Out.WriteLine($"\t\tcapture{captureIndex++} = {capture}");
        }

        captureIndex = 0;
    }

    groupIndex = 0;
    Console.Out.WriteLine();
}

Output:

match0 = dogcatabcdefghi
    group0 = dogcatabcdefghi
        capture0 = dogcatabcdefghi
    group1 = dogcatabcdefghi
        capture0 = dogcatabcdefghi
    group2 = catabcdefghi
        capture0 = catabcdefghi
    group3 = abc
        capture0 = abc
    group4 = def
        capture0 = def
    group5 = ghi
        capture0 = ghi

match1 = dogcatkjlmnopqr
    group0 = dogcatkjlmnopqr
        capture0 = dogcatkjlmnopqr
    group1 = dogcatkjlmnopqr
        capture0 = dogcatkjlmnopqr
    group2 = catkjlmnopqr
        capture0 = catkjlmnopqr
    group3 = kjl
        capture0 = kjl
    group4 = mno
        capture0 = mno
    group5 = pqr
        capture0 = pqr

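Let's analyze just the first match (match0).

As you can see, there are three minor groups: group3, group4 and group5:

    group3 = abc
        capture0 = abc
    group4 = def
        capture0 = def
    group5 = ghi
        capture0 = ghi

Those groups (3-5) were created because of the "subpattern" (...)(...)(...) of the main pattern (dog(cat(...)(...)(...))).

The value of group3 corresponds to its capture (capture0), as is the case for group4 and group5. That's because there is no group repetition such as (...){3}.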

OK, let's consider another example, one where there is a group repetition. If the pattern in the code above is changed from (dog(cat(...)(...)(...))) to (dog(cat(...){3})), it contains the repetition (...){3}, and the output becomes:

match0 = dogcatabcdefghi
    group0 = dogcatabcdefghi
        capture0 = dogcatabcdefghi
    group1 = dogcatabcdefghi
        capture0 = dogcatabcdefghi
    group2 = catabcdefghi
        capture0 = catabcdefghi
    group3 = ghi
        capture0 = abc
        capture1 = def
        capture2 = ghi

match1 = dogcatkjlmnopqr
    group0 = dogcatkjlmnopqr
        capture0 = dogcatkjlmnopqr
    group1 = dogcatkjlmnopqr
        capture0 = dogcatkjlmnopqr
    group2 = catkjlmnopqr
        capture0 = catkjlmnopqr
    group3 = pqr
        capture0 = kjl
        capture1 = mno
        capture2 = pqr
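
The effect of a repeated group such as (...){3} can be checked in isolation with a short sketch. It assumes the pattern (dog(cat(...){3})) and the same input as the earlier example; the names RepeatedTripletDemo and Group3Captures are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RepeatedTripletDemo
{
    // Assumed pattern: the three separate (...) groups collapsed into one repeated group.
    public const string Pattern = @"(dog(cat(...){3}))";
    public const string Input = "dogcatabcdefghidogcatkjlmnopqr";

    // All substrings that group 3 captured while the {3} quantifier iterated.
    public static string[] Group3Captures(Match m)
    {
        return m.Groups[3].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
    }

    public static void Main()
    {
        foreach (Match m in Regex.Matches(Input, Pattern))
        {
            // The group value is the last iteration; Captures holds all three.
            Console.WriteLine($"group3 = {m.Groups[3].Value}");
            Console.WriteLine($"captures = {string.Join(", ", Group3Captures(m))}");
        }
    }
}
```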

csharp> Regex.Match("3:10pm", @"((\d)+):((\d)+)(am|pm)").
      > Groups.Cast<Group>().
      > Zip(Enumerable.Range(0, int.MaxValue), (g, n) => "[" + n + "] " + g);
{ "[0] 3:10pm", "[1] 3", "[2] 3", "[3] 10", "[4] 0", "[5] pm" }

csharp> Regex.Match("3:10pm", @"((\d)+):((\d)+)(am|pm)").
      > Groups.Cast<Group>().
      > Skip(4).First().Captures.Cast<Capture>().
      > Zip(Enumerable.Range(0, int.MaxValue), (c, n) => "["+n+"] " + c);
{ "[0] 1", "[1] 0" }
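
Outside an interactive shell, the same inspection could be written as a small program. This sketch also prints each capture's Index, which shows where in the input every digit was picked up; the names TimeCaptures and Describe are assumptions for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TimeCaptures
{
    public const string Pattern = @"((\d)+):((\d)+)(am|pm)";

    // Formats every capture of a group as "value@index".
    public static string[] Describe(Group group)
    {
        return group.Captures.Cast<Capture>()
            .Select(c => c.Value + "@" + c.Index)
            .ToArray();
    }

    public static void Main()
    {
        Match m = Regex.Match("3:10pm", Pattern);

        for (int g = 0; g < m.Groups.Count; g++)
        {
            // Group value (last iteration) followed by every recorded capture.
            Console.WriteLine($"[{g}] {m.Groups[g].Value}: {string.Join(", ", Describe(m.Groups[g]))}");
        }
    }
}
```

Group 4 is the single-digit group inside the minutes, so its value is the last digit it matched while its Captures list holds both digits.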