Regex: allow nested consecutive matches in a regular expression


regex


My data is:

Hello
Test1
Begin
* nm: 866 444 988
* nm: 08 66
# allowed * nm: 77 2
End
* nm: 0

I want to capture every number between the markers Begin and End, and the numbers must be preceded by * nm: or # allowed * nm:

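My pattern works fine in .NET (I use capture collections), but it doesn't work in other engines... My question is how to add another \G anchor to capture the nested consecutive numbers (in other words, how to master the \G anchor).

It returns every number in the captured values.

Thanks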

Edit: I found a solution, but it is not a simple pattern:

(?mx:
   \G(?!\A)
      |
   ^Begin\r?\n
)
(?:#[ ]allowed[ ])?
\*[ ]nm:
  |
(?!^)\G[ ]*(\d+)\s*
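To watch this pattern work outside .NET, here is a minimal Java sketch (java.util.regex supports \G). It rests on two assumptions. First, the pattern is flattened onto one line and compiled with Pattern.MULTILINE, so the m flag covers the whole pattern and not only the first group. Second, the input uses \n line endings. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstEditDemo {
    // Branch 1: the Begin line or the end of the previous match, then the "* nm:" prefix (nothing captured).
    // Branch 2: a number glued to the end of the previous match, captured in group 1.
    static final Pattern P = Pattern.compile(
        "(?:\\G(?!\\A)|^Begin\\r?\\n)(?:#[ ]allowed[ ])?\\*[ ]nm:"
      + "|(?!^)\\G[ ]*(\\d+)\\s*",
        Pattern.MULTILINE);

    // Collects group 1 from every match; prefix-only matches leave it null.
    static List<String> numbers(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add(m.group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "Hello\nTest1\nBegin\n* nm: 866 444 988\n* nm: 08 66\n"
                    + "# allowed * nm: 77 2\nEnd\n* nm: 0\n";
        System.out.println(numbers(data)); // [866, 444, 988, 08, 66, 77, 2]
    }
}
```

At the End line neither branch can start at \G, so the chain stops and the 0 after End is never reached.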

Edit 2:

Another problem with my second pattern: if I put

[ ]*\r?\n

at the end of the pattern instead of \s*, it fails. Why?

 (?xm:
     \G(?!\A)
         |
     ^Begin\r?\n
 )
 (?:#[ ]allowed[ ])?
 \*[ ]nm:
     |
 (?!^)\G[ ]*(\d+)
 [ ]*\r?\n # <-- the problem here
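A likely explanation is that only the last number on a line is followed by a line break. Right after * nm:, the number branch sees 866 followed by a space and 444, so [ ]*\r?\n cannot match. Because every later match must start exactly at \G, the chain stops right there. \s* works because it accepts the single space between two numbers as well as the line break. The hypothetical Java sketch below assumes \n line endings and runs both endings of the Edit 2 pattern side by side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SecondEditDemo {
    // Edit 2 pattern without its ending; the ending is appended to the number branch.
    static final String PREFIX =
        "(?:\\G(?!\\A)|^Begin\\r?\\n)(?:#[ ]allowed[ ])?\\*[ ]nm:|(?!^)\\G[ ]*(\\d+)";

    // Compiles PREFIX + tail with MULTILINE and collects the non-null group 1 values.
    static List<String> numbers(String tail, String input) {
        Matcher m = Pattern.compile(PREFIX + tail, Pattern.MULTILINE).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            if (m.group(1) != null) {
                out.add(m.group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "Begin\n* nm: 866 444 988\n* nm: 08 66\nEnd\n";
        // \s* also consumes the space between two numbers on the same line.
        System.out.println(numbers("\\s*", data));        // [866, 444, 988, 08, 66]
        // [ ]*\r?\n needs a line break right after 866, so the \G chain dies at once.
        System.out.println(numbers("[ ]*\\r?\\n", data)); // []
    }
}
```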

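The digits of each match are in group 1. It is not a capture collection, but that is what the \G is there for anyway. Also, by its nature, it stops when End is found.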

Edit: note that you can put a capture group around Begin, as in (Begin), to flag the start of a new block.
 # (?mi:(?!\A)\G|(?:(?:^Begin|(?!\A)\G)(?s:(?!^End).)*?(?:^(?:\#[ ]+allowed[ ]+)?\*[ ]+nm:)))[ ]+(\d+)

 (?xmi:
      (?! \A )
      \G 
   |  
      (?:
           (?:
                ^ Begin  
             |  
                (?! \A )
                \G 
           )
           (?s:
                (?! ^ End )
                . 
           )*?
           (?:
                ^ 
                (?: \# [ ]+ allowed [ ]+ )?
                \* [ ]+ nm: 
           )
      )
 )
 [ ]+  
 ( \d+ )                            # (1)
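Here is a hedged Java sketch that runs the one-line form of this pattern with a plain find() loop. Each find() resumes at the end of the previous match, which is exactly where \G anchors. The class name and sample input are illustrative, and \n line endings are assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnswerOneLinerDemo {
    // The one-line version of the pattern; every match captures one number in group 1.
    static final Pattern P = Pattern.compile(
        "(?mi:(?!\\A)\\G|(?:(?:^Begin|(?!\\A)\\G)(?s:(?!^End).)*?"
      + "(?:^(?:\\#[ ]+allowed[ ]+)?\\*[ ]+nm:)))[ ]+(\\d+)");

    // Each find() starts where the previous match ended, so \G chains the matches.
    static List<String> numbers(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "Hello\nTest1\nBegin\n* nm: 866 444 988\n* nm: 08 66\n"
                    + "# allowed * nm: 77 2\nEnd\n* nm: 0\n";
        System.out.println(numbers(data)); // [866, 444, 988, 08, 66, 77, 2]
    }
}
```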

With additional comments:
 (?xmi:
      (?! \A )                # Here, matched before, give '[ ]+\d+' a first chance
      \G                      # to match again.
   |  
      (?:                     # Here, could have matched before
           (?:
                ^ Begin                 # Give a new begin position first chance
             |                        # or,
                (?! \A )                # See if this matched before
                \G 
           )

           # If this is new begin or matched before, move the position up to
           # the first/next delimiter 'nm:'

           (?s:                    # Lazy, move the position along (dot-all in this cluster)
                (?! ^ End )
                . 
           )*?
           (?:                     # Here we found the first/next delimiter
                ^ 
                (?: \# [ ]+ allowed [ ]+ )?
                \* [ ]+ nm: 
           )
      )
 )
 [ ]+  
 ( \d+ )                 # (1)
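Following the edit note about wrapping Begin in a capture group, here is a hedged Java sketch that uses that flag to split the numbers into blocks. With (Begin) as group 1, the number moves to group 2. The class name, method name and two-block sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockNumbers {
    // One-line pattern with a capture group around Begin:
    // group 1 = "Begin" (set only on the first match of a block), group 2 = the number.
    static final Pattern P = Pattern.compile(
        "(?mi:(?!\\A)\\G|(?:(?:^(Begin)|(?!\\A)\\G)(?s:(?!^End).)*?"
      + "(?:^(?:\\#[ ]+allowed[ ]+)?\\*[ ]+nm:)))[ ]+(\\d+)");

    static List<List<String>> blocks(String input) {
        List<List<String>> result = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                result.add(new ArrayList<>()); // Begin was matched: a new block starts
            }
            // \G continuation matches append to the current block.
            result.get(result.size() - 1).add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "Begin\n* nm: 1 2\nEnd\njunk 9\nBegin\n* nm: 3\nEnd\n";
        System.out.println(blocks(data)); // [[1, 2], [3]]
    }
}
```

A chain can only start through ^(Begin), because every \G branch is guarded by (?!\A) and \G cannot match after a bump-along. So the first match always creates a list before anything is appended.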

You can use this pattern (Java/PCRE/Perl/.NET version):
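(*) A note about multiline mode and Ruby: in other languages, multiline mode changes the meaning of the ^ and $ anchors from "start of the string" and "end of the string" to "start of a line" and "end of a line". In Ruby, multiline mode lets the dot match newlines (the equivalent of "singleline" or "dotall" mode in other languages). In Ruby, ^ and $ always match at the start and end of a line by default, whatever the mode.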
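In Java, for instance, the two behaviours are separate flags. Pattern.MULTILINE, or (?m), changes ^ and $. Pattern.DOTALL, or (?s), lets the dot match a line terminator, which is what Ruby's /m does. A small sketch follows; the helper name is hypothetical.

```java
import java.util.regex.Pattern;

class ModeDemo {
    // True if regex, compiled with the given flags, finds a match anywhere in text.
    static boolean found(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "Begin\nEnd";
        // By default ^ only matches at the very start of the input...
        System.out.println(found("^End", 0, text));                 // false
        // ...and with MULTILINE, i.e. (?m), it also matches after each line terminator.
        System.out.println(found("^End", Pattern.MULTILINE, text)); // true
        // By default the dot does not match a line terminator...
        System.out.println(found("n.E", 0, text));                  // false
        // ...but it does with DOTALL, i.e. (?s), the equivalent of Ruby's /m.
        System.out.println(found("n.E", Pattern.DOTALL, text));     // true
    }
}
```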
This just takes advantage of the fact that a number is never at the start of a line.
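When the regex engine takes branch 2) of the alternation, the pattern fails automatically, because (?=End$) cannot be followed by \Q \E(\d+) (a space and a number). Since the newline and the three branches are enclosed in an atomic group, the regex engine cannot backtrack and try branch 3). This way, the continuity is broken each time branch 2) matches.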
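One way to check the effect of the atomic group is to run a compact, non-free-spacing form of the pattern twice: once with (?> ... ) and once with a plain (?: ... ). The hedged Java sketch below assumes \n line endings and uses hypothetical names; [ ] stands in for the \Q \E spaces. With the plain group, the engine can backtrack into branch 3), let .* consume the End line, and pick up the 0 that follows it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AtomicDemo {
    // Compact form of the pattern; open is "(?>" (atomic) or "(?:" (plain group).
    static String pattern(String open) {
        return "(?m)(?:\\G(?!\\A)|^Begin\\r?$)"
             + open + "\\n(?:(?:\\#[ ]allowed[ ])?\\*[ ]nm:|(?=End\\r?$)|.*))*"
             + "[ ](\\d+)\\r?";
    }

    static List<String> numbers(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "Hello\nTest1\nBegin\n* nm: 866 444 988\n* nm: 08 66\n"
                    + "# allowed * nm: 77 2\nEnd\n* nm: 0\n";
        // Atomic: once branch 2) has matched at End, branch 3) is never retried.
        System.out.println(numbers(pattern("(?>"), data)); // [866, 444, 988, 08, 66, 77, 2]
        // Plain group: backtracking into branch 3) lets .* swallow "End".
        System.out.println(numbers(pattern("(?:"), data)); // [866, 444, 988, 08, 66, 77, 2, 0]
    }
}
```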
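Notes:

The \Q..\E feature lets you write a literal string without escaping special characters. In free-spacing mode, all spaces inside \Q..\E are taken into account.
To make this pattern work in Ruby, you need to remove the m modifier, remove all \Q and \E, and escape (or enclose in a character class) every space, every special character, and the sharp character # that is used for comments in free-spacing mode.
Example: (?:\Q# allowed \E)?\Q* nm:\E
=> (?:\#[ ]allowed[ ])?\*[ ]nm: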
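The two spellings in the example can be compared in Java, which also supports \Q..\E; Pattern.quote builds such a quoted literal. Below is a small sketch outside free-spacing mode. The names and sample strings are hypothetical.

```java
import java.util.regex.Pattern;

class QuotingDemo {
    // Prefix written with \Q..\E, as in the answer.
    static final String QUOTED = "(?:\\Q# allowed \\E)?\\Q* nm:\\E";
    // The same prefix rewritten as in the Ruby note: # and * escaped, spaces in [ ].
    static final String ESCAPED = "(?:\\#[ ]allowed[ ])?\\*[ ]nm:";

    // True when regex matches the whole string s.
    static boolean whole(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        String[] samples = {"* nm:", "# allowed * nm:", "#allowed * nm:", "*nm:"};
        for (String s : samples) {
            // Both columns agree for every sample.
            System.out.println(s + " -> " + whole(QUOTED, s) + " / " + whole(ESCAPED, s));
        }
        // Pattern.quote wraps a literal in \Q..\E.
        System.out.println(Pattern.quote("* nm:")); // \Q* nm:\E
    }
}
```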
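Do you really need to do it in a single pass?

@anubhava: Yes, my question is about nested anchors and matching consecutive numbers... thanks

What platform? Is it Perl-compatible? In general, my advice is not to try to do this contiguously: the more restrictions you put in a regex, the slower it matches and the harder it is for others to understand.

Your final solution also matches 123 at the start; you probably forgot to put (?!\A) before the last \G.

Thanks, it works. Can you explain your pattern, especially the part inside the two non-capturing groups? I tested your code with rubular; it runs on regexstorm.net (.NET engine) but does not return all the numbers. Is there a way to make your code "generic", or at least make it work in most cases?

I don't think the dotnet engine has the \G anchor construct. Why would it need it when it has capture collections? Want me to show how it's done in Dot-Net?

Dot-Net supports the \G anchor. My first pattern works fine in .NET, but it is not generic because I use the CaptureCollection object. Your pattern only returns the first number. You can try it with regexstorm.net.

Ok, if Dot NET supports \G, you have to do a Find first, then Find Next in a loop (changing the start position each time). The behavior is the same as \G: each find matches only once. If in dotnet you happen to do (?:[ ]+(\d+))+, each line will be a capture collection. But that is not transferable to other engines that don't support it; other engines overwrite the capture group on each pass. (By the way, I used VS2010 for testing.)

There are a lot of small caveats with this construct that he may not be aware of. I wanted to do it as part of an atomic group, (?=End$), but I wanted to keep it somewhat generic.

@Casimir et Hippolyte: Thanks, but I tested your pattern with rubular (Ruby engine), with regexstorm (.NET engine, after a simple modification) and with PHP, and it doesn't work.

@walidtoumi: Indeed, it is a bug related to the CR character and the $ anchor: (?:\G(?!\A)|^Begin\r?$)(?>\n(?:(?:# allowed )?\* nm:|(?=^End\r?$)|.*))* (\d+)\r?

@walidtoumi: note that you cannot put the m modifier here, (?m:\G..), otherwise its scope is limited to this non-capturing group instead of the whole pattern.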
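The point about capture groups being overwritten on each pass can be seen in Java. A repeated group such as (?:[ ]+(\d+))+ keeps only its last iteration, while a \G-based find() loop returns one number per match. Below is a minimal sketch on one line of the sample data. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureRepeatDemo {
    // Repeated capturing group: Java keeps only the capture from the last iteration.
    static String lastCapture(String line) {
        Matcher m = Pattern.compile("nm:(?:[ ]+(\\d+))+").matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // \G alternative: one number per find(), each match glued to the previous one.
    static List<String> allWithG(String line) {
        Matcher m = Pattern.compile("(?:nm:|(?!\\A)\\G)[ ]+(\\d+)").matcher(line);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "* nm: 866 444 988";
        System.out.println(lastCapture(line)); // 988
        System.out.println(allWithG(line));    // [866, 444, 988]
    }
}
```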

(?xm)  # switch on freespacing mode and multiline mode*
(?: \G(?!\A) | ^Begin\r?$ )  # two entry points: the end of the last match OR
                             # "Begin" that starts and ends a line

(?> \n  # a newline can start with:
    (?:
        (?:\Q# allowed \E)? \Q* nm:\E  # 1) the start of a line with numbers,
      |
        (?=End\r?$)                    # 2) the last line end of a block,
      |
        .*                             # 3) or any other full line
    )  
)*  # this group is optional to allow several consecutive numbers,
    # but the branch 3) can be repeated several times until the branch 1)
    # matches and the first number is found, or until the branch 2) matches
    # and closes the block.
\Q \E      # a space
(\d+)  \r? # the number
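As a sanity check, the free-spacing pattern above can be compiled in Java, which supports (?x), \Q..\E and atomic groups. The pattern is kept line for line with its comments shortened. This is a hedged sketch that assumes \n line endings; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FreeSpacingDemo {
    // The free-spacing pattern, one string per pattern line (comments shortened).
    static final Pattern P = Pattern.compile(String.join("\n",
        "(?xm)",
        "(?: \\G(?!\\A) | ^Begin\\r?$ )",
        "(?> \\n",
        "    (?:",
        "        (?:\\Q# allowed \\E)? \\Q* nm:\\E   # 1) a line with numbers",
        "      |",
        "        (?=End\\r?$)                     # 2) the End line",
        "      |",
        "        .*                              # 3) any other full line",
        "    )",
        ")*",
        "\\Q \\E      # a space",
        "(\\d+)  \\r? # the number"));

    static List<String> numbers(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "Hello\nTest1\nBegin\n* nm: 866 444 988\n* nm: 08 66\n"
                    + "# allowed * nm: 77 2\nEnd\n* nm: 0\n";
        System.out.println(numbers(data)); // [866, 444, 988, 08, 66, 77, 2]
    }
}
```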