PHP regex crashes Apache


php regex windows apache


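I have a regex that does the matching for a template system. Unfortunately, on some fairly trivial lookups it seems to crash Apache (which is running on Windows). I've researched the problem and found suggestions such as increasing the stack size, but none of them seem to work. I also don't much like handling these problems by raising limits, because that usually just pushes the bug further into the future.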

Anyway, does anyone have ideas on how to change the regex so that it is less likely to fail?

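The idea is to capture the innermost block (in this example, {block:test}This should be caught first!{/block:test}), then replace its opening/closing tags and run the whole thing through the regex again, until no blocks are left.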

Regex:

~(?P<opening>{(?P<inverse>[!])?block:(?P<name>[a-z0-9\s_-]+)})(?P<contents>(?:(?!{/?block:[0-9a-z-_]+}).)*)(?P<closing>{/block:\3})~ism


Example template:

<div class="f_sponsors s_banners">
    <div class="s_previous">&laquo;</div>
    <div class="s_sponsors">
        <ul>
            {block:sponsors}
            <li>
                <a href="{var:url}" target="_blank">
                    <img src="image/160x126/{var:image}" alt="{var:name}" title="{var:name}" />
                </a>
            {block:test}This should be caught first!{/block:test}
            </li>
            {/block:sponsors}
        </ul>
    </div>
    <div class="s_next">&raquo;</div>
</div>



I'm guessing this isn't possible :(
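A hard crash of the Apache worker gives PHP no chance to report anything. When PCRE merely hits one of its limits, however, preg_match() returns false rather than 0. Here is a minimal sketch, assuming PHP 7 or later, that tells such an error apart from a plain non-match. The helper find_block() is hypothetical and the template is shortened. It runs the question's pattern unchanged.

```php
<?php
// Sketch: surface PCRE failures instead of treating `false` as "no match".
// find_block() is a hypothetical helper name.
function find_block(string $pattern, string $template): ?array
{
    $result = preg_match($pattern, $template, $m);
    if ($result === false) {
        // preg_match() returns false when PCRE gives up, e.g. because the
        // backtrack or recursion limit was exhausted.
        $names = [
            PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
            PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
            PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
            PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
            PREG_JIT_STACKLIMIT_ERROR  => 'PREG_JIT_STACKLIMIT_ERROR',
        ];
        $code = preg_last_error();
        throw new RuntimeException('PCRE failed: ' . ($names[$code] ?? "error $code"));
    }
    return $result === 1 ? $m : null; // null means "no match", not "error"
}

// The question's original pattern, unchanged.
$pattern = '~(?P<opening>{(?P<inverse>[!])?block:(?P<name>[a-z0-9\s_-]+)})'
         . '(?P<contents>(?:(?!{/?block:[0-9a-z-_]+}).)*)(?P<closing>{/block:\3})~ism';

// Shortened version of the example template (an assumption for brevity).
$template = "{block:sponsors}\n<li>{var:name}\n"
          . "{block:test}This should be caught first!{/block:test}\n"
          . "</li>\n{/block:sponsors}";

$m = find_block($pattern, $template);
echo $m['name'], "\n"; // test
```

On builds where PCRE still recurses on the C stack (older PHP versions, for instance), lowering pcre.recursion_limit is a common way to make PCRE return PREG_RECURSION_LIMIT_ERROR before the stack overflows. A safe value depends on the thread stack size and is not given here.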

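You can use atomic groups, (?>...), or possessive quantifiers, ?+ *+ ++ ..., to suppress or limit backtracking, and speed up matching with the loop-unrolling technique. My solution: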

\{block:(\w++)\}([^

Try this:
'~(?P<opening>\{(?P<inverse>[!])?block:(?P<name>[a-z0-9\s_-]+)\})(?P<contents>[^{]*(?:\{(?!/block:(?P=name)\})[^{]*)*)(?P<closing>\{/block:(?P=name)\})~i'

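First, the only characters we care about are braces, so we can gobble up every other character with [^{]*. Only after seeing a { do we check whether it is the start of a {/block} tag. If it isn't, we consume it, start scanning for the next one, and repeat as needed.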
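To see the unrolled loop in action, the following sketch runs this answer's pattern against a shortened, single-line version of the question's template. The shortened template is an assumption. Only the block's own closing tag ends the contents, so the pattern matches the outer sponsors block with the nested block still inside it. That is the behaviour the later EDIT changes.

```php
<?php
// The answer's "unrolled loop" pattern: [^{]* swallows ordinary text in one go,
// and only at a '{' does the lookahead check for this block's closing tag.
$pattern = '~(?P<opening>\{(?P<inverse>[!])?block:(?P<name>[a-z0-9\s_-]+)\})'
         . '(?P<contents>[^{]*(?:\{(?!/block:(?P=name)\})[^{]*)*)'
         . '(?P<closing>\{/block:(?P=name)\})~i';

// Shortened, single-line version of the question's template (an assumption).
$template = '{block:sponsors}<li>{var:name}'
          . '{block:test}This should be caught first!{/block:test}'
          . '</li>{/block:sponsors}';

if (preg_match($pattern, $template, $m) === 1) {
    // Only the block's own closing tag ends the contents, so the nested block
    // stays inside and the outer block is the one matched.
    echo $m['name'], "\n";     // sponsors
    echo $m['contents'], "\n"; // <li>{var:name}{block:test}This should be caught first!{/block:test}</li>
}
```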
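Using RegexBuddy, I tested each regex by placing the cursor at the start of the {block:sponsors} tag and debugging. Then I edited the closing tag to force a failed match and debugged again. Your regex takes 940 steps to succeed and 2265 steps to fail. Mine takes 57 steps to succeed and 83 steps to fail.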

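On another note, I removed the s modifier because I'm not using the dot (.), and I removed the m modifier because it was never needed. I also used the named backreference (?P=name) instead of \3, as in @DaveRandom's excellent suggestion. And I escaped all the braces ({ and }) because I find it easier to read that way.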
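PCRE still numbers named groups by the position of their opening parenthesis. In this pattern, therefore, name is group 3 whether or not the optional ! matched. The named backreference mainly buys readability, and robustness against later edits that add or remove groups. The following sketch checks both forms on a plain block and an inverted block. The simplified [^{]* contents part and the sample inputs are assumptions made for brevity.

```php
<?php
// Named groups are numbered by the position of their opening parenthesis:
// opening = 1, inverse = 2, name = 3, contents = 4.
// (The simplified [^{]* contents part is an assumption to keep this short.)
$prefix   = '~(?P<opening>\{(?P<inverse>!)?block:(?P<name>[a-z0-9_-]+)\})(?P<contents>[^{]*)';
$byNumber = $prefix . '\{/block:\3\}~i';
$byName   = $prefix . '\{/block:(?P=name)\}~i';

foreach (['{block:a}x{/block:a}', '{!block:b}y{/block:b}'] as $input) {
    $n = preg_match($byNumber, $input, $m);
    $k = preg_match($byName, $input);
    // $m[3] and $m['name'] hold the same capture, with or without the "!".
    printf("%s: by number=%d, by name=%d, group 3=%s\n", $input, $n, $k, $m[3]);
}
// {block:a}x{/block:a}: by number=1, by name=1, group 3=a
// {!block:b}y{/block:b}: by number=1, by name=1, group 3=b
```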

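EDIT: If you want to match the innermost named block, change the middle part of the regex from this:

(?P<contents>
  [^{]*(?:\{(?!/block:(?P=name)\})[^{]*)*
)

so that its lookahead rejects any opening or closing block tag, (?!/?block:[a-z0-9\s_-]+\}), instead of only this block's own closing tag. You can also make the quantifiers possessive ([^{]*+ and )*+), because you know the pattern will never consume anything it later has to give back. That speeds things up a little, but it isn't necessary.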
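The question's plan is to match the innermost block, replace it, and rerun until nothing is left. It can be sketched with preg_replace_callback() and the possessive, any-block middle part described above. The function names and the rendering rule are hypothetical stand-ins for the real template logic. The rule keeps the contents of normal blocks and drops inverted {!block:...} ones. The loop stops when a pass makes no replacements. A null return from preg_replace_callback() signals a PCRE error rather than "no match".

```php
<?php
// Sketch of the question's plan: replace the innermost block, then rerun the
// regex until nothing matches. render_block() and its keep/drop rule are
// hypothetical stand-ins for the real template logic.

function render_block(array $m): string
{
    // Keep the contents of {block:...}, drop {!block:...} entirely.
    return $m['inverse'] === '!' ? '' : $m['contents'];
}

function render_template(string $template): string
{
    // The contents may not contain any block tag, opening or closing, so only
    // innermost blocks can match. Possessive quantifiers avoid backtracking.
    $pattern = '~(?P<opening>\{(?P<inverse>[!])?block:(?P<name>[a-z0-9\s_-]+)\})'
             . '(?P<contents>[^{]*+(?:\{(?!/?block:[a-z0-9\s_-]+\})[^{]*+)*+)'
             . '(?P<closing>\{/block:(?P=name)\})~i';

    do {
        $template = preg_replace_callback($pattern, 'render_block', $template, -1, $count);
        if ($template === null) {
            // null means PCRE gave up (e.g. a backtrack limit), not "no match"
            throw new RuntimeException('PCRE error ' . preg_last_error());
        }
    } while ($count > 0);

    return $template;
}

echo render_template(
    '<ul>{block:sponsors}<li>{block:test}first!{/block:test}</li>{/block:sponsors}</ul>'
), "\n";
// <ul><li>first!</li></ul>
```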

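Does the solution have to be a single regex? A more efficient approach might be to look for the first occurrence of {/block: (with a plain string search or a regex), then search backwards from that point for the matching opening tag, replace the span as needed, and repeat until there are no more blocks. If you search for the first closing tag from the top of the template each time, you always get the most deeply nested block.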
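The first approach finds the first closing tag, then searches backwards for its opening tag. The following sketch implements it with plain string functions. The helper names and the rendering rule are hypothetical. The rule keeps the contents of normal blocks and drops inverted ones. Because it never calls PCRE, it cannot run into backtracking or recursion limits. Each step is a handful of linear string searches.

```php
<?php
// Sketch of "find the first closing tag, then search backwards for its
// opening tag", using string functions only. The helper names and the
// keep/drop rendering rule are hypothetical.

function find_innermost_block(string $tpl): ?array
{
    $close = strpos($tpl, '{/block:');
    if ($close === false) {
        return null; // no blocks left
    }
    $nameStart = $close + strlen('{/block:');
    $nameEnd = strpos($tpl, '}', $nameStart);
    if ($nameEnd === false) {
        throw new RuntimeException('Unterminated closing tag');
    }
    $name = substr($tpl, $nameStart, $nameEnd - $nameStart);

    // The nearest opening tag before the first closing tag is the innermost one.
    $before = substr($tpl, 0, $close);
    $plain = strrpos($before, '{block:');
    $inverted = strrpos($before, '{!block:');
    if ($plain === false && $inverted === false) {
        throw new RuntimeException("No opening tag for {/block:$name}");
    }
    $isInverted = $plain === false || ($inverted !== false && $inverted > $plain);
    $open = $isInverted ? $inverted : $plain;

    // The opening tag found must carry the same name as the closing tag.
    $openTag = ($isInverted ? '{!block:' : '{block:') . $name . '}';
    if (substr($tpl, $open, strlen($openTag)) !== $openTag) {
        throw new RuntimeException("Non-matching tag for {/block:$name}");
    }
    $contentStart = $open + strlen($openTag);

    return [
        'start'    => $open,
        'length'   => $nameEnd + 1 - $open, // opening tag through closing '}'
        'inverted' => $isInverted,
        'contents' => substr($tpl, $contentStart, $close - $contentStart),
    ];
}

function render_with_strings(string $tpl): string
{
    while (($block = find_innermost_block($tpl)) !== null) {
        // Hypothetical rule: keep normal blocks' contents, drop inverted blocks.
        $replacement = $block['inverted'] ? '' : $block['contents'];
        $tpl = substr_replace($tpl, $replacement, $block['start'], $block['length']);
    }
    return $tpl;
}

echo render_with_strings(
    '<ul>{block:sponsors}<li>{block:test}first!{/block:test}</li>{/block:sponsors}</ul>'
), "\n";
// <ul><li>first!</li></ul>
```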

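The mirror-image algorithm works just as well: find the last opening tag, then search forward from there for the corresponding closing tag: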

<?php

$template = ''; // ... load the template here

while (true) {
  $last_open_tag = strrpos($template, '{block:');
  $last_inverted_tag = strrpos($template, '{!block:');
  // $block_start is the index of the '{' of the last opening block tag in the
  // template, or false if there are no more block tags left.
  // max() alone is not enough: max(false, 0) returns false, so a tag at
  // offset 0 would be missed.
  if ($last_open_tag === false) {
    $block_start = $last_inverted_tag;
  } elseif ($last_inverted_tag === false) {
    $block_start = $last_open_tag;
  } else {
    $block_start = max($last_open_tag, $last_inverted_tag);
  }
  if ($block_start === false) {
    // all done
    break;
  } else {
    // extract the block name (the foo in {block:foo}) - from the character
    // after the next : to the character before the next }, inclusive
    $block_name_start = strpos($template, ':', $block_start) + 1;
    $block_name = substr($template, $block_name_start,
        strcspn($template, '}', $block_name_start));

    // we now have the start tag and the block name, next find the end tag.
    // $block_end is the index of the '{' of the next closing block tag after
    // $block_start.  If this doesn't match the opening tag something is wrong.
    $block_end = strpos($template, '{/block:', $block_start);
    if ($block_end === false
        || strpos($template, $block_name.'}', $block_end + 8) !== $block_end + 8) {
      // missing or non-matching closing tag
      print("Non-matching tag found\n");
      break;
    } else {
      // now we have found the innermost block
      // - its start tag begins at $block_start
      // - its content begins at
      //   (strpos($template, '}', $block_start) + 1)
      // - its content ends at $block_end
      // - its end tag ends at ($block_end + strlen($block_name) + 9)
      //   [9 being the length of '{/block:' plus '}']
      // - the start tag was inverted iff $block_start === $last_inverted_tag
      $content_start = strpos($template, '}', $block_start) + 1;
      $content = substr($template, $content_start, $block_end - $content_start);
      $block_length = $block_end + strlen($block_name) + 9 - $block_start;
      // Placeholder: replace the whole block with its contents; put whatever
      // rendering you actually need here.
      $template = substr_replace($template, $content, $block_start, $block_length);
    }
  }
}

echo $template;

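Strange as it sounds, I once had a regex that kept killing a particular Apache instance, and the S (capital!) modifier fixed it. My guess is that it was an unreported memory leak or something similar, and the study step happened to avoid it. A long shot, but worth a try, I'd say...

@DaveRandom, I once had the same problem, with the same fix! Let's see whether it works for the OP. Another thought: escaping the literal { characters in the regex might help. Technically they are metacharacters, and although PCRE seems very forgiving of unescaped braces, escaping them properly might reduce the work it has to do. Also, why use named capture groups and then not use the names in the backreferences? /block:\3 => /block:(?P=name). This applies especially to your regex, because ! is optional, in which case name would be \2 rather than \3.

Do you mean the s flag in the ~ism part? Without s it doesn't work when the closing tag is on a different line from the opening tag. Good point about the backreference names, though; noted.

Yes, thanks everyone for the help so far! However, when you put a block inside a block, it only catches the outermost block, not the innermost one. I've updated the question a bit!

But yours still doesn't need the ~sm part, so I think it's on the right track!

Very nice! I think I understand what the OP wants... By "nested blocks" I think they mean nested blocks of any name, not just of the same name, which are replaced iteratively. So in {1}{2}{/2}{/1}, 2 should be caught before 1. If that's the case, you can easily change the middle part from:

(?P<contents>
  [^{]*(?:\{(?!/block:(?P=name)\})[^{]*)*
)

to:

(?P<contents>
  [^{]*(?:\{(?!/?block:[a-z0-9\s_-]+\})[^{]*)*
)

With possessive quantifiers, the same middle part becomes:

(?P<contents>
  [^{]*+(?:\{(?!/?block:[a-z0-9\s_-]+\})[^{]*+)*+
)

A similar problem with matching HTML, where a (?:(?!...).)* construct crashed.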
