Removing comments from C source code using POSIX regular expressions in PHP

php regex


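I am having trouble removing single-line comments from C source code using PHP and regular expressions. The /* */ comments have already been removed, so all that is left are the // comments.

First:

I have this regular expression:

$content = ereg_replace("^\/\/.*$", "", $content);

This removes the entire file, not just the lines that match ^//comment$. I assume this is because the search is greedy, but how can I make it non-greedy? And how can I make it apply to every matching line?

Second:

The comments should not be removed when they are inside a string, for example "//should not be removed". How can I achieve that? I was thinking that when it finds a " character it should skip ahead, but I don't know how to do that.

Thanks to everyone who helps, I really appreciate it.
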
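This will match all single-line comments except those enclosed in double quotes ("), even when the quoted text contains an escaped quote (\").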

(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)//.*$
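As a rough illustration of how this pattern could be applied, the sketch below runs it line by line with preg_replace. It rests on assumptions that are not part of the answer: the pattern uses a lookahead, which POSIX regular expressions (and therefore the ereg functions) do not support, so a PCRE function is used instead; the helper name strip_line_comments is hypothetical; and each line is processed separately so that $ always means the end of that line.

```php
<?php
// Hypothetical helper (not from the answer): applies the lookahead pattern
// to each line separately, so that `$` always refers to the end of that line.
function strip_line_comments(string $source): string
{
    // A nowdoc keeps the pattern literal, so no PHP-level escaping is needed.
    $pattern = <<<'RE'
~(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)//.*$~
RE;

    $lines = explode("\n", $source);
    foreach ($lines as $i => $line) {
        // Remove `//...` only where the rest of the line has balanced quotes.
        // If preg_replace fails (returns null), keep the line unchanged.
        $lines[$i] = preg_replace($pattern, '', $line) ?? $line;
    }
    return implode("\n", $lines);
}

$code = "char *s = \"//keep\"; // drop\n// whole line\nint x;";
echo strip_line_comments($code);
```

Note that a comment containing a single unpaired double quote would not be removed by this approach, because the lookahead then sees an odd number of quotes.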

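*All rights reserved to the pattern's original author.

To avoid the string trap, one approach is to first match what you want to avoid, and then capture or skip it.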

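The ereg functions have been deprecated since PHP 5.3 (and were removed in PHP 7.0), but on versions that still provide them you can use: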

$result = ereg_replace('("([^\\\"]|\\\.)*")|//[^' . "\n" . ']*|/\*\**([^*]|\*\**[^*/])*\*\**/', '\1', $str);

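It works, but performance is very poor compared to a preg version (the preg functions offer many features for improving a pattern).

The preg version is about 450 times faster than the ereg_ version.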
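For illustration, a preg-based counterpart of the ereg call might look like the following. This is a minimal sketch and not necessarily the pattern that was benchmarked: the function name strip_c_comments is hypothetical, and the branch for character literals such as '"' is an added assumption. As in the ereg version, strings are matched first and put back via the first capture group, while comments are replaced with nothing.

```php
<?php
// Hypothetical helper: keeps string and character literals, drops comments.
function strip_c_comments(string $source): string
{
    // Branch 1 (captured): "..." or '...' literals, with backslash escapes.
    // Branch 2: // comment up to the end of the line.
    // Branch 3: /* ... */ comment, using the possessive subpattern.
    $pattern = <<<'RE'
~("(?:[^"\\]|\\.)*+"|'(?:[^'\\]|\\.)*+')|//[^\n]*|/\*[^*]*(?:\*+(?!/)[^*]*)*+\*/~s
RE;

    // $1 is empty when a comment branch matched, so comments vanish
    // and literals are written back unchanged.
    return preg_replace($pattern, '$1', $source) ?? $source;
}

$code = "char *s = \"// keep /* this */\"; /* gone **/ int x; // gone too\n"
      . "char c = '\"'; // bye";
echo strip_c_comments($code);
```

The s modifier is used so that \\. can also cover a backslash followed by a newline inside a literal; the // branch stops at the newline explicitly.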

Details of the subpattern [^*]*(?:\*+(?!/)[^*]*)*+ :

This subpattern describes the content of a multiline comment, that is, everything between /* and the first */:

[^*]*           # all that is not an asterisk (can be empty)

(?:             # open a non capturing group:
                # The reason of this group is to handle asterisks that
                # are not a part of the closing sequence */

    \*+         # one or more asterisks 
    (?!/)       # negative lookahead : not followed by / 
                # (it is a zero-width assertion, in other words it's only a test 
                # and it doesn't consume characters)

    [^*]*       # zero or more characters that are not an asterisk
)*+             # repeat the group zero or more times (possessive)
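To see the subpattern in action, it can be wrapped between the opening /\* and the closing \*/ and run with preg_match_all. This is a minimal sketch; the sample subject string is made up for illustration, and the x modifier is used only so the pattern can keep its layout and comments.

```php
<?php
// With the x (extended) modifier, whitespace and # comments inside the
// pattern are ignored, so the annotated layout can be used directly.
$pattern = <<<'RE'
~
/\*             # opening sequence
[^*]*           # everything that is not an asterisk
(?:
    \*+         # one or more asterisks
    (?!/)       # not followed by a slash
    [^*]*       # everything that is not an asterisk
)*+             # repeat the group zero or more times (possessive)
\*/             # closing sequence
~x
RE;

// Made-up subject containing two block comments.
$subject = 'x = 1; /*aaaa**bbb***cc***/ y = 2; /* second */';
preg_match_all($pattern, $subject, $matches);
print_r($matches[0]);
```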

Regex engine walk for the string /*aaaa**bbb***cc***/ (approximately):


In each step, the part of the string being examined is shown in [ ] (an empty [] marks the current position), next to the part of the pattern being applied:

string                     pattern part            result
[/*]aaaa**bbb***cc***/     /\*                     success
/*[aaaa]**bbb***cc***/     [^*]*                   success
/*aaaa[]**bbb***cc***/     (?:\*+(?!/)[^*]*)*+     try group
/*aaaa[**]bbb***cc***/     \*+                     success
/*aaaa**[b]bb***cc***/     (?!/)                   verified
/*aaaa**[bbb]***cc***/     [^*]*                   success
/*aaaa**bbb[]***cc***/     (?:\*+(?!/)[^*]*)*+     try group
/*aaaa**bbb[***]cc***/     \*+                     success
/*aaaa**bbb***[c]c***/     (?!/)                   verified
/*aaaa**bbb***[cc]***/     [^*]*                   success
/*aaaa**bbb***cc[]***/     (?:\*+(?!/)[^*]*)*+     try group
/*aaaa**bbb***cc[***]/     \*+                     success
/*aaaa**bbb***cc***[/]     (?!/)                   fail
/*aaaa**bbb***cc[**]*/     \*+                     backtrack
/*aaaa**bbb***cc**[*]/     (?!/)                   verified
/*aaaa**bbb***cc**[]*/     [^*]*                   success
/*aaaa**bbb***cc**[]*/     (?:\*+(?!/)[^*]*)*+     try group
/*aaaa**bbb***cc**[*]/     \*+                     success
/*aaaa**bbb***cc***[/]     (?!/)                   fail
/*aaaa**bbb***cc**[]*/     (?:\*+(?!/)[^*]*)*+     fail
/*aaaa**bbb***cc**[]*/     (?:\*+(?!/)[^*]*)*+     backtrack
/*aaaa**bbb***cc**[*/]     \*/                     success

This is much harder than you might think. Imagine a line like this:

char *foo = "//don't remove" "//really, please don't" /// but get rid of this, haha! // and this too