Python regex question: stripping multi-line comments but maintaining a newline

python regex parsing

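I'm parsing a source code file, and I want to remove all line comments (i.e. those starting with "//") and multi-line comments (i.e. /* ... */). However, if a multi-line comment contains at least one line break (\n), I want the output to have exactly one line break in its place.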

For example, the code:

qwe /* 123
456 
789 */ asd

should turn into:

qwe
asd

and not "qweasd" or output with more than one line break.

What is the best way to do this? Thanks.

Edit: sample code for testing:

comments_test = "hello // comment\n"+\
                "line 2 /* a comment */\n"+\
                "line 3 /* a comment*/ /*comment*/\n"+\
                "line 4 /* a comment\n"+\
                "continuation of a comment*/ line 5\n"+\
                "/* comment */line 6\n"+\
                "line 7 /*********\n"+\
                "********************\n"+\
                "**************/\n"+\
                "line ?? /*********\n"+\
                "********************\n"+\
                "********************\n"+\
                "********************\n"+\
                "********************\n"+\
                "**************/\n"+\
                "line ??"

Expected results:

hello 
line 2 
line 3  
line 4
line 5
line 6
line 7
line ??
line ??

Is this what you're looking for?

>>> print(s)
qwe /* 123
456
789 */ asd
>>> print(re.sub(r'\s*/\*.*\n.*\*/\s*', '\n', s, flags=re.S))
qwe
asd

This only applies to comments that span more than one line; other comments are left untouched.
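To make the scope of that pattern concrete, here is a small sketch; the sample strings other than the one from the question are made up for illustration. It suggests that a comment confined to one line is left alone, and also that, because `.*` is greedy and `re.S` lets it cross line breaks, two one-line comments on neighbouring lines can be swallowed as if they were a single multi-line comment.

```python
import re

# Pattern from the answer above: a block comment containing at least one newline.
pattern = re.compile(r'\s*/\*.*\n.*\*/\s*', flags=re.S)

multi_line = "qwe /* 123\n456\n789 */ asd"
one_line = "a /* one line */ b"          # illustrative input
two_lines = "x /* a */ y\nz /* b */"     # illustrative input

print(repr(pattern.sub('\n', multi_line)))  # 'qwe\nasd'
print(repr(pattern.sub('\n', one_line)))    # unchanged: no newline inside the comment
print(repr(pattern.sub('\n', two_lines)))   # 'x\n' (the code "y" and "z" is lost)
```

The last case is worth checking against real input before relying on this pattern.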

How about this:

re.sub(r'\s*/\*(.|\n)*?\*/\s*', '\n', s, flags=re.DOTALL).strip()

It consumes the leading whitespace, `/*`, any text and line breaks up to the first `*/`, and then any whitespace that follows.

This is a slight twist on sykora's example, but it is also non-greedy on the inside. You may also want to look into the multiline option.
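The effect of the non-greedy quantifier can be sketched as follows. The input string and the greedy comparison pattern are illustrative assumptions, not part of the answer.

```python
import re

text = "x /* a */ y /* b\nc */ z"  # illustrative input with two comments

# Greedy: .* runs to the last "*/", so the code between the comments is removed too.
greedy = re.sub(r'\s*/\*.*\*/\s*', '\n', text, flags=re.DOTALL)

# Lazy: (.|\n)*? stops at the first "*/", so each comment is handled on its own.
lazy = re.sub(r'\s*/\*(.|\n)*?\*/\s*', '\n', text)

print(repr(greedy))  # 'x\nz'
print(repr(lazy))    # 'x\ny\nz'
```

Two things to keep in mind: the fourth positional parameter of `re.sub` is `count`, so flags are best passed by keyword, and because the pattern uses `(.|\n)` it does not depend on `re.DOTALL` at all. The lazy version also turns every block comment into a line break, including comments that fit on one line.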

Note that if nested comments have to be considered, a regular expression is not the solution.
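For languages whose block comments do nest, a depth counter is one common alternative to a regular expression. The following is only a minimal sketch under stated assumptions: the function name `strip_nested_comments` is hypothetical, it ignores string literals and line comments, and it does not implement the one-line-break rule from the question.

```python
def strip_nested_comments(text):
    """Remove /* ... */ comments, allowing them to nest (hypothetical helper)."""
    out = []
    depth = 0  # how many comments are currently open
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(text[i])  # keep characters outside any comment
            i += 1
    return "".join(out)


print(repr(strip_nested_comments("a /* x /* y */ z */ b")))  # 'a  b'
```

A single counter is enough here because only one kind of delimiter nests; a regular expression cannot track the nesting depth.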

import re

comment_re = re.compile(
    r'(^)?[^\S\n]*/(?:\*(.*?)\*/[^\S\n]*|/[^\n]*)($)?',
    re.DOTALL | re.MULTILINE
)

def comment_replacer(match):
    start,mid,end = match.group(1,2,3)
    if mid is None:
        # single line comment
        return ''
    elif start is not None or end is not None:
        # multi line comment at start or end of a line
        return ''
    elif '\n' in mid:
        # multi line comment with line break
        return '\n'
    else:
        # multi line comment without line break
        return ' '

def remove_comments(text):
    return comment_re.sub(comment_replacer, text)

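`(^)?` matches if the comment starts at the beginning of a line, as long as the MULTILINE flag is used.

`[^\S\n]*` matches any whitespace character except a newline. We don't want to match line breaks if the comment starts on its own line.

`/\*(.*?)\*/` matches a multi-line comment and captures its content. The match is lazy, so it does not run across two or more comments. The DOTALL flag makes `.` match newlines.

`//[^\n]*` matches a single-line comment. `.` can't be used here because of the DOTALL flag.

`($)?` matches if the comment stops at the end of a line, as long as the MULTILINE flag is used.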
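The building blocks described above can be tried in isolation. This is a small sketch with made-up sample strings; it only uses standard `re` calls.

```python
import re

# [^\S\n] is "whitespace, but not a newline".
ws = re.findall(r'[^\S\n]', "a \t\nb")
print(ws)  # [' ', '\t']

# Without DOTALL, "." stops at a newline, so a two-line comment is not matched.
two_line = "/* a\nb */"
no_dotall = re.search(r'/\*(.*?)\*/', two_line)
with_dotall = re.search(r'/\*(.*?)\*/', two_line, re.DOTALL)
print(no_dotall)                   # None
print(repr(with_dotall.group(1)))  # ' a\nb '

# With MULTILINE, "$" matches at the end of every line, not only at the end of the text.
line_ends = [m.start() for m in re.finditer(r'$', "ab\ncd", re.MULTILINE)]
text_end = [m.start() for m in re.finditer(r'$', "ab\ncd")]
print(line_ends, text_end)  # [2, 5] [5]

# An optional group such as (^)? is '' when it took part in the match and None otherwise.
at_start = re.search(r'(^)?y', "y").group(1)
mid_line = re.search(r'(^)?y', "xy").group(1)
print(repr(at_start), repr(mid_line))  # '' None
```

The last point is why the replacer function compares the groups with `is not None` rather than testing their truthiness: an empty string would count as false.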

Examples:

>>> s = ("qwe /* 123\n"
...      "456\n"
...      "789 */ asd /* 123 */ zxc\n"
...      "rty // fgh\n")
>>> print('"' + '"\n"'.join(
...     remove_comments(s).splitlines()
... ) + '"')
"qwe"
"asd zxc"
"rty"
>>> comments_test = ("hello // comment\n"
...                  "line 2 /* a comment */\n"
...                  "line 3 /* a comment*/ /*comment*/\n"
...                  "line 4 /* a comment\n"
...                  "continuation of a comment*/ line 5\n"
...                  "/* comment */line 6\n"
...                  "line 7 /*********\n"
...                  "********************\n"
...                  "**************/\n"
...                  "line ?? /*********\n"
...                  "********************\n"
...                  "********************\n"
...                  "********************\n"
...                  "********************\n"
...                  "**************/\n"
...                  "line ??")
>>> print('"' + '"\n"'.join(
...     remove_comments(comments_test).splitlines()
... ) + '"')
"hello"
"line 2"
"line 3 "
"line 4"
"line 5"
"line 6"
"line 7"
"line ??"
"line ??"

Edit:

Updated to the new specification.
Added another example.
