Matching one-line JavaScript comments (//) with re

javascript python regex replace

I would like to use Python's re module to filter (mostly one-line) comments out of (mostly valid) JavaScript. For example:

// this is a comment
var x = 2 // and this is a comment too
var url = "http://www.google.com/" // and "this" too
url += 'but // this is not a comment' // however this one is
url += 'this "is not a comment' + " and ' neither is this " // only this

I have been trying for more than half an hour now without any success. Can anyone help me?

Edit 1:

foo = 'http://stackoverflow.com/' // these // are // comments // too //

Edit 2:

bar = 'http://no.comments.com/'

Parsing might be easier if there were explicit semicolons.

In any case, this works:

import re

rx = re.compile(r'.*(//(.*))$')

lines = ["// this is a comment", 
    "var x = 2 // and this is a comment too",
    """var url = "http://www.google.com/" // and "this" too""",
    """url += 'but // this is not a comment' // however this one is""",
    """url += 'this "is not a comment' + " and ' neither is this " // only this""",]

for line in lines:
    # The greedy .* makes the captured "//" the last one on the line
    print(rx.match(line).groups())

Output of the above:

('// this is a comment', ' this is a comment')
('// and this is a comment too', ' and this is a comment too')
('// and "this" too', ' and "this" too')
('// however this one is', ' however this one is')
('// only this', ' only this')

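I'm not sure what you are doing with the JavaScript once the comments are removed, but JSMin might help. It removes comments well enough in any case, and there is a Python implementation.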
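The pattern above relies on a greedy `.*`, so it always picks the last `//` on a line and does not know about string literals. The following is a minimal sketch of where that breaks down; the helper name `last_slash_comment` and the sample lines are made up for illustration.

```python
import re

# Same pattern as in the answer above.
rx = re.compile(r'.*(//(.*))$')

def last_slash_comment(line):
    """Return what the pattern treats as the comment, or None if there is no "//"."""
    m = rx.match(line)
    return m.group(1) if m else None

samples = [
    "var x = 2 // a real comment",            # handled as intended
    "bar = 'http://no.comments.com/'",        # no comment, but "//" inside a string
    "foo = 1 // these // are // comments",    # comment that itself contains "//"
]

for line in samples:
    print(repr(line), '->', repr(last_slash_comment(line)))
```

The second sample reports part of the URL as a comment, and the third reports only the tail of the comment, so the pattern is only safe when every line ends in a comment that contains no further `//`.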

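My regex skills had become a bit rusty, so I used your question to refresh my memory. It turned into a fairly large regex, mostly because I also wanted to filter multi-line comments.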

import re

reexpr = r"""
    (                           # Capture code
        "(?:\\.|[^"\\])*"       # String literal
        |
        '(?:\\.|[^'\\])*'       # String literal
        |
        (?:[^/\n"']|/[^/*\n"'])+ # Any code besides newlines or string literals
        |
        \n                      # Newline
    )|
    (/\*  (?:[^*]|\*[^/])*   \*/)        # Multi-line comment
    |
    (?://(.*)$)                 # One-line comment
    $"""
rx = re.compile(reexpr, re.VERBOSE | re.MULTILINE)

This regex matches with three different subgroups: one for code and two for comment contents. Here is an example of how to extract them:

code = r"""// this is a comment
var x = 2 * 4 // and this is a comment too
var url = "http://www.google.com/" // and "this" too
url += 'but // this is not a comment' // however this one is
url += 'this "is not a comment' + " and ' neither is this " // only this

bar = 'http://no.comments.com/' // these // are // comments
bar = 'text // string \' no // more //\\' // comments
bar = 'http://no.comments.com/'
bar = /var/ // comment

/* comment 1 */
bar = open() /* comment 2 */
bar = open() /* comment 2b */// another comment
bar = open( /* comment 3 */ file) // another comment 
"""

parts = rx.findall(code)
# Each tuple from findall() is (code, multi-line comment, one-line comment text)
print('*' * 80, '\nCode:\n\n', '\n'.join([x[0] for x in parts if x[0].strip()]))
print('*' * 80, '\nMulti line comments:\n\n', '\n'.join([x[1] for x in parts if x[1].strip()]))
print('*' * 80, '\nOne line comments:\n\n', '\n'.join([x[2] for x in parts if x[2].strip()]))
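Since the goal in the question is to filter the comments out rather than list them, the same three-group idea can also be used with `re.sub`, keeping only the code group. This is a sketch under assumptions: the names `JS_TOKEN_RX` and `strip_comments` are invented here, and the pattern is a restatement of the one above, not a complete JavaScript lexer.

```python
import re

JS_TOKEN_RX = re.compile(r"""
    (                            # group 1: code
        "(?:\\.|[^"\\])*"        # double-quoted string literal
        |
        '(?:\\.|[^'\\])*'        # single-quoted string literal
        |
        (?:[^/\n"']|/[^/*\n"'])+ # anything that is not a string, newline or comment start
        |
        \n                       # newline
    )|
    (/\*(?:[^*]|\*[^/])*\*/)     # group 2: block comment
    |
    (?://(.*)$)                  # group 3: text of a one-line comment
    """, re.VERBOSE | re.MULTILINE)

def strip_comments(source):
    """Return source with comments removed; only text matched as code is kept."""
    return JS_TOKEN_RX.sub(lambda m: m.group(1) or '', source)

sample = "url += 'but // this is not a comment' // however this one is\n"
print(repr(strip_comments(sample)))
```

Whitespace that preceded a comment is left in place, so a caller may want to strip trailing spaces from each line afterwards. Regex literals that contain `//` and block comments ending in `**/` are not handled by this pattern.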

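At this point you should consider using a proper parser instead of trying to bend regular expressions to the task.

Thanks Anon., if I can't come up with a regex soon, I'll go for a parser. Maybe SpiderMonkey?

Thanks, this is definitely a +1. Now let me amend my question :) Also, I didn't write the JavaScript, so unfortunately I can't guarantee explicit semicolons...

Hmm, no, this only works if every line ends with a comment and the comment itself contains no //. Both var url="http://www" and // comments that contain a further // will fail.

OK, it works for the input given. As @Anon mentioned, a real parser is needed here to catch everything correctly.

Thanks, the Python implementation of JSMin does what I need for now.

Wow, this goes even a step beyond the question, but it is exactly what I need! Thank you very much for taking the time to solve this!

I edited the regex because it did not match a "*" such as the one in "x = 4 * 5", so the "*" was lost from the extracted code.

It does not work for /*/*/ or /*/*/*/. Fix: replace /\*(?:\*?[^/]|\n)*\*/ with /\*(?:[^*]|\*[^/])*\*/.

Thanks Gumbo, I have changed the regex.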
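Several comments above suggest a parser rather than a regex. A full JavaScript parser is out of scope here, but as a middle ground a small hand-written scanner can track whether it is inside a string literal. The sketch below is only an illustration: the function name `strip_line_comments` is made up, and it deliberately ignores block comments, regex literals and template literals.

```python
def strip_line_comments(source):
    """Remove // comments while leaving quoted string literals untouched."""
    out = []
    i, n = 0, len(source)
    quote = None  # quote character of the string we are currently inside, if any
    while i < n:
        ch = source[i]
        if quote:
            out.append(ch)
            if ch == '\\' and i + 1 < n:
                # Copy the escaped character verbatim so \' or \" cannot end the string.
                out.append(source[i + 1])
                i += 2
                continue
            if ch == quote or ch == '\n':
                # Closing quote, or an unterminated string cut off by the line break.
                quote = None
            i += 1
        elif ch in '"\'':
            quote = ch
            out.append(ch)
            i += 1
        elif source.startswith('//', i):
            # Skip to the end of the line; the newline itself is kept.
            end = source.find('\n', i)
            i = n if end == -1 else end
        else:
            out.append(ch)
            i += 1
    return ''.join(out)

print(strip_line_comments("foo = 'http://stackoverflow.com/' // these // are // comments"))
```

Compared with the regex, the state is explicit, which makes it easier to extend to further cases one at a time. For anything beyond simple scripts, an existing tool such as the JSMin port mentioned above is likely the safer choice.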