Python: regex for managing escaped characters for items like string literals


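I would like to be able to match a string literal with the option of escaped quotes. For instance, I'd like to be able to search for "this is a 'test with escaped\' values' ok" and have it properly recognize the backslash as an escape character. I've tried solutions like the following: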

import re
regexc = re.compile(r"\'(.*?)(?<!\\)\'")
match = regexc.search(r""" Example: 'Foo \' Bar'  End. """)
print(match.groups())
# I want ("Foo \' Bar") to be printed above

Are there any regex gurus out there who can tackle this? Thanks.

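If I understand what you're saying (and I'm not sure I do), you want to find the quoted string within your string while ignoring escaped quotes. Is that right? If so, try this: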

/(?<!\\)'((?:\\'|[^'])*)(?<!\\)'/

This seems to work correctly.

I think this will work:

import re
regexc = re.compile(r"(?:^|[^\\])'(([^\\']|\\'|\\\\)*)'")

def check(test, base, target):
    match = regexc.search(base)
    assert match is not None, test+": regex didn't match for "+base
    assert match.group(1) == target, test+": "+target+" not found in "+base
    print("test %s passed" % test)

check("Empty","''","")
check("single escape1", r""" Example: 'Foo \' Bar'  End. """,r"Foo \' Bar")
check("single escape2", r"""'\''""",r"\'")
check("double escape",r""" Example2: 'Foo \\' End. """,r"Foo \\")
check("First quote escaped",r"not matched\''a'","a")
check("First quote escaped beginning",r"\''a'","a")

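The regular expression

r"(?:^|[^\\])'(([^\\']|\\'|\\\\)*)'"

forward-matches only the things that we want inside the string:

Characters that aren't a backslash or a quote

An escaped quote

An escaped backslash

EDIT:

Added an extra expression at the front to check for an escaped first quote.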
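
As a reading aid, the same pattern can be laid out with re.VERBOSE so that each of the three alternatives sits on its own line. This is only a sketch: the name regexc_verbose, the layout, and the comments are illustrative, and the pattern itself is unchanged.

```python
import re

# Douglas Leeder's pattern, spread out with re.VERBOSE (layout is illustrative).
regexc_verbose = re.compile(r"""
    (?:^|[^\\])     # start of input, or any character that is not a backslash
    '               # opening quote
    (               # group 1: the string contents
      (             # group 2: one content item, repeated
        [^\\']      #   a character that is neither a backslash nor a quote
      | \\'         #   an escaped quote
      | \\\\        #   an escaped backslash
      )*
    )
    '               # closing quote
""", re.VERBOSE)

match = regexc_verbose.search(r""" Example: 'Foo \' Bar'  End. """)
print(match.group(1))  # Foo \' Bar
```

Note that the leading group consumes the character before the opening quote, so the contents should be read from group(1) rather than from the whole match.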

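Using the expression with Python's re.findall():

re.findall(r"(?<!\\)'((?:\\'|[^'])*)(?<!\\)'", s)

["%s => %s" % (s, re.findall(r"(?<!\\)'((?:\\'|[^'])*)(?<!\\)'", s)) for s in TESTS]

Douglas Leeder's pattern ((?:^|[^\\])'(([^\\']|\\'|\\\\)*)') will fail to match "test 'test \x3F test' test" and "test \\'test' test" (a string containing an escape other than quote and backslash, and a string preceded by an escaped backslash).

cletus' pattern ((?<!\\)'((?:\\'|[^'])*)(?<!\\)') will fail to match "test 'test\\' test" (the closing quote follows an escaped backslash).

re_single_quote = r"'[^'\\]*(?:\\.[^'\\]*)*'"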
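To begin, note that MizardX's answer is 100% accurate. I'd just like to add some recommendations regarding efficiency. I'd also like to point out that this problem was solved and optimized long ago; the reference linked from the original answer covers this specific problem in great detail and is highly recommended.

Let's start with the sub-expression that matches a single-quoted string which may contain escaped single quotes. If you are going to allow escaped single quotes, you had better at least allow escaped escapes as well (which is what Douglas Leeder's answer does). But as long as you're at it, it is just as easy to allow escaped anything-else. Given these requirements, MizardX is the only one who got the expression right. Here it is in both short and long form (I've taken the liberty of writing the long one in VERBOSE mode with plenty of descriptive comments, which you should always do for non-trivial regexes):

# MizardX's correct regex for matching a single-quoted string:
re_sq_short = r"'((?:\\.|[^\\'])*)'"
re_sq_long = r"""
    '           # Literal opening quote
    (           # Capture group $1: Contents.
      (?:       # Group for contents alternatives
        \\.     # Either escaped anything
      | [^\\']  # or one non-quote, non-escape.
      )*        # Zero or more contents alternatives.
    )           # End $1: Contents.
    '           # Literal closing quote
    """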

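This works and correctly matches all of the following test strings:

test01 = r"out1 'escaped-escape:        \\ ' out2"
test02 = r"out1 'escaped-quote:         \' ' out2"
test03 = r"out1 'escaped-anything:      \X ' out2"
test04 = r"out1 'two escaped escapes: \\\\ ' out2"
test05 = r"out1 'escaped-quote at end:   \'' out2"
test06 = r"out1 'escaped-escape at end:  \\' out2"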
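
A quick way to check the claim is to run the short pattern over the six cases and confirm that the capture is exactly the text between the outer quotes. This is a minimal sketch: the test strings are rebuilt from their descriptions, so their exact spacing is an assumption, and the names tests and captures are illustrative.

```python
import re

re_sq_short = r"'((?:\\.|[^\\'])*)'"

# Test strings rebuilt from their descriptions; the exact spacing is an assumption.
tests = [
    r"out1 'escaped-escape:        \\ ' out2",
    r"out1 'escaped-quote:         \' ' out2",
    r"out1 'escaped-anything:      \X ' out2",
    r"out1 'two escaped escapes: \\\\ ' out2",
    r"out1 'escaped-quote at end:   \'' out2",
    r"out1 'escaped-escape at end:  \\' out2",
]

captures = []
for text in tests:
    m = re.search(re_sq_short, text)
    captures.append(m.group(1) if m else None)
    # Show the input next to what group 1 captured.
    print(repr(text), "->", repr(captures[-1]))
```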

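OK, now let's begin to improve on this. First, the order of the alternatives makes a difference, and one should always put the most likely alternative first. In this case, non-escaped characters are more likely than escaped ones, so reversing the order improves the regex's efficiency slightly:

# Better regex for matching a single-quoted string:
re_sq_short = r"'((?:[^\\']|\\.)*)'"
re_sq_long = r"""
    '           # Literal opening quote
    (           # $1: Contents.
      (?:       # Group for contents alternatives
        [^\\']  # Either a non-quote, non-escape,
      | \\.     # or an escaped anything.
      )*        # Zero or more contents alternatives.
    )           # End $1: Contents.
    '           # Literal closing quote
    """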

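"Unrolling the loop":

This is a little better, but it can be improved significantly by applying Jeffrey Friedl's "unrolling-the-loop" efficiency technique. The regex above is not optimal because it must painstakingly apply the star quantifier to a non-capture group of two alternatives, each of which consumes only one or two characters at a time. The alternation can be eliminated entirely by recognizing that a similar pattern is repeated over and over, and that an equivalent expression can be crafted to do the same thing without alternation. Here is an optimized expression that matches a single-quoted string and captures its contents into group $1:

# Better regex for matching a single-quoted string:
re_sq_short = r"'([^'\\]*(?:\\.[^'\\]*)*)'"
re_sq_long = r"""
    '            # Literal opening quote
    (            # $1: Contents.
      [^'\\]*    # {normal*} Zero or more non-', non-escapes.
      (?:        # Group for {(special normal*)*} construct.
        \\.      # {special} Escaped anything.
        [^'\\]*  # More {normal*}.
      )*         # Finish up {(special normal*)*} construct.
    )            # End $1: Contents.
    '            # Literal closing quote
    """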

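Since the three variants are meant to differ only in efficiency, it is worth confirming that they capture the same contents. The sketch below assumes a handful of made-up sample strings; the names patterns, samples, and agree are illustrative.

```python
import re

# The three variants discussed above: original order, reordered, and unrolled.
patterns = {
    "escape first": r"'((?:\\.|[^\\'])*)'",
    "normal first": r"'((?:[^\\']|\\.)*)'",
    "unrolled": r"'([^'\\]*(?:\\.[^'\\]*)*)'",
}

# Hypothetical sample inputs, including one with no closing quote.
samples = [
    r"a 'plain' b",
    r"a 'with \' quote' b",
    r"a 'ends with backslash \\' b",
    r"a 'unterminated b",
]

agree = []
for text in samples:
    results = [re.findall(p, text) for p in patterns.values()]
    # True when every variant returned the same list of captures.
    agree.append(all(r == results[0] for r in results))
    print(repr(text), "->", results[0], "(all agree)" if agree[-1] else "(differ)")
```

Agreement on a few samples is not a proof of equivalence, only a sanity check.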
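This expression gobbles up all non-quote, non-backslash characters (the vast majority of most strings) in one gulp, which drastically reduces the amount of work the regex engine must perform. How much better, you ask? Well, I entered each of the regexes presented in this question into a regex testing tool and measured how many steps the regex engine took to complete a match on the following string (which all solutions match correctly):

'This is an example string which contains one \'internally quoted\' string.'

Here are the benchmark results on the above test string:

r"""
AUTHOR            SINGLE-QUOTE REGEX   STEPS TO: MATCH  NON-MATCH
Evan Fosmark      '(.*?)(? [...]
"""

r"""
AUTHOR/REGEX      01 02 03 04 05 06 07 08 09 10 11 12 13 14
Douglas Leeder    p  XX [...]
"""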
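
Python's re module does not report engine step counts, so the measurement above cannot be reproduced directly in Python. A rough substitute is wall-clock timing with timeit. The sketch below is only that: the example string is reconstructed with escaped inner quotes, the iteration count is arbitrary, and timings vary by machine, so no expected numbers are shown.

```python
import re
import timeit

# Example string, reconstructed with escaped inner quotes (an assumption).
text = r"'This is an example string which contains one \'internally quoted\' string.'"

patterns = {
    "escape first": r"'((?:\\.|[^\\'])*)'",
    "normal first": r"'((?:[^\\']|\\.)*)'",
    "unrolled": r"'([^'\\]*(?:\\.[^'\\]*)*)'",
}

timings = {}
for name, pattern in patterns.items():
    rx = re.compile(pattern)
    # Time repeated searches; the default argument binds the current rx.
    timings[name] = timeit.timeit(lambda rx=rx: rx.search(text), number=20000)
    print("%-12s %.4f s" % (name, timings[name]))
```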
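import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The enclosing class was omitted from the original snippet; the name here is illustrative.
public class QuoteTest {
    private static final String[] TESTS = {
        "'testing 123'",
        "'testing 123\\'",
        "'testing 123",
        "blah 'testing 123",
        "blah 'testing 123'",
        "blah 'testing 123' foo",
        "this 'is a \\' test'",
        "another \\' test 'testing \\' 123' \\' blah"
    };

    public static void main(String[] args) {
        // An unescaped quote, then escaped quotes or non-quotes, then an unescaped quote.
        Pattern p = Pattern.compile("(?<!\\\\)'((?:\\\\'|[^'])*)(?<!\\\\)'");
        for (String test : TESTS) {
            Matcher m = p.matcher(test);
            if (m.find()) {
                System.out.printf("%s => %s%n", test, m.group(1));
            } else {
                System.out.printf("%s doesn't match%n", test);
            }
        }
    }
}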

'testing 123' => testing 123
'testing 123\' doesn't match
'testing 123 doesn't match
blah 'testing 123 doesn't match
blah 'testing 123' => testing 123
blah 'testing 123' foo => testing 123
this 'is a \' test' => is a \' test
another \' test 'testing \' 123' \' blah => testing \' 123

re.findall(r"(?<!\\)'((?:\\'|[^'])*)(?<!\\)'", s)

>>> re.findall(r"(?<!\\)'((?:\\'|[^'])*)(?<!\\)'",
 r"\''foo bar gazonk' foo 'bar' gazonk 'foo \'bar\' gazonk' 'gazonk bar foo\'")
['foo bar gazonk', 'bar', "foo \\'bar\\' gazonk"]
>>>

["%s => %s" % (s, re.findall(r"(?<!\\)'((?:\\'|[^'])*)(?<!\\)'", s)) for s in TESTS]

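The list comprehension above relies on a TESTS sequence that is only defined in the Java snippet. A Python rendering might look like the sketch below; raw strings are used so that each Java "\\" becomes a single literal backslash. The names pattern and results are illustrative.

```python
import re

# Python rendering of the Java TESTS array; raw strings keep backslashes literal.
TESTS = [
    r"'testing 123'",
    r"'testing 123\'",
    r"'testing 123",
    r"blah 'testing 123",
    r"blah 'testing 123'",
    r"blah 'testing 123' foo",
    r"this 'is a \' test'",
    r"another \' test 'testing \' 123' \' blah",
]

pattern = re.compile(r"(?<!\\)'((?:\\'|[^'])*)(?<!\\)'")

results = [pattern.findall(s) for s in TESTS]
for s, found in zip(TESTS, results):
    # An empty list means the string did not match.
    print("%s => %s" % (s, found))
```

The strings that print an empty list are the ones the Java run reports as not matching.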
(?<!\\)(?:\\\\)*'((?:\\.|[^\\'])*)'

(?<!\\)(?:\\\\)*("|')((?:\\.|(?!\1)[^\\])*)\1
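
The second pattern above accepts either quote character: group 1 holds the opening quote and group 2 the contents. A minimal sketch of how it might be used with re.finditer follows; the helper name quoted_contents and the sample string are hypothetical.

```python
import re

# Either quote style: group 1 is the quote character, group 2 the contents.
any_quote = re.compile(r"""(?<!\\)(?:\\\\)*("|')((?:\\.|(?!\1)[^\\])*)\1""")

def quoted_contents(text):
    """Return the contents of every quoted string found in text (hypothetical helper)."""
    return [m.group(2) for m in any_quote.finditer(text)]

sample = r'''say "it's \"ok\"" and 'a "b" \' c' done'''
for item in quoted_contents(sample):
    print(item)
# it's \"ok\"
# a "b" \' c
```

The (?!\1) lookahead is what lets the other quote character appear unescaped inside a string.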

Douglas Leeder's test cases:
"''" matched successfully: ""
" Example: 'Foo \' Bar'  End. " matched successfully: "Foo \' Bar"
"'\''" matched successfully: "\'"
" Example2: 'Foo \\' End. " matched successfully: "Foo \\"
"not matched\''a'" matched successfully: "a"
"\''a'" matched successfully: "a"

cletus' test cases:
"'testing 123'" matched successfully: "testing 123"
"'testing 123\\'" matched successfully: "testing 123\\"
"'testing 123" didn't match, as expected.
"blah 'testing 123" didn't match, as expected.
"blah 'testing 123'" matched successfully: "testing 123"
"blah 'testing 123' foo" matched successfully: "testing 123"
"this 'is a \' test'" matched successfully: "is a \' test"
"another \' test 'testing \' 123' \' blah" matched successfully: "testing \' 123"

MizardX's test cases:
"test 'test \x3F test' test" matched successfully: "test \x3F test"
"test \\'test' test" matched successfully: "test"
"test 'test\\' test" matched successfully: "test\\"

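The last group of results can be reproduced with a small harness that runs the three patterns side by side on MizardX's test cases. This is a sketch rather than the script that produced the output above; the names patterns, cases, and outcome are illustrative.

```python
import re

patterns = {
    "Douglas Leeder": r"(?:^|[^\\])'(([^\\']|\\'|\\\\)*)'",
    "cletus": r"(?<!\\)'((?:\\'|[^'])*)(?<!\\)'",
    "MizardX": r"(?<!\\)(?:\\\\)*'((?:\\.|[^\\'])*)'",
}

# MizardX's three test cases, written as raw strings.
cases = [
    r"test 'test \x3F test' test",
    r"test \\'test' test",
    r"test 'test\\' test",
]

outcome = {}
for name, pattern in patterns.items():
    for text in cases:
        m = re.search(pattern, text)
        # Record group 1 on success, None when the pattern finds nothing.
        outcome[(name, text)] = m.group(1) if m else None
        print("%-15s %-30r %r" % (name, text, outcome[(name, text)]))
```

Only the third pattern finds a quoted string in all three inputs.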
>>> print re.findall(r"('([^'\\]|\\'|\\\\)*')",r""" Example: 'Foo \' Bar'  End. """)[0][0]