Python: re.sub without replacing part of the regex

python regex

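For example, I have the string "perfect bear hunts", and I want to replace the word that appears before "bear" with the word "the".

So the resulting string would be "the bear hunts".

I thought I would use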

re.sub(r"\w+ bear", "the", "perfect bear hunts")

but it replaces "bear" as well. How do I keep "bear" from being replaced while still using it in the match?

A lookahead regex is what you need:

re.sub(".+(?=bear)", "the ", "perfect bear swims")

Use a positive lookahead to replace everything before bear:

re.sub(".+(?=bear )","the ","perfect bear swims")

The . matches any character (except line terminators).
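To make the difference concrete, here is a small sketch that compares the attempt from the question with the lookahead pattern. The third input string is made up for illustration; it shows that the greedy .+ takes everything before "bear ", not only the preceding word.

```python
import re

# The attempt from the question: "bear" is part of the match, so it is replaced too.
consumed = re.sub(r"\w+ bear", "the", "perfect bear hunts")
print(consumed)  # the hunts

# Lookahead: "bear " must follow, but it is not part of the match.
kept = re.sub(r".+(?=bear )", "the ", "perfect bear swims")
print(kept)  # the bear swims

# Greedy .+ takes everything before "bear ", not only one word.
greedy = re.sub(r".+(?=bear )", "the ", "a big bear swims")
print(greedy)  # the bear swims
```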

An alternative to using lookaheads:

Capture the part you want to keep with a group ( ), then reinsert it in the replacement using \1.

re.sub(r"\w+ (bear)", r"the \1", "perfect bear swims")
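A short sketch of the same group-capture idea, with two additions that are not part of the answer and are only assumptions for illustration: a \b after the group so that "beard" is left alone, and the \g<1> spelling of the backreference, which stays unambiguous when the replacement continues with a digit.

```python
import re

# (bear) is captured as group 1; \b keeps "beard" from matching.
pattern = re.compile(r"\w+ (bear)\b")

first = pattern.sub(r"the \1", "perfect bear swims")
print(first)  # the bear swims

# \g<1> is equivalent to \1 and unambiguous when followed by digits.
second = pattern.sub(r"the \g<1>", "a perfect bear")
print(second)  # a the bear

# No match, so the string comes back unchanged.
third = pattern.sub(r"the \1", "a perfect beard")
print(third)  # a perfect beard
```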

Like the other answers, I would use a positive lookahead assertion.

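Then, to fix the problem raised in some of the comments (what about "beard"?), I added (\b|$). This matches a word boundary or the end of the string, so you match only the word bear and nothing more.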

So you end up with the following:

import re

def bear_replace(string):
    return re.sub(r"\w+ (?=bear(\b|$))", "the ", string)

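And the test cases (using pytest):

import pytest

@pytest.mark.parametrize('string, expected', [
    ("perfect bear swims", "the bear swims"),

    # Only the word immediately before 'bear' is replaced
    ("before perfect bear swims", "before the bear swims"),

    # 'beard' isn't matched
    ("a perfect beard", "a perfect beard"),

    # We handle the case where 'bear' is at the end of the string
    ("perfect bear", "the bear"),

    # 'bear' is followed by a non-space punctuation character
    ("perfect bear-string", "the bear-string"),
])
def test_bear_replace(string, expected):
    assert bear_replace(string) == expected

This will replace everything before the text "bear". Try it on "my long beard".

This will produce "thebear swims".

Well spotted, edited.

It will replace all the text before "bear", not just the word before it. Try it on "my long beard" to see the problem…

Updated with a space. Thanks for the hint ;)

It still turns "a big bear" into "the bear" rather than "a the bear". The OP said they want to replace the word before "bear", not the whole string. You changed the OP's \w+ for no reason.

Note that this will also match words like "beard". You should consider adding a word boundary \b.

Sorry for being nitpicky, but I'd like to point out that bear(\s|$) will not match if the word "bear" is followed by any punctuation, such as "bear." or "bear, who". I'd suggest using the word boundary \b instead (though admittedly that isn't a perfect solution either; for example, it would match "bear-sized").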
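One of the comments points out that a lookahead ending in (\s|$) misses "bear" followed by punctuation, while \b also accepts hyphenated words. The sketch below checks both claims; the helper names replace_ws and replace_wb and the sample strings are made up for illustration and do not come from the answers.

```python
import re

def replace_ws(text):
    # "bear" must be followed by whitespace or the end of the string.
    return re.sub(r"\w+ (?=bear(\s|$))", "the ", text)

def replace_wb(text):
    # "bear" must be followed by a word boundary.
    return re.sub(r"\w+ (?=bear\b)", "the ", text)

print(replace_ws("perfect bear, who"))   # perfect bear, who
print(replace_wb("perfect bear, who"))   # the bear, who
print(replace_wb("perfect bear-sized"))  # the bear-sized
print(replace_wb("perfect beard"))       # perfect beard
```

Which variant fits depends on whether hyphenated forms such as "bear-sized" should count as the word "bear" in your data.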