Python: dictionary matching problem when doing multiple replacements in a single pass




Hi, I am trying to replace multiple words in a single pass with the following function:

import re

def multiple_replace(text, dict):
    regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))
    return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)

The problem is that if I have this dictionary:

dict = { 'hello1': 'hi', 'hello111' : 'GoodMorning', 'world' : 'earth' }

and I try:

s = " hello111 world"
multiple_replace(s, dict)

the function matches hello1 instead of hello111. If you have any clue, that would be great.
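The cause is that Python's re module tries the alternatives of a | group from left to right and keeps the first one that matches at a given position, not the longest one. A minimal reproduction with the dictionary from the question (renamed to replacements here, an illustrative choice, to avoid shadowing the built-in dict):

```python
import re

replacements = {'hello1': 'hi', 'hello111': 'GoodMorning', 'world': 'earth'}

# The pattern multiple_replace builds: alternatives in dict insertion order
pattern = "(%s)" % "|".join(map(re.escape, replacements))
print(pattern)  # (hello1|hello111|world)

# At position 1 the engine tries "hello1" first; it matches, so "hello111" is never used there
regex = re.compile(pattern)
result = regex.sub(lambda mo: replacements[mo.group(0)], " hello111 world")
print(repr(result))  # ' hi11 earth'
```

The leftover "11" after "hi" is the tail of "hello111" that the shorter key did not consume.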

I thought about sorting the keys in reverse so that the function starts with the longest key, but that may not be the best approach.
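Sorting is in fact a sound fix. Two keys can only match at the same position if the shorter one is a prefix of the longer one, and a string always sorts after its own proper prefixes, so even a plain reverse lexicographic sort puts every key ahead of its prefixes. A sketch under that reasoning (multiple_replace_reverse is a hypothetical name, not from the thread):

```python
import re

def multiple_replace_reverse(text, replacements):
    # Reverse lexicographic order places every key before its own prefixes,
    # e.g. "hello111" before "hello1", so the longer key is tried first
    keys = sorted(replacements, reverse=True)
    regex = re.compile("|".join(map(re.escape, keys)))
    return regex.sub(lambda mo: replacements[mo.group(0)], text)

print(repr(multiple_replace_reverse(
    " hello111 world",
    {'hello1': 'hi', 'hello111': 'GoodMorning', 'world': 'earth'},
)))  # ' GoodMorning earth'
```

Sorting by length in descending order gives the same guarantee; the reverse sort just avoids a key function.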

Wiktor Stribiżew's comment is right.

Either sort the keys first, or add a word boundary:

import re

def multiple_replace_sort(text, a_dict):
    # Longest keys first, so "hello111" is tried before its prefix "hello1"
    regex = re.compile("(%s)" % "|".join(map(re.escape, sorted(a_dict, key=len, reverse=True))))
    return regex.sub(lambda mo: a_dict[mo.string[mo.start():mo.end()]], text)

def multiple_replace_boundary(text, a_dict):
    # The trailing \b stops "hello1" from matching inside "hello111"
    regex = re.compile(r"(%s)\b" % "|".join(map(re.escape, a_dict.keys())))
    return regex.sub(lambda mo: a_dict[mo.string[mo.start():mo.end()]], text)
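Keys that end in a non-word character do not work with the word-boundary version above; they have to be separated out first, or handled with better code.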
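To see why, recall that \b only matches where a word character (\w) meets a non-word character or the edge of the string. In the sketch below, the key 'world-' and the input text are hypothetical: the key is followed by a space, and between '-' and ' ' there is no boundary, so the trailing \b makes the match fail:

```python
import re

replacements = {'world-': 'earth-'}  # hypothetical key ending in a non-word character
text = "hello world- again"

# Trailing \b: "-" and " " are both non-word characters, so \b cannot match between them
boundary = re.compile(r"(%s)\b" % "|".join(map(re.escape, replacements)))
print(boundary.sub(lambda mo: replacements[mo.group(0)], text))  # hello world- again (unchanged)

# Without the boundary the key is found and replaced
plain = re.compile("(%s)" % "|".join(map(re.escape, replacements)))
print(plain.sub(lambda mo: replacements[mo.group(0)], text))  # hello earth- again
```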

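Use a word boundary:

regex = re.compile(r"\b(%s)\b" % "|".join(map(re.escape, dict.keys())))

Or sort the keys by length in descending order and use the approach without word boundaries.

Try

regex = re.compile("(%s)" % "|".join(map(re.escape, sorted(dict, key=lambda k: len(k), reverse=True))))

Does it work this way?

There is no need to use the match object's .start() and .end(); just use .group() to get the value. Also, your regex only uses a trailing word boundary, which will not work. If the keys start or end with non-word characters, the word boundary approach does not work.

Well, I just followed the original code with minimal changes. A single word boundary \b at the end of the group is there to handle the most likely case, where the keys are plain words.

Then add a world- key and try again.

I see. Then the word and non-word keys have to be separated first, and the sorting problem comes back. It is better to just sort the keys =]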
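The difference between a trailing-only \b and a boundary on both sides, together with the .group() shortcut suggested in the comments, can be seen in a short sketch; the input string "underworld world hello111" is an illustrative assumption:

```python
import re

replacements = {'hello1': 'hi', 'hello111': 'GoodMorning', 'world': 'earth'}
alternation = "|".join(map(re.escape, replacements))
text = "underworld world hello111"

def replace(match):
    # match.group(0) is the whole matched text, same as match.string[match.start():match.end()]
    return replacements[match.group(0)]

# Trailing \b only: the "world" at the end of "underworld" is replaced too
trailing = re.compile(r"(%s)\b" % alternation)
print(trailing.sub(replace, text))  # underearth earth GoodMorning

# \b on both sides: only whole words are replaced
both = re.compile(r"\b(%s)\b" % alternation)
print(both.sub(replace, text))  # underworld earth GoodMorning
```

In both patterns "hello111" is still handled correctly, because when "hello1" is tried first the following \b fails and the engine backtracks to the longer alternative.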
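import re

def multiple_replace_separate(text, a_dict):
    if not a_dict:
        return text  # an empty pattern would match everywhere
    # Keys made only of word characters get a trailing \b, all other keys a trailing \B
    word, non_word = [], []
    for key in a_dict:
        if re.fullmatch(r"\w+", key):
            word.append(key)
        else:
            non_word.append(key)
    # Escape both groups and only add an alternative for a non-empty list:
    # an empty group such as "()\B" would match the empty string and raise KeyError
    parts = []
    if non_word:
        parts.append(r"(%s)\B" % "|".join(map(re.escape, non_word)))
    if word:
        parts.append(r"(%s)\b" % "|".join(map(re.escape, word)))
    regex = re.compile("|".join(parts))
    return regex.sub(lambda mo: a_dict[mo.string[mo.start():mo.end()]], text)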
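A different technique that avoids splitting the keys into word and non-word lists is to replace \b with the lookaround assertions (?<!\w) and (?!\w), which only require that the match is not glued to a word character on either side, whatever characters the key itself starts or ends with. A sketch combining this with the length sort (multiple_replace_lookaround is a hypothetical name, and treating "not adjacent to \w" as a whole-token match is an assumption about the desired behavior):

```python
import re

def multiple_replace_lookaround(text, replacements):
    if not replacements:
        return text  # an empty alternation would match everywhere
    # Longest keys first, so a key is always tried before its prefixes
    keys = sorted(replacements, key=len, reverse=True)
    # (?<!\w) / (?!\w): no word character directly before / after the match,
    # whether the key starts or ends with a word or a non-word character
    regex = re.compile(r"(?<!\w)(?:%s)(?!\w)" % "|".join(map(re.escape, keys)))
    return regex.sub(lambda mo: replacements[mo.group(0)], text)

mapping = {'hello1': 'hi', 'hello111': 'GoodMorning', 'world': 'earth', 'world-': 'earth-'}
print(repr(multiple_replace_lookaround(" hello111 world world- underworld", mapping)))
# ' GoodMorning earth earth- underworld'
```

Unlike \b, these assertions do not depend on whether the key's own first or last character is a word character, so "world-" followed by a space is matched while the "world" inside "underworld" is left alone.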