Python: replace all occurrences of specific words
Tags: python, regex, python-2.7
Suppose I have the following sentence:
bean likes to sell his beans
I want to replace all occurrences of specific words with other words. For example, bean to robert and beans to cars.
I can't just use str.replace, because in this case it would change beans to roberts:
>>> "bean likes to sell his beans".replace("bean","robert")
'robert likes to sell his roberts'
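I need to change whole words only, not occurrences of the word inside another word. I think I can achieve this with regular expressions, but I don't know how to do it correctly.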
"bean likes to sell his beans".replace("beans", "cars").replace("bean", "robert")
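This replaces all instances of "beans" with "cars", and then all instances of "bean" with "robert". It works because .replace() returns a modified copy of the original string, so you can think of it in stages. It basically works like this: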
>>> first_string = "bean likes to sell his beans"
>>> second_string = first_string.replace("beans", "cars")
>>> third_string = second_string.replace("bean", "robert")
>>> print(first_string, second_string, third_string)
('bean likes to sell his beans', 'bean likes to sell his cars',
'robert likes to sell his cars')
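The order of the chained calls matters. A small sketch of what happens when the shorter word is replaced first; the helper name chain_replace is hypothetical and introduced only for this illustration:

```python
def chain_replace(text, pairs):
    """Apply str.replace for each (old, new) pair, in the given order."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text

sentence = "bean likes to sell his beans"

# Longer word first: "beans" is already gone when "bean" is replaced.
good = chain_replace(sentence, [("beans", "cars"), ("bean", "robert")])

# Shorter word first: the "bean" inside "beans" becomes "robert",
# so the second replacement finds nothing left to match.
bad = chain_replace(sentence, [("bean", "robert"), ("beans", "cars")])

print(good)  # robert likes to sell his cars
print(bad)   # robert likes to sell his roberts
```

So this approach only works when the replacement order can be fixed in advance, longest word first.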
With a regular expression, you can use \b to specify word boundaries:
import re
sentence = 'bean likes to sell his beans'
sentence = re.sub(r'\bbean\b', 'robert', sentence)
# 'robert likes to sell his beans'
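Here "beans" is not changed (to "roberts"), because the "s" at the end means there is no word boundary after "bean": \b matches the empty string, but only at the beginning or end of a word.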
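To see where \b does and does not match, here is a short check using only re.compile and search from the standard library; the sample strings are made up for illustration:

```python
import re

pattern = re.compile(r'\bbean\b')

# "bean" as a whole word: there is a boundary on both sides, so it matches.
print(bool(pattern.search('a bean here')))   # True

# "beans": the "s" is a word character, so there is no boundary after "bean".
print(bool(pattern.search('some beans')))    # False

# A comma is not a word character, so a boundary exists right after "bean".
print(bool(pattern.search('bean, please')))  # True
```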
The second replacement, for completeness:
sentence = re.sub(r'\bbeans\b', 'cars', sentence)
# 'robert likes to sell his cars'
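If you replace one word at a time, you may end up replacing a word several times (and not get what you want). To avoid this, you can use a function or a lambda: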
d = {'bean':'robert', 'beans':'cars'}
str_in = 'bean likes to sell his beans'
str_out = re.sub(r'\b(\w+)\b', lambda m:d.get(m.group(1), m.group(1)), str_in)
This way, once bean has been replaced by robert, it will not be modified again (even if robert is also in your list of input words).
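A minimal sketch of that difference, assuming a hypothetical mapping in which one replacement is also a key (the sentence and mapping are made up for illustration):

```python
import re

# Hypothetical mapping: "robert" is both a replacement and a key.
d = {'bean': 'robert', 'robert': 'bean'}
str_in = 'bean met robert'

# Single pass: every word of the input is looked up exactly once.
single_pass = re.sub(r'\b(\w+)\b', lambda m: d.get(m.group(1), m.group(1)), str_in)

# Sequential passes: the second pass rewrites the output of the first.
sequential = re.sub(r'\bbean\b', 'robert', str_in)
sequential = re.sub(r'\brobert\b', 'bean', sequential)

print(single_pass)  # robert met bean
print(sequential)   # bean met bean
```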
Following georg's suggestion, I edited this answer to use dict.get(key, default_value).
Alternative solution (also suggested by georg):
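One plausible form of such an alternative is a single pattern built from the dictionary keys, so that only the listed words are matched at all. This is a sketch under that assumption and not necessarily the exact code that was suggested; the re.escape call and the longest-first sort are additions for safety:

```python
import re

d = {'bean': 'robert', 'beans': 'cars'}
str_in = 'bean likes to sell his beans'

# Escape each key and join them into one alternation, longest keys first.
keys = sorted(d, key=len, reverse=True)
pattern = re.compile(r'\b(' + '|'.join(re.escape(k) for k in keys) + r')\b')

# Only keys can match, so a plain d[...] lookup is enough.
str_out = pattern.sub(lambda m: d[m.group(1)], str_in)
print(str_out)  # robert likes to sell his cars
```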
I know this was asked a long time ago, but doesn't this look more elegant?
reduce(lambda x,y : re.sub('\\b('+y[0]+')\\b',y[1],x) ,[("bean","robert"),("beans","cars")],"bean likes to sell his beans")
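The same idea spelled out so that it also runs on Python 3, where reduce has to be imported from functools. The helper name replace_word is hypothetical, and re.escape is an addition that guards against words containing regex metacharacters:

```python
import re
from functools import reduce  # a builtin in Python 2, imported explicitly here

pairs = [("bean", "robert"), ("beans", "cars")]
sentence = "bean likes to sell his beans"

def replace_word(text, pair):
    """Replace whole-word occurrences of pair[0] with pair[1]."""
    old, new = pair
    return re.sub(r'\b' + re.escape(old) + r'\b', new, text)

# Apply the pairs from left to right, feeding each result into the next step.
result = reduce(replace_word, pairs, sentence)
print(result)  # robert likes to sell his cars
```

This is still a sequence of separate passes, so a word produced by an earlier pair can be replaced again by a later one.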
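In the real task I can't do this, because the order of such replacements is not determined.
I assume the parentheses are not required; they just make the regex more readable (at least to me).
For some reason, this does not seem to work for every U.S
You can drop the if when you use \bbeans?\b as the regex: look the match up directly in the dict, and use m.group(0) (the whole match) in the lambda.
I want this to be generic enough that one regex can handle any input text plus any list of words to replace, so I don't want bean in the regex.
I see. It's just that it checks every single word, which I think is the main bottleneck. I agree that one hard-coded regex per word should be faster, but then there is still the problem of making sure that a word, once replaced, is not replaced again by another regex.
No, that is no longer a problem: \bbeans?\b matches both bean and beans, so the lambda looks up d['bean'] or d['beans'], and the two are handled differently.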
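A minimal sketch of the variant described in these comments, assuming the same dictionary as in the answer above:

```python
import re

d = {'bean': 'robert', 'beans': 'cars'}
str_in = 'bean likes to sell his beans'

# The pattern can only match "bean" or "beans", both of which are keys,
# so the direct lookup d[...] cannot raise KeyError.
str_out = re.sub(r'\bbeans?\b', lambda m: d[m.group(0)], str_in)
print(str_out)  # robert likes to sell his cars
```

The trade-off is that the pattern is hard-coded for this pair of words, whereas the \w+ version handles any dictionary at the cost of visiting every word.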