Python: replace all occurrences of specific words

Tags: python, regex, python-2.7

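Suppose I have the following sentence:

bean likes to sell his beans

I want to replace all occurrences of specific words with other words: for example, bean with robert and beans with cars.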

I can't just use str.replace, because in this case it also changes beans to roberts:

>>> "bean likes to sell his beans".replace("bean","robert")
'robert likes to sell his roberts'
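I need to change whole words only, not occurrences of the word inside another word. I think I can achieve this with regular expressions, but I don't know how to do it properly.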

"bean likes to sell his beans".replace("beans", "cars").replace("bean", "robert")
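This replaces all instances of "beans" with "cars", then all instances of "bean" with "robert". It works because .replace() returns a modified copy of the original string, so you can think of it in stages. It basically works like this: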

 >>> first_string = "bean likes to sell his beans"
 >>> second_string = first_string.replace("beans", "cars")
 >>> third_string = second_string.replace("bean", "robert")
 >>> first_string, second_string, third_string
 ('bean likes to sell his beans', 'bean likes to sell his cars', 'robert likes to sell his cars')
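The order of the two calls matters in this approach. A small sketch of what happens when they are swapped (the variable names are illustrative only):

```python
sentence = "bean likes to sell his beans"

# Longer word first: "beans" is gone before "bean" is replaced
longest_first = sentence.replace("beans", "cars").replace("bean", "robert")

# Shorter word first: the "bean" inside "beans" becomes "robert",
# so the second call finds no "beans" left to replace
shortest_first = sentence.replace("bean", "robert").replace("beans", "cars")

print(longest_first)   # robert likes to sell his cars
print(shortest_first)  # robert likes to sell his roberts
```

Either way, str.replace works on substrings, so a longer word that merely contains "bean" (beanstalk, say) would still be rewritten.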

If you use a regular expression, you can specify word boundaries with \b:

import re

sentence = 'bean likes to sell his beans'

sentence = re.sub(r'\bbean\b', 'robert', sentence)
# 'robert likes to sell his beans'
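Here "beans" is not changed (to "roberts") because there is no word boundary between "bean" and the trailing "s": \b matches the empty string, but only at the beginning or end of a word.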

The second replacement, for completeness:

sentence = re.sub(r'\bbeans\b', 'cars', sentence)
# 'robert likes to sell his cars'
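The same idea can be wrapped in a small helper. This is only a sketch: replace_word is a hypothetical name, and re.escape is added on the assumption that the word to replace may contain regex metacharacters.

```python
import re

def replace_word(text, old, new):
    """Replace whole-word occurrences of old with new (hypothetical helper)."""
    # re.escape keeps characters such as "." or "+" in old from being read as regex syntax
    pattern = r'\b' + re.escape(old) + r'\b'
    # A function as the replacement inserts new literally, without backslash processing
    return re.sub(pattern, lambda m: new, text)

result = replace_word('bean, beans and beanstalk; a bean.', 'bean', 'robert')
print(result)  # robert, beans and beanstalk; a robert.
```

Punctuation next to a word counts as a boundary, so "bean," and "bean." are replaced while "beans" and "beanstalk" are left alone. The sketch assumes old starts and ends with a word character; otherwise \b would not sit where one expects.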

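If you replace the words one at a time, a word may be replaced more than once (and you won't get what you want). To avoid this, you can use a function or a lambda: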

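import re
d = {'bean': 'robert', 'beans': 'cars'}
str_in = 'bean likes to sell his beans'
# Look up every word in d; words that are not keys are left unchanged
str_out = re.sub(r'\b(\w+)\b', lambda m: d.get(m.group(1), m.group(1)), str_in)
# 'robert likes to sell his cars'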
That way, once bean has been replaced by robert, it will not be modified again (even if robert is also in your list of words to replace).
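A sketch of why the single pass matters, using a made-up dictionary in which one replacement is also a key. The swap mapping and the variable names are assumptions for illustration.

```python
import re

swap = {'bean': 'robert', 'robert': 'bean'}
str_in = 'bean met robert'

# Single pass: each word of the input is looked up exactly once
single_pass = re.sub(r'\b(\w+)\b', lambda m: swap.get(m.group(1), m.group(1)), str_in)

# One re.sub per word: the second pass also rewrites the output of the first
sequential = str_in
for old, new in [('bean', 'robert'), ('robert', 'bean')]:
    sequential = re.sub(r'\b' + re.escape(old) + r'\b', new, sequential)

print(single_pass)  # robert met bean
print(sequential)   # bean met bean
```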

Following georg's suggestion, I edited this answer to use dict.get(key, default_value). An alternative solution (also suggested by georg):
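One way to avoid looking up every word is to build the pattern from the dictionary keys. The following is a sketch under that assumption and not necessarily the version georg proposed; the names pattern and str_out are illustrative.

```python
import re

d = {'bean': 'robert', 'beans': 'cars'}
str_in = 'bean likes to sell his beans and beanstalks'

# Join the escaped keys into one alternation, e.g. \b(?:bean|beans)\b
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in d) + r')\b')

# Only dictionary words can match, so m.group(0) is always a key of d
str_out = pattern.sub(lambda m: d[m.group(0)], str_in)
print(str_out)  # robert likes to sell his cars and beanstalks
```

Because \b is required on both sides, the regex engine falls back from "bean" to "beans" when the shorter alternative does not end at a word boundary, so the order of the keys does not change the result here.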


I know it's been a long time, but doesn't this look more elegant?

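import re
from functools import reduce  # a builtin on Python 2, must be imported on Python 3
reduce(lambda x, y: re.sub(r'\b(' + y[0] + r')\b', y[1], x), [("bean", "robert"), ("beans", "cars")], "bean likes to sell his beans")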

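In the actual task I can't do this, because the order of such replacements is undefined.

The parentheses are not required; they just make the regex more readable (at least to me).

You can remove the if, look the match up directly in the dict, and use m.group(0) (the whole match) in the lambda if you use \bbeans?\b as the regex.

I want this to be generic enough that one regex can handle any input text plus any list of words to replace, so I don't want bean in the regex.

I see. It's just that it checks every single word, which I think is the main bottleneck.

I agree that one hard-coded regex per word should be faster. But that still leaves the problem of making sure that a word, once replaced, is not replaced again by another regex.

No, that is no longer a problem: \bbeans?\b matches both bean and beans, so in the lambda you get either d['bean'] or d['beans'], and the two are handled differently.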