Python: How do I replace multiple substrings of a string?

Tags: python, text, replace

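I would like to use the .replace function to replace multiple strings.

I currently have

string.replace("condition1", "")
but would like to have something like

string.replace("condition1", "").replace("condition2", "text")
although that does not feel like good syntax.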


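What is the proper way to do this? Kind of like how in grep/regex you can use \1 and \2 to replace fields with certain search strings.

You could just make a nice little looping function:

def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
where text is the complete string and dic is a dictionary in which each key is a term to search for and each value is the string that replaces it.

Note: items() is the Python 3 spelling; in Python 2 the equivalent method was iteritems().

Careful: Python dictionaries do not have a reliable iteration order. This solution only solves your problem if:

  • the order of replacements is irrelevant
  • it is acceptable for a replacement to change the results of previous replacements
Update: the statement about ordering does not apply to Python 3.6 and later, because the standard dict was changed to iterate in insertion order (an implementation detail of CPython 3.6, guaranteed by the language from 3.7). The second condition still applies on every version.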

For example:

d = {"cat": "dog", "dog": "pig"}
my_sentence = "This is my cat and this is my dog."
my_sentence = replace_all(my_sentence, d)  # strings are immutable, so keep the returned value
print(my_sentence)
Possible output #1:

"This is my pig and this is my pig."


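Careful #2: this is inefficient if your text string is very large or the dictionary contains many pairs.

Here is a short example that should do the trick with regular expressions: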

import re

rep = {"condition1": "", "condition2": "text"}  # define desired replacements here

# use these three lines to do the replacement
rep = dict((re.escape(k), v) for k, v in rep.items())
# Python 2 spelled this rep.iteritems(); Python 3 uses rep.items()
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)
For example:

>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'

Wrapped in a function such as the dictionary-based multiple_replace(string, rep_dict) given in a later answer, the same technique reads:

>>> multiple_replace("(condition1) and --condition2--",
...                  {"condition1": "", "condition2": "text"})
'() and --text--'

>>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'})
'goodbye, earth'

>>> multiple_replace("Do you like cafe? No, I prefer tea.",
...                  {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'

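You really should not do it this way, but I just find it way too cool:

>>> replacements = {'cond1': 'text1', 'cond2': 'text2'}
>>> cmd = 'answer = s'
>>> for k, v in replacements.items():
...     cmd += ".replace(%r, %r)" % (k, v)  # %r quotes the strings inside the generated code
...
>>> exec(cmd)
Now, answer is the result of all the replacements applied in turn.

Again, this is very hacky and is not something you should use regularly. But it is nice to know that you can do something like this if you ever need to.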

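Note: test this for your own case; see the comments. Here is a sample that is more efficient on long strings with many small replacements.

import re

source = "Here is foo, it does moo!"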

replacements = {
    'is': 'was', # replace 'is' with 'was'
    'does': 'did',
    '!': '?'
}

def replace(source, replacements):
    finder = re.compile("|".join(re.escape(k) for k in replacements.keys())) # matches every string we want replaced
    result = []
    pos = 0
    while True:
        match = finder.search(source, pos)
        if match:
            # cut off the part up until match
            result.append(source[pos : match.start()])
            # cut off the matched part and replace it in place
            result.append(replacements[source[match.start() : match.end()]])
            pos = match.end()
        else:
            # the rest after the last match
            result.append(source[pos:])
            break
    return "".join(result)

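print(replace(source, replacements))

The point is to avoid many concatenations of long strings. We chop the source string into fragments, replace some of the fragments as we build the list, and then join the whole thing back into a string.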
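
The same chop-and-join idea can also be written with finditer, which walks over the matches without a manual search loop. This is only a sketch; the function name replace_with_finditer is made up for illustration and is not part of the answer above.

```python
import re

def replace_with_finditer(source, replacements):
    # One alternation that matches any of the keys literally.
    finder = re.compile("|".join(re.escape(k) for k in replacements))
    pieces = []
    pos = 0
    for match in finder.finditer(source):
        pieces.append(source[pos:match.start()])     # untouched text before the match
        pieces.append(replacements[match.group(0)])  # replacement for the matched key
        pos = match.end()
    pieces.append(source[pos:])                      # the rest after the last match
    return "".join(pieces)                           # a single join at the end

print(replace_with_finditer("Here is foo, it does moo!",
                            {"is": "was", "does": "did", "!": "?"}))
```

Collecting the fragments in a list and joining once keeps the work roughly linear in the length of the source string.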

I would suggest using string templates. Just place the strings to be replaced in a dictionary and you are all set!
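
For context, string.Template from the standard library fills in $-prefixed placeholders rather than arbitrary substrings, so it fits when you control the template text. A minimal sketch, with made-up placeholder names and values:

```python
from string import Template

# The placeholders $name and $greeting are illustrative only.
template = Template("Hey $name, $greeting")
values = {"name": "Marc", "greeting": "Hello"}

# substitute() raises KeyError if a placeholder has no value;
# safe_substitute() would leave unknown placeholders in place instead.
result = template.substitute(values)
print(result)  # Hey Marc, Hello
```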


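Here is a variant of the first solution using reduce, in case you like being functional:

martineau's even better version:

from functools import reduce  # reduce is no longer a builtin in Python 3

repls = ('hello', 'goodbye'), ('world', 'earth')
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls, s)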

I built this upon F.J.'s excellent answer:

import re

def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
    return lambda string: pattern.sub(replacement_function, string)

def multiple_replace(string, *key_values):
    return multiple_replacer(*key_values)(string)
One-shot usage:

>>> replacements = ("café", "tea"), ("tea", "café"), ("like", "love")
>>> print(multiple_replace("Do you like café? No, I prefer tea.", *replacements))
Do you love tea? No, I prefer café.
Note that since the replacement is done in a single pass, "café" changes to "tea", but it does not change back to "café".
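
To make the single-pass point concrete, the sketch below runs the same swap twice: once with chained str.replace calls and once with one regular-expression pass. The variable names are illustrative and the accent is dropped from "cafe" to keep the sample plain ASCII.

```python
import re

text = "Do you like cafe? No, I prefer tea."
pairs = {"cafe": "tea", "tea": "cafe"}

# Chained: the second replace also rewrites the output of the first.
chained = text
for old, new in pairs.items():
    chained = chained.replace(old, new)

# Single pass: each position in the original text is replaced at most once.
pattern = re.compile("|".join(re.escape(k) for k in pairs))
single_pass = pattern.sub(lambda m: pairs[m.group(0)], text)

print(chained)      # Do you like cafe? No, I prefer cafe.
print(single_pass)  # Do you like tea? No, I prefer cafe.
```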

If you need to do the same replacement many times, you can easily create a replacement function:

>>> my_escaper = multiple_replacer(('"', '\\"'), ('\t', '\\t'))
>>> many_many_strings = ('This text will be escaped by "my_escaper"',
...                      'Does this work?\tYes it does',
...                      'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
...     print(my_escaper(line))
...
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"
Improvements:

  • turned the code into a function
  • added multiline support
  • fixed a bug in escaping
  • easy to create a function for a specific multiple replacement

Enjoy! :-)

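This is just a more concise recap of F.J's and MiniQuark's great answers, plus the last but decisive improvement by bgusach. All you need to achieve multiple simultaneous string replacements is the following function: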

def multiple_replace(string, rep_dict):
    pattern = re.compile("|".join([re.escape(k) for k in sorted(rep_dict,key=len,reverse=True)]), flags=re.DOTALL)
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)
Usage:

>>> multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe':'tea', 'tea':'cafe', 'like':'prefer'})
'Do you prefer tea? No, I prefer cafe.'

If you wish, you can make your own dedicated replacement functions starting from this simpler one.

Or just for a quick hack:

for line in to_read:
    read_buffer = line              
    stripped_buffer1 = read_buffer.replace("term1", " ")
    stripped_buffer2 = stripped_buffer1.replace("term2", " ")
    write_to_file = to_write.write(stripped_buffer2)

Here is another way of doing it with a dictionary:

listA = "The cat jumped over the house".split()
modify = {word: word for word in listA}  # every word initially maps to itself
modify["cat"], modify["jumped"] = "dog", "walked"
print(" ".join(modify[x] for x in listA))

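I needed a solution where the strings to be replaced can be regular expressions, for example to help normalize a long text by replacing runs of whitespace characters with a single one. Building on a chain of answers from others, including MiniQuark and mmj, this is what I came up with:

import re

def multiple_replace(string, reps, re_flags=0):
    """ Transforms string, replacing keys from reps with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        reps = list(reps.items())  # a list, so pairs can be looked up by index below
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)),
                         re_flags)
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)
The most important thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize whitespace: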

>>> s = "I don't want to change this name:\n  Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"
If you want to use the dictionary keys as plain strings, you can escape them before calling multiple_replace, for example with this function:

def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys """
    return dict((re.escape(k), v) for k, v in d.items())

>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n  Philip II of Spain"
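The following function can help find erroneous regular expressions among your dictionary keys (since the error message from multiple_replace is not very telling):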
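
The function itself is not reproduced here. A minimal sketch of such a checker, assuming it only needs to try compiling each key, might look like the following; the name find_bad_patterns and its return format are hypothetical, not the original author's.

```python
import re

def find_bad_patterns(patterns):
    """Return (index, pattern, error message) for every pattern that fails to compile."""
    problems = []
    for i, p in enumerate(patterns):
        try:
            re.compile(p)
        except (TypeError, re.error) as exc:
            # TypeError covers non-string keys, re.error covers invalid syntax.
            problems.append((i, p, str(exc)))
    return problems

for index, pattern, message in find_bad_patterns([r"\bI\b", "(", "[a-z"]):
    print("Invalid regular expression at position %d: %r (%s)" % (index, pattern, message))
```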

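Note that it does not chain the replacements; it performs them simultaneously. This makes it more efficient without constraining what it can do. To mimic the effect of chaining, you may just need to add more string-replacement pairs and make sure the pairs are in the expected order: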

>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
...                             ("but", "mut"), ("mutton", "lamb")])
'lamb'

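Starting from Andrew's precious answer, I developed a script that loads the dictionary from a file and processes all the files in the current folder to do the replacements. The script loads the mappings from an external file in which you can set the separator. I am a beginner, but I found this script very useful for doing multiple substitutions in multiple files. It loaded a dictionary with more than 1000 entries in seconds. It is not elegant, but it worked for me.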

import glob
import re

mapfile = input("Enter map file name with extension eg. codifica.txt: ")
sep = input("Enter map file column separator eg. |: ")
mask = input("Enter search mask with extension eg. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension eg. _NEW.txt for newly generated files: ")

rep = {}  # create an empty dictionary

with open(mapfile) as temprep:  # load the definitions into the dictionary, using the prompted separator
    for line in temprep:
        (key, val) = line.strip('\n').split(sep)
        rep[key] = val

for filename in glob.iglob(mask):  # iterate over all the files matching the prompted mask

    with open(filename, "r") as textfile:  # load each file into the variable text
        text = textfile.read()

    # start replacement
    # rep = dict((re.escape(k), v) for k, v in rep.items())  # left commented out so that re special characters can be used in the mapping
    pattern = re.compile("|".join(rep.keys()))
    text = pattern.sub(lambda m: rep[m.group(0)], text)

    # write the output file, named with the prompted suffix
    with open(filename[:-4] + suff, "w") as target:
        target.write(text)

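This is my solution to the problem. I used it in a chatbot to replace different words at once.

def mass_replace(text, dct):
    new_string = ""
    old_string = text
    while len(old_string) > 0:
        s = ""
        sk = ""
        for k in dct.keys():
            if old_string.startswith(k):
                s = dct[k]
                sk = k
        if sk:  # test the matched key, so that empty replacement strings also work
            new_string += s
            old_string = old_string[len(sk):]
        else:
            new_string += old_string[0]
            old_string = old_string[1:]
    return new_string

print(mass_replace("The dog hunts the cat", {"dog": "cat", "cat": "dog"}))
This will become "The cat hunts the dog".

Another example: input list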

error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']
The desired output is

words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']
Code:

[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e,"") for e in error_list if e in w],w] for w in words]] 

In my case, I needed a simple replacement of unique keys with names, so I came up with this:

a = 'This is a test string.'
b = {'i': 'I', 's': 'S'}
for x,y in b.items():
    a = a.replace(x, y)
>>> a
'ThIS IS a teSt StrIng.'

Here is my $0.02. It is based on Andrew Clark's answer, just a little bit clearer:
def multireplace(string, replacements):
    """
    Given a string and a replacement map, it returns the replaced string.

    :param str string: string to execute replacements on
    :param dict replacements: replacement dictionary {value to find: value to replace}
    :rtype: str

    """
    # Place longer ones first to keep shorter substrings from matching
    # where the longer ones should take place
    # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against 
    # the string 'hey abc', it should produce 'hey ABC' and not 'hey ABc'
    substrs = sorted(replacements, key=len, reverse=True)

    # Create a big OR regex that matches any of the substrings to replace
    regexp = re.compile('|'.join(map(re.escape, substrs)))

    # For each match, look up the new string in the replacements
    return regexp.sub(lambda match: replacements[match.group(0)], string)
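
The effect of sorting longer keys first can be checked by building the same pattern with and without the sort. A small sketch, in which the helper build_and_replace and its sort_keys flag are hypothetical names used only for this comparison:

```python
import re

def build_and_replace(text, replacements, sort_keys):
    keys = list(replacements)
    if sort_keys:
        keys.sort(key=len, reverse=True)  # longer keys come first in the alternation
    pattern = re.compile("|".join(map(re.escape, keys)))
    return pattern.sub(lambda m: replacements[m.group(0)], text)

replacements = {"ab": "AB", "abc": "ABC"}
print(build_and_replace("hey abc", replacements, sort_keys=False))  # hey ABc
print(build_and_replace("hey abc", replacements, sort_keys=True))   # hey ABC
```

Regular-expression alternation tries its branches from left to right, which is why the order of the keys decides which one wins.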

from functools import reduce  # reduce is no longer a builtin in Python 3

reduce(lambda a, b: a.replace(*b)
    , [('o','W'), ('t','X')] #iterable of pairs: (oldval, newval)
    , 'tomato' #The string from which to replace values
    )

s = "The quick brown fox jumps over the lazy dog"
for r in (("brown", "red"), ("lazy", "quick")):
    s = s.replace(*r)

#output will be:  The quick red fox jumps over the quick dog

text = "The quick brown fox jumps over the lazy dog"
replacements = [("brown", "red"), ("lazy", "quick")]
[text := text.replace(a, b) for a, b in replacements]  # assignment expression, needs Python 3.8+
# text == 'The quick red fox jumps over the quick dog'

import pandas as pd

df = pd.DataFrame({'text': ['Billy is going to visit Rome in November', 'I was born in 10/10/2010', 'I will be there at 20:00']})

# raw strings keep the backslashes in the regular expressions intact
to_replace = ['Billy', 'Rome', 'January|February|March|April|May|June|July|August|September|October|November|December', r'\d{2}:\d{2}', r'\d{2}/\d{2}/\d{4}']
replace_with = ['name', 'city', 'month', 'time', 'date']

print(df.text.replace(to_replace, replace_with, regex=True))
Output:

0    name is going to visit city in month
1                      I was born in date
2                 I will be there at time

from flashtext import KeywordProcessor

# my_dict maps each keyword to its replacement; string is the text to process
processor = KeywordProcessor(case_sensitive=False)
for k, v in my_dict.items():
    processor.add_keyword(k, v)
new_string = processor.replace_keywords(string)

my_string = 'This is a test string.'
dict_mapping = {'i': 's', 's': 'S'}
result_good = my_string.translate(str.maketrans(dict_mapping))
result_bad = my_string
for x, y in dict_mapping.items():
    result_bad = result_bad.replace(x, y)
print(result_good)  # ThsS sS a teSt Strsng.
print(result_bad)   # ThSS SS a teSt StrSng.
remove_words = {"we", "this"}
target_sent = "we should modify this string"
target_sent_words = target_sent.split()
filtered_sent = " ".join(list(filter(lambda word: word not in remove_words, target_sent_words)))