Efficiently replacing strings among 20k potential matches (Python)

python performance

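I want to replace substrings of a string, and there are 20k+ candidates to check for.

Is there a more efficient way than splitting the 20k candidates into subgroups of 900 and looping over them?

Here is my function: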

import re

def replaceNames(mailString, nameList, replacement=" Nachname"):
    # Process the names in chunks of 900 alternatives per pattern.
    for start in range(0, len(nameList), 900):
        chunk = nameList[start:start + 900]
        # Escape each name so regex metacharacters in it match literally
        # (assumes the names are plain strings, not regex patterns).
        tempNamesString = "|".join(re.escape(name) for name in chunk)
        mailString = re.sub(tempNamesString, replacement, mailString)
    return mailString
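
A note on building one pattern by joining names with "|": if the names are meant as literal strings, regex metacharacters inside them (such as ".") and the order of the alternatives can change what gets matched. The sketch below illustrates this with hypothetical names; build_pattern is an illustrative helper, not part of the original function.

```python
import re

def build_pattern(names):
    """Compile one alternation that matches the given names literally."""
    # Longest names first: the regex engine takes the first alternative
    # that matches, so "Ann" listed before "Anna" would leave a stray "a".
    ordered = sorted(names, key=len, reverse=True)
    return re.compile("|".join(re.escape(name) for name in ordered))

# Hypothetical sample data, for illustration only.
text = "AB Smith met A. Smith"

# Unescaped: "." matches any character, so "AB Smith" is replaced as well.
print(re.sub("A. Smith", "<name>", text))           # <name> met <name>

# Escaped: only the literal "A. Smith" is replaced.
print(build_pattern(["A. Smith"]).sub("<name>", text))  # AB Smith met <name>
```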

Thanks.

My suggestions:

Use string operations (such as str.replace) instead of re (regex) wherever possible, because they are faster:

# Sample list: 1 million copies of "my_rand_str"
In [8]: import re

In [9]: x = ["my_rand_str"] * 1000000

In [10]: %%time
    ...: replaced = [a.replace("str", "replaced") for a in x]
    ...:
Wall time: 219 ms

In [11]: %%time
    ...: replaced = [re.sub("str", "replaced", a) for a in x]
    ...:
Wall time: 1.33 s
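
Applied to the setting in the question, the string-only approach might look like the following minimal sketch. The function name replace_names_plain and the sample data are hypothetical. Whether this beats a chunked regex for 20k names depends on the data, so it is worth timing both on real input.

```python
def replace_names_plain(mail, names, replacement=" Nachname"):
    """Replace every literal occurrence of each name using str methods only."""
    # Longest names first, so "Anna" is handled before its prefix "Ann".
    for name in sorted(names, key=len, reverse=True):
        # The membership test skips the replace call (and the new string
        # it would build) for the many names that do not occur at all.
        if name in mail:
            mail = mail.replace(name, replacement)
    return mail

# Hypothetical sample data, for illustration only.
mail = "Hallo Herr Mueller und Frau Schmidt"
print(replace_names_plain(mail, ["Schmidt", "Mueller"], replacement="<name>"))
# Hallo Herr <name> und Frau <name>
```

Because the names are applied one after another, a later name can also match text inserted by an earlier replacement, so the replacement string should not contain any of the names.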

If you insist on using regex, use a precompiled regular expression:

In [25]: tobe_replaced = re.compile("str")

In [28]: %%time
    ...: replaced = [tobe_replaced.sub("replaced", a) for a in x]
    ...:
    ...:
    ...:
Wall time: 1.02 s
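
For the chunked approach from the question, precompiling could mean building the chunk patterns once and reusing them for every mail. The sketch below assumes the names are literal strings. The helper names compile_name_patterns and replace_names are hypothetical, and the chunk size of 900 is taken from the question.

```python
import re

def compile_name_patterns(names, chunk_size=900):
    """Compile one alternation pattern per chunk of names, once."""
    patterns = []
    for start in range(0, len(names), chunk_size):
        chunk = names[start:start + chunk_size]
        # re.escape makes each name match literally.
        patterns.append(re.compile("|".join(re.escape(n) for n in chunk)))
    return patterns

def replace_names(mail, patterns, replacement=" Nachname"):
    """Apply each precompiled pattern to the mail in turn."""
    for pattern in patterns:
        mail = pattern.sub(replacement, mail)
    return mail

# Hypothetical sample data; a tiny chunk size just to show several chunks.
patterns = compile_name_patterns(["Mueller", "Schmidt", "Meier"], chunk_size=2)
for mail in ["Hallo Mueller", "Meier an Schmidt"]:
    print(replace_names(mail, patterns, replacement="<name>"))
# Hallo <name>
# <name> an <name>
```

Holding the compiled objects explicitly avoids rebuilding the joined pattern string for every mail and does not rely on the internal pattern cache of the re module.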

If possible, run the replacement once on a single large string instead of looping, because the loop is more expensive:

In [29]: %%time
    ...: replaced = tobe_replaced.sub("replaced", "\n".join(x)).split("\n")
    ...:
    ...:
Wall time: 291 ms

In [30]: %%time
    ...: replaced = "\n".join(x).replace("str", "replaced").split("\n")
    ...:
    ...:
    ...:
Wall time: 132 ms
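
The join-and-split trick only works if the separator never occurs in the inputs, and mail text usually does contain "\n". A hedged sketch of a batch helper that checks this is shown below. The function name replace_in_batch and the default separator "\x00" are assumptions for illustration, and the pattern is assumed to match literal names that do not contain the separator.

```python
import re

def replace_in_batch(mails, pattern, replacement, sep="\x00"):
    """Run one substitution over all mails joined into a single string."""
    if not mails:
        return []
    # The separator must be unambiguous, otherwise split() would return
    # a different number of pieces than went in.
    if sep in replacement or any(sep in mail for mail in mails):
        raise ValueError("separator occurs in the input or the replacement")
    return pattern.sub(replacement, sep.join(mails)).split(sep)

# Hypothetical sample data, for illustration only.
pattern = re.compile("Mueller|Schmidt")
mails = ["Hi Mueller\nBye", "Schmidt here"]
print(replace_in_batch(mails, pattern, "<name>"))
# ['Hi <name>\nBye', '<name> here']
```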

Hope this helps.

Can you provide sample input and output for this function? I'm having trouble understanding what you are trying to do.