Efficiently replacing strings among 20k potential matches (Python)


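I want to replace substrings of a string, and I need to check 20k+ candidates.

Is there a more efficient way than splitting the 20k candidates into subgroups of 900 and looping over them?

Here is my function: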

import re
import numpy as np

def replaceNames(mailString, nameList, replacement=" Nachname"):
    anzNames = len(nameList)
    # Chunk boundaries: 0, 900, 1800, ..., plus the end of the list
    seq = np.arange(start=0, stop=anzNames, step=900).tolist()
    seq.append(anzNames)
    for i in range(0, len(seq) - 1):
        # Join each chunk of up to 900 names into one alternation pattern
        tempNamesString = "|".join(nameList[seq[i]:seq[i + 1]])
        mailString = re.sub(tempNamesString, replacement, mailString)
    return mailString

Thanks.

My suggestions:

  • Where possible, use plain string operations (such as str.replace) instead of re (regex), because they are faster

    # Sample list of 1 million copies of "my_rand_str"
    # (assumes `import re` was run earlier in the session)
    In [9]: x = ["my_rand_str"] * 1000000

    In [10]: %%time
        ...: replaced = [a.replace("str", "replaced") for a in x]
        ...:
    Wall time: 219 ms

    In [11]: %%time
        ...: replaced = [re.sub("str", "replaced", a) for a in x]
        ...:
    Wall time: 1.33 s
    
  • If you do stick with regex, use a precompiled regular expression

    In [25]: tobe_replaced = re.compile("str")
    
    In [28]: %%time
        ...: replaced = [tobe_replaced.sub("replaced", a) for a in x]
        ...:
        ...:
        ...:
    Wall time: 1.02 s
    
  • If possible, run the replacement once on one large string instead of looping, because the loop is more expensive

    In [29]: %%time
        ...: replaced = tobe_replaced.sub("replaced", "\n".join(x)).split("\n")
        ...:
        ...:
    Wall time: 291 ms
    
    In [30]: %%time
        ...: replaced = "\n".join(x).replace("str", "replaced").split("\n")
        ...:
        ...:
        ...:
    Wall time: 132 ms
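
Applied to the function in the question, the three points above suggest compiling one pattern from all the names and running it once over the whole mail text, rather than looping over chunks of 900. The following is only a minimal sketch: replace_names and the sample names are hypothetical, and it assumes the names should match literally and only as whole words, which the question does not state. Whether it beats the chunked loop on 20k names depends on the data, so it is worth timing both.

```python
import re

def replace_names(mail_string, name_list, replacement=" Nachname"):
    """Replace every name from name_list in mail_string in one pass."""
    # Drop duplicates and empty strings; an empty alternative would match everywhere.
    names = {n for n in name_list if n}
    if not names:
        return mail_string
    # Escape each name so characters like "." or "-" are matched literally,
    # and put longer names first so "Meier-Schmidt" wins over "Meier".
    escaped = sorted((re.escape(n) for n in names), key=len, reverse=True)
    # Assumption: names start and end with word characters, so \b is a safe boundary.
    pattern = re.compile(r"\b(?:" + "|".join(escaped) + r")\b")
    # A function as replacement keeps backslashes in `replacement` literal.
    return pattern.sub(lambda match: replacement, mail_string)
```

Unlike the original function, this escapes the names with re.escape, so a name containing regex metacharacters cannot change the meaning of the pattern.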
    

Hope this helps.

Could you provide example input and output for this function? I'm having trouble understanding what you're trying to do.
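
Without an example from the asker, one plausible reading is that every name is a single word in the mail text. Under that assumption (not confirmed in the question), each word can be looked up in a set, which avoids joining the names into a regex alternation at all. The function name replace_names_by_lookup, the sample names, and the mail text below are invented for illustration:

```python
import re

# One precompiled pattern that matches each run of word characters.
WORD = re.compile(r"\w+")

def replace_names_by_lookup(mail_string, name_list, replacement=" Nachname"):
    """Replace each word of mail_string that appears in name_list."""
    names = set(name_list)  # set membership is O(1) on average

    def swap(match):
        word = match.group(0)
        return replacement if word in names else word

    return WORD.sub(swap, mail_string)

# Invented sample input, not taken from the question.
mail = "Sehr geehrter Herr Schmidt, Frau Weber ist heute nicht im Haus."
print(replace_names_by_lookup(mail, ["Schmidt", "Weber"], replacement="Nachname"))
# prints: Sehr geehrter Herr Nachname, Frau Nachname ist heute nicht im Haus.
```

This sketch does not handle names that contain spaces or hyphens, since those span more than one \w+ match.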