Is a Python list comprehension really that much slower than str.replace?


python performance


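I'm testing different versions of string sanitization and ran into the following effect. I can't tell whether it is really an artifact of the caching that IPython's %timeit warns about, or whether it is real. Please advise:

str.replace: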

def sanit2(s):
    # Strip each unwanted character with one C-level str.replace call.
    for c in ["'", '%', '"']:
        s = s.replace(c, '')
    return s


In [44]: %timeit sanit2(r"""   '   '    % a % '   """)
The slowest run took 12.43 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 985 ns per loop

List comprehension:

def sanit3(s):
    # Keep only the characters that are not in the unwanted list.
    removed = [x for x in s if x not in ["'", '%', '"']]
    return ''.join(removed)


In [42]: %timeit sanit3(r"""   '   '    % a % '   """)
The slowest run took 8.95 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 2.12 µs per loop

This also seems to hold for relatively long strings:

In [46]: reallylong = r"""   '   '    % a % '   """ * 1000

In [47]: len(reallylong)
Out[47]: 22000


In [48]: %timeit sanit2(reallylong)
The slowest run took 4.94 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 96.9 µs per loop


In [49]: %timeit sanit3(reallylong)
1000 loops, best of 3: 1.9 ms per loop
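To rule out the %timeit caching warning as the cause, the comparison can be repeated outside IPython with the standard timeit module. The following is only a sketch: best_time is a hypothetical helper, and the loop counts are arbitrary choices, not values taken from the session above.

```python
import timeit

def sanit2(s):
    for c in ["'", '%', '"']:
        s = s.replace(c, '')
    return s

def sanit3(s):
    removed = [x for x in s if x not in ["'", '%', '"']]
    return ''.join(removed)

short = r"""   '   '    % a % '   """
reallylong = short * 1000

# Both versions must agree before their timings are worth comparing.
assert sanit2(reallylong) == sanit3(reallylong)

def best_time(func, arg, number=20, repeat=3):
    """Hypothetical helper: best per-call time in seconds over `repeat` runs."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number)) / number

for name, func in (("sanit2", sanit2), ("sanit3", sanit3)):
    for label, text in (("short", short), ("long", reallylong)):
        usec = best_time(func, text) * 1e6
        print(f"{name} on {label} input: {usec:.2f} us per call")
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes; the absolute numbers will differ from machine to machine.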

UPDATE: I assumed that str.replace also has more or less O(n) complexity, so I expected both sanit2 and sanit3 to have roughly O(n^2) complexity.

I tested the cost of str.replace as a function of string length:

In [59]: orig_str = r"""   '   '    % a % '   """


In [60]: for i in range(1,11):
   ....:     longer = orig_str * i * 1000
   ....:     %timeit longer.replace('%', '')
   ....:
10000 loops, best of 3: 44.2 µs per loop
10000 loops, best of 3: 87.8 µs per loop
10000 loops, best of 3: 131 µs per loop
10000 loops, best of 3: 177 µs per loop
1000 loops, best of 3: 219 µs per loop
1000 loops, best of 3: 259 µs per loop
1000 loops, best of 3: 311 µs per loop
1000 loops, best of 3: 349 µs per loop
1000 loops, best of 3: 398 µs per loop
1000 loops, best of 3: 435 µs per loop


In [61]: t="""10000 loops, best of 3: 44.2 s per loop
   ....: 10000 loops, best of 3: 87.8 s per loop
   ....: 10000 loops, best of 3: 131 s per loop
   ....: 10000 loops, best of 3: 177 s per loop
   ....: 1000 loops, best of 3: 219 s per loop
   ....: 1000 loops, best of 3: 259 s per loop
   ....: 1000 loops, best of 3: 311 s per loop
   ....: 1000 loops, best of 3: 349 s per loop
   ....: 1000 loops, best of 3: 398 s per loop
   ....: 1000 loops, best of 3: 435 s per loop"""

It looks linear, but I did the calculation to make sure:

In [63]: averages=[]   


In [66]: for idx, line in enumerate(t.split('\n')):
   ....:     repl_time = line.rsplit(':',1)[1].split(' ')[1]
   ....:     averages.append(float(repl_time)/(idx+1))
   ....:

In [67]: averages
Out[67]:
[44.2,
 43.9,
 43.666666666666664,
 44.25,
 43.8,
 43.166666666666664,
 44.42857142857143,
 43.625,
 44.22222222222222,
 43.5]
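The same linearity check can be done without copying %timeit output into a string by hand. This is a rough sketch using the timeit module; replace_cost is a hypothetical helper and the loop counts are arbitrary.

```python
import timeit

orig_str = r"""   '   '    % a % '   """

def replace_cost(multiplier, number=50, repeat=3):
    """Hypothetical helper: best per-call time (seconds) of one str.replace
    on orig_str repeated multiplier * 1000 times."""
    longer = orig_str * multiplier * 1000
    timer = timeit.Timer(lambda: longer.replace('%', ''))
    return min(timer.repeat(repeat=repeat, number=number)) / number

for i in range(1, 11):
    usec = replace_cost(i) * 1e6
    # If str.replace is linear, usec / i stays roughly constant.
    print(f"x{i:>2}: {usec:8.1f} us total, {usec / i:8.1f} us per unit of length")
```

A roughly constant last column is what a linear cost looks like; it says nothing yet about how the constant factor compares with the list comprehension.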

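Yes, str.replace is almost exactly O(n). So, on top of iterating over the list of characters to replace, sanit2 should have O(n^2) complexity, just like sanit3 (x for x in s => iterating over the characters of the string to clean, O(n); ... x in ["'", '%', '"'] should also cost O(n) for list.__contains__, giving O(n^2)).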

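So, in reply to chepner's answer: yes, sanit2 makes a fixed number of function calls (and few of them, only 3 in this case), but because of the internal cost of str.replace, it seems that the order of complexity of sanit2 should be similar to that of sanit3.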

Is the difference entirely due to str.replace being implemented in C, or do the function calls (list.__contains__) also play a significant role?

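sanit2 makes a fixed number of calls to a string method implemented in C, independent of the length of s.

sanit3 makes a variable number of calls (one for each element of s) to list.__contains__, which itself uses an O(n) algorithm (linear in the length of the list) rather than an O(1) one. It also has to construct a list object and then call ''.join on that list.

It is not surprising that sanit2 is faster.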
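The difference in the number of calls can be made visible with a small experiment. The sketch below is illustrative only: CountingList, sanit2_counted and sanit3_counted are hypothetical names, and the counting itself adds overhead, so this is for counting calls and not for timing.

```python
class CountingList(list):
    """Hypothetical list subclass that counts membership tests."""

    def __init__(self, *args):
        super().__init__(*args)
        self.contains_calls = 0

    def __contains__(self, item):
        self.contains_calls += 1
        return super().__contains__(item)

def sanit2_counted(s):
    """Same logic as sanit2, also returning the number of str.replace calls."""
    calls = 0
    for c in ["'", '%', '"']:
        s = s.replace(c, '')
        calls += 1
    return s, calls

def sanit3_counted(s):
    """Same logic as sanit3, also returning the number of membership tests."""
    unwanted = CountingList(["'", '%', '"'])
    cleaned = ''.join([x for x in s if x not in unwanted])
    return cleaned, unwanted.contains_calls

text = r"""   '   '    % a % '   """ * 1000
_, replace_calls = sanit2_counted(text)
_, contains_calls = sanit3_counted(text)
print(replace_calls)                 # always 3, whatever the input length
print(contains_calls == len(text))   # one membership test per character
```

Each of the three str.replace calls scans the whole string inside C, while every membership test in the comprehension goes through the interpreter loop once per character.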
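If you want a fast solution, use reallylong.translate(None, "'%\""). Looking up a string character by character in Python is going to be slow, whereas replace happens at the C level. Another good example is collections.Counter(reallylong) vs {c: reallylong.count(c) for c in set(reallylong)}. The second will easily outperform the first because it happens at the C level.

@PadraicCunningham: Great! I think that is what I will use as the actual solution. However, my goal in this question is to understand the reasons behind the performance difference.

Both calls are O(n), because the set of characters you are replacing is fixed and independent of the input length. sanit2 makes a fixed number of calls to str.replace; sanit3 makes a linear number of calls to x in ["'", "%", '"'], each of which takes O(1) time. The difference between the two is a combination of the number of methods you call and how much of the work is done in C versus Python.

@4u, it basically comes down to how well optimised the C-level string operations are. I think how much faster it is is what surprises the OP so much. str.replace has to check every character in s times the characters to check; even using a set for the lookups, creating the set once, and increasing the number of characters to check, the replace approach still hammers the list one.

@PadraicCunningham: "I think how much faster it is is what surprises the OP so much" - that is exactly what I am trying to figure out (I can see for myself that it is faster, but why so much faster?)

@chepner: please see the update.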
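The two-argument translate(None, chars) form mentioned in the comments is the Python 2 str API. Assuming Python 3, a roughly equivalent sketch builds a deletion table with str.maketrans; the set-based comprehension from the comments is included for comparison. The names UNWANTED, DELETE_TABLE, sanit_translate and sanit_set are made up for this example.

```python
# Characters to strip, as in the question.
UNWANTED = "'%\""

# With three arguments, str.maketrans maps every character of the third
# argument to None, which tells str.translate to delete it.
DELETE_TABLE = str.maketrans('', '', UNWANTED)

def sanit_translate(s):
    """Remove the unwanted characters in a single C-level pass."""
    return s.translate(DELETE_TABLE)

# Set created once, so each membership test is a hash lookup.
UNWANTED_SET = frozenset(UNWANTED)

def sanit_set(s):
    """List comprehension variant using a prebuilt set for lookups."""
    return ''.join([x for x in s if x not in UNWANTED_SET])

sample = r"""   '   '    % a % '   """
print(repr(sanit_translate(sample)))
print(sanit_translate(sample) == sanit_set(sample))
```

The set variant still runs one interpreted membership test per character, so it would be expected to stay slower than the C-level approaches, in line with the comments above.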