Python: why is mp.Pool().map() slower than ProcessPoolExecutor().map()?
I have a silly piece of code that illustrates a behavior I ran into at work:
```python
from multiprocessing import Pool
from concurrent.futures import ProcessPoolExecutor
from struct import pack
from time import time


def packer(integer):
    return pack('i', integer)


if __name__ == '__main__':
    pool1 = Pool()
    pool2 = ProcessPoolExecutor()
    nums = list(range(10**4))

    start = time()
    res1 = pool1.map(packer, nums)
    print(f'total mp pool: {time() - start}')

    start = time()
    # Executor.map returns a lazy iterator; list() waits for every result,
    # otherwise only the submission of the tasks would be timed.
    res2 = list(pool2.map(packer, nums))
    print(f'total futures pool: {time() - start}')

    pool1.close()
    pool1.join()
    pool2.shutdown()
```
I get (Python 3.8.1):
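At work, I changed some code from mp.Pool() to concurrent.futures so that it could switch between processes and threads.
Then I found that exception propagation in concurrent.futures is awful. When I went back to mp.Pool(), I saw a drop in performance.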
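The post does not say exactly what went wrong with exception propagation. The sketch below illustrates one difference between the two APIs. Pool.map waits for the whole input and then re-raises a worker exception from the map() call itself, so no partial results come back. Executor.map returns an iterator and re-raises the exception only when the failing item's result is reached, after the earlier results have already been yielded. The worker risky and both helpers are hypothetical. multiprocessing.dummy.Pool and ThreadPoolExecutor expose the same APIs as their process-based counterparts, so the same helpers also run on threads.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool


def risky(x):
    # Hypothetical worker that fails on one specific input.
    if x == 3:
        raise ValueError(f'bad input: {x}')
    return x * 2


def error_from_pool(pool_factory, nums):
    # Pool.map waits for all chunks, then re-raises a worker exception
    # from the map() call itself; no partial results are returned.
    with pool_factory(2) as pool:
        try:
            pool.map(risky, nums)
        except ValueError as exc:
            return exc
    return None


def partial_results_from_executor(executor_cls, nums):
    # Executor.map returns an iterator; the exception is re-raised only
    # when the failing item's result is reached, after earlier results.
    results = []
    with executor_cls(2) as executor:
        try:
            for value in executor.map(risky, nums):
                results.append(value)
        except ValueError as exc:
            return results, exc
    return results, None


if __name__ == '__main__':
    print('processes:', repr(error_from_pool(Pool, range(6))))
    print('threads:  ', repr(error_from_pool(ThreadPool, range(6))))
    print('processes:', partial_results_from_executor(ProcessPoolExecutor, range(6)))
    print('threads:  ', partial_results_from_executor(ThreadPoolExecutor, range(6)))
```

With either executor, the iterator yields the results for inputs 0, 1 and 2 before the ValueError for input 3 surfaces.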
I know that concurrent.futures.ProcessPoolExecutor is supposed to be a higher-level API. How could it be faster than mp.Pool()?
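The original post does not offer this explanation, but one likely source of the difference is the default chunksize. When chunksize is omitted, Pool.map groups the input into roughly four chunks per worker. ProcessPoolExecutor.map defaults to chunksize=1, so 10**4 items become 10**4 separate tasks. The sketch below times both pools with the same chunksize. It assumes a hypothetical helper, pool_default_chunksize, written here to mirror CPython's Pool heuristic, and uses cpu_count() workers. Worker startup is still not equalized: Pool starts its workers in its constructor, while ProcessPoolExecutor starts them when work is first submitted, so the executor timing may include some startup cost.

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Pool, cpu_count
from struct import pack
from time import perf_counter


def packer(integer):
    return pack('i', integer)


def pool_default_chunksize(n_items, n_workers):
    # Hypothetical helper mirroring the heuristic Pool.map applies when
    # chunksize is None: about four chunks per worker, rounded up.
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def benchmark(nums, n_workers):
    # Use the same chunksize for both pools so they do comparable work.
    chunksize = pool_default_chunksize(len(nums), n_workers)

    with Pool(n_workers) as pool:
        start = perf_counter()
        res1 = pool.map(packer, nums, chunksize=chunksize)
        t_pool = perf_counter() - start

    with ProcessPoolExecutor(n_workers) as executor:
        start = perf_counter()
        # list() drains the iterator, so the timing includes every result.
        res2 = list(executor.map(packer, nums, chunksize=chunksize))
        t_executor = perf_counter() - start

    assert res1 == res2
    return t_pool, t_executor


if __name__ == '__main__':
    n_workers = cpu_count()
    t_pool, t_executor = benchmark(list(range(10**4)), n_workers)
    print(f'mp pool:      {t_pool:.4f} s')
    print(f'futures pool: {t_executor:.4f} s')
```

Passing chunksize=1 to both calls instead should make the per-task overhead of each pool visible. Actual timings depend on the machine and Python version.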
I see that ProcessPoolExecutor.map is just:
```python
super().map(partial(_process_chunk, fn),
            _get_chunks(*iterables, chunksize=chunksize),
            timeout=timeout)
```
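In other words, ProcessPoolExecutor.map splits the input into chunks, submits one task per chunk and flattens the per-chunk result lists. Below is a simplified, serial sketch of that idea. get_chunks, process_chunk and chunked_map are hypothetical stand-ins for the private helpers, not the actual CPython source, and nothing here runs in a worker process. The point is that with the default chunksize=1, every input item becomes its own task.

```python
from itertools import islice


def get_chunks(*iterables, chunksize):
    # Hypothetical stand-in for _get_chunks: zip the inputs into argument
    # tuples and cut them into groups of at most chunksize.
    it = zip(*iterables)
    while True:
        chunk = tuple(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


def process_chunk(fn, chunk):
    # Hypothetical stand-in for _process_chunk: in the real executor this
    # runs in a worker and applies fn to every argument tuple of one chunk.
    return [fn(*args) for args in chunk]


def chunked_map(fn, *iterables, chunksize=1):
    # Each chunk would be one submitted task; here they simply run in order,
    # and the per-chunk lists are flattened back into one result stream.
    for chunk in get_chunks(*iterables, chunksize=chunksize):
        yield from process_chunk(fn, chunk)
```

For example, list(chunked_map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30], chunksize=2)) produces two chunks and yields 11, 22 and 33.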
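where super() is Executor.map from concurrent.futures (its source is quoted at the end of this post).
This is where I get lost.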
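Do mp.Pool and ProcessPoolExecutor go down different rabbit holes? Is it possible to get the "good stuff" of ProcessPoolExecutor by calling mp/Pool/map manually with the right arguments?

Comment: The timings you show do not support your claim; Pool.map() takes less time. By the way, you can use multiprocessing.dummy.Pool to use threads instead of processes.

Reply from the asker: You're right, I need to restate more precisely what I experienced at work.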
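The comment's multiprocessing.dummy.Pool suggestion works because that class exposes the same API as multiprocessing.Pool but is backed by threads. A helper that takes the pool factory as a parameter can therefore switch between processes and threads without touching the call sites. In the sketch below, run is a hypothetical helper and packer is the function from the question.

```python
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool
from struct import pack


def packer(integer):
    return pack('i', integer)


def run(pool_factory, nums):
    # Hypothetical helper: multiprocessing.Pool and multiprocessing.dummy.Pool
    # share the same API, so only the factory changes between processes
    # and threads.
    with pool_factory() as pool:
        return pool.map(packer, nums)


if __name__ == '__main__':
    nums = list(range(10**4))
    with_processes = run(Pool, nums)
    with_threads = run(ThreadPool, nums)
    print(with_processes == with_threads)
```

This keeps multiprocessing's exception behavior while still allowing a thread/process switch, which was the original reason for moving to concurrent.futures.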
Executor.map, which the super().map call in ProcessPoolExecutor.map resolves to:
```python
# Method of class Executor in concurrent/futures/_base.py; that module
# imports time, which is what time.monotonic() below refers to.
def map(self, fn, *iterables, timeout=None, chunksize=1):
    """Returns an iterator equivalent to map(fn, iter).

    Args:
        fn: A callable that will take as many arguments as there are
            passed iterables.
        timeout: The maximum number of seconds to wait. If None, then there
            is no limit on the wait time.
        chunksize: The size of the chunks the iterable will be broken into
            before being passed to a child process. This argument is only
            used by ProcessPoolExecutor; it is ignored by
            ThreadPoolExecutor.

    Returns:
        An iterator equivalent to: map(func, *iterables) but the calls may
        be evaluated out-of-order.

    Raises:
        TimeoutError: If the entire result iterator could not be generated
            before the given timeout.
        Exception: If fn(*args) raises for any values.
    """
    if timeout is not None:
        end_time = timeout + time.monotonic()

    fs = [self.submit(fn, *args) for args in zip(*iterables)]

    # Yield must be hidden in closure so that the futures are submitted
    # before the first iterator value is required.
    def result_iterator():
        try:
            # reverse to keep finishing order
            fs.reverse()
            while fs:
                # Careful not to keep a reference to the popped future
                if timeout is None:
                    yield fs.pop().result()
                else:
                    yield fs.pop().result(end_time - time.monotonic())
        finally:
            for future in fs:
                future.cancel()
    return result_iterator()
```
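The key detail in this source is that every future is created inside map itself; only retrieving the results is deferred to the generator. The small thread-based sketch below uses a hypothetical record function to show that the submitted calls still run even if the returned iterator is never consumed. The results are still yielded in input order afterwards. This is also why timing pool2.map(...) on its own does not include waiting for the results. Draining the iterator, for example with list(), is what waits for them.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

calls = []
calls_lock = threading.Lock()


def record(x):
    # Hypothetical worker: log each call so we can see which inputs ran.
    with calls_lock:
        calls.append(x)
    return x * 10


def map_without_consuming(n):
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Executor.map submits all n calls right here ...
        results = executor.map(record, range(n))
        # ... and nothing has been consumed yet. Leaving the with block
        # calls shutdown(wait=True), which waits for the submitted calls.
    return results


if __name__ == '__main__':
    results = map_without_consuming(5)
    print(sorted(calls))  # every input was processed
    print(list(results))  # results are yielded in input order
```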