Python: why does the get() operation on multiprocessing.Pool.map_async take so long?
So, I am trying to parallelise some code in Python, using the .map_async() method on a multiprocessing.Pool() instance:

    import multiprocessing as mp
    import numpy as np

    pool = mp.Pool( processes = 4 )
    inp = np.linspace( 0.01, 1.99, 100 )
    result = pool.map_async( func, inp ) #Line1 ( func is some Python function which acts on input )
    output = result.get()                #Line2

I noticed that Line1 takes about a thousandth of a second, while Line2 takes about 0.3 seconds.

Is there a better way to do this, or a way to avoid the delay caused by Line2, or am I doing something wrong? (I am rather new to this.)

Am I doing something wrong?

Do not panic, many users do exactly this: they pay more than they receive.

This is a common lesson, not about using some "promising" syntax constructor, but about paying the actual costs of using it.

The story is long, the effect is simple: you expected low-hanging fruit, but had to pay the immense costs of process instantiation, work-package redistribution and result collection, all of that just to run a few rounds of func() calls.

Wow? Stop! Parallelisation was sold to me as something that would speed processing up?!?

Let's be quantitative and measure the actual code-execution times rather than emotions, right? Benchmarking is always a fair move. It helps us mortals escape from mere expectations and get ourselves into quantitative, evidence-backed records:
    def HowMuchWillWePAY2MAP( aFun2TEST = a_NOP_FUN, PROCESSES_TO_SPAWN = 4, RUNS_TO_RUN = 1 ):
        # NOTE: a_NOP_FUN has to be defined before this def is executed,
        #       because default-argument values are evaluated at definition time
        from zmq import Stopwatch; aClk = Stopwatch()    # stopwatch from pyzmq
        try:
            import numpy           as np
            import multiprocessing as mp

            pool = mp.Pool( processes = PROCESSES_TO_SPAWN )
            inp  = np.linspace( 0.01, 1.99, 100 )

            aClk.start()
            for i in range( RUNS_TO_RUN ):               # range() replaces the Python 2 xrange()
                result = pool.map_async( aFun2TEST, inp )
                output = result.get()
        except:
            pass                                         # any failure is silently ignored here
        finally:
            try:
                _ = aClk.stop()
            except:
                _ = -1
            pMASK = "CLK:: {0:_>24d} [us] @{1: >4d}-PROCs ran{2: >6d} RUNS {3:}"
            print( pMASK.format( _,
                                 PROCESSES_TO_SPAWN,
                                 RUNS_TO_RUN,
                                 " ".join( repr( aFun2TEST ).split( " ")[:2] )
                                 )
                   )
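To see where the time goes inside one such round trip, the three phases can be timed separately. The following is only a minimal sketch, not the answer's own harness: `time_pool_phases` is a hypothetical helper name, `time.perf_counter()` stands in for the pyzmq stopwatch, and `math.sqrt` is an assumed stand-in for a cheap `func`.

```python
import math
import multiprocessing as mp
import time


def time_pool_phases(func, inputs, processes=4):
    """Return (timings, output): seconds spent per phase of one map_async() round trip.

    Hypothetical helper for illustration; not part of multiprocessing.
    """
    t0 = time.perf_counter()
    pool = mp.Pool(processes=processes)          # starts the worker processes
    t1 = time.perf_counter()
    async_result = pool.map_async(func, inputs)  # submits the work and returns at once
    t2 = time.perf_counter()
    output = async_result.get()                  # blocks until every task has finished
    t3 = time.perf_counter()
    pool.close()
    pool.join()
    timings = {"create_pool": t1 - t0, "map_async": t2 - t1, "get": t3 - t2}
    return timings, output


if __name__ == "__main__":
    # Same span as np.linspace( 0.01, 1.99, 100 ), built without numpy.
    inputs = [0.01 + i * 0.02 for i in range(100)]
    timings, output = time_pool_phases(math.sqrt, inputs)
    for phase, seconds in timings.items():
        print("{0:<12s} {1:10.6f} s".format(phase, seconds))
```

The split makes it visible that map_async() merely hands the work over, while get() is the call that waits for the actual processing and the result transfer.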
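AS-IS test: before moving forward, one ought to record the pair of baseline timings, using a handy tool for doing so:

    from zmq import Stopwatch; aClk = Stopwatch() # this is a handy tool to do so

This sets the span between the performance envelopes, from a pure [SEQ] of calls to an unoptimised joblib.Parallel() or any other tool, should one wish to extend the experiment with further tools (such as the multiprocessing.Pool() mentioned above).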
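Such a baseline pair could be recorded roughly as follows. This is a sketch under assumptions rather than the original measurement code: `compare_seq_vs_pool` is a hypothetical name, `time.perf_counter()` replaces the pyzmq stopwatch, and `math.sqrt` is an assumed placeholder for `func`.

```python
import math
import multiprocessing as mp
import time


def compare_seq_vs_pool(func, inputs, processes=4):
    """Time a plain sequential loop against Pool.map_async().get(), pool setup included.

    Hypothetical helper for illustration; returns both timings and both outputs.
    """
    t0 = time.perf_counter()
    seq_output = [func(x) for x in inputs]       # the pure [SEQ] of calls
    seq_seconds = time.perf_counter() - t0

    t0 = time.perf_counter()
    with mp.Pool(processes=processes) as pool:   # process instantiation is part of the bill
        pool_output = pool.map_async(func, inputs).get()
    pool_seconds = time.perf_counter() - t0

    return seq_seconds, pool_seconds, seq_output, pool_output


if __name__ == "__main__":
    inputs = [0.01 + i * 0.02 for i in range(100)]
    seq_s, pool_s, _, _ = compare_seq_vs_pool(math.sqrt, inputs)
    print("[SEQ]  {0:10.6f} s".format(seq_s))
    print("[POOL] {0:10.6f} s".format(pool_s))
```

For a payload this cheap, the pool variant will typically come out slower, since the setup and transfer overheads outweigh the work itself; the exact numbers depend on the platform.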
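Test case A:

Intent: to measure the cost of a { process | job } instantiation, we need a NOP work-package payload that spends almost nothing "there", just returns "back", and requires no additional add-on costs (neither for transmitting any input parameters nor for returning any value).

So, the setup-overhead add-on costs are compared using this payload: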
    def a_NOP_FUN( aNeverConsumedPAR ):
        """ __doc__
        The intent of this FUN() is indeed to do nothing at all,
                                 so as to be able to benchmark
                                 all the process-instantiation
                                 add-on overhead costs.
        """
        pass
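The strategy shown above uses the lightweight .map_async() method on a multiprocessing.Pool() instance.

So, the first set of pain and surprises comes straight from the actual cost of doing NOTHING in a concurrent pool, i.e. joblib.Parallel().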
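If your platform stops being able to allocate the requested memory blocks, we run into another kind of problem (a class of hidden glass ceilings that appears when trying to go parallel in a manner agnostic of physical resources). One may edit the SIZE1D scaling so that it at least fits the platform's RAM addressing / sizing capabilities, yet the performance envelopes of real-world problem computing remain of great interest to us here: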
    def a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR( aNeverConsumedPAR, SIZE1D = 1000 ):
        """ __doc__
        The intent of this FUN() is to do nothing but
                                 a MEM-allocation
                                 so as to be able to benchmark
                                 all the process-instantiation
                                 add-on overhead costs.
        """
        import numpy as np              # yes, deferred import, libs do defer imports
        aMemALLOC = np.zeros( ( SIZE1D, #       so as to set
                                SIZE1D, #       realistic ceilings
                                SIZE1D, #       as how big the "Big Data"
                                SIZE1D  #       may indeed grow into
                                ),
                              dtype = np.float64,
                              order = 'F'
                              )         # .ALLOC + .SET
        aMemALLOC[2,3,4,5] = 8.7654321  # .SET
        aMemALLOC[3,3,4,5] = 1.2345678  # .SET
        return aMemALLOC[2:3,3,4,5]
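This may yield a cost to pay of anywhere between 0.1 [s] and +9 [s] (!!) just for doing, still, nothing, though now without forgetting some realistic MEM-allocation add-on costs "there".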
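For a sense of scale, the default SIZE1D = 1000 with four float64 dimensions asks for 1000**4 * 8 bytes, i.e. 8 TB. A small sketch of that arithmetic, and of picking a SIZE1D that fits a given memory budget, is given here; both function names are hypothetical helpers introduced only for illustration.

```python
def array_bytes(size1d, ndim=4, itemsize=8):
    """Bytes needed for a dense array of shape (size1d,) * ndim; itemsize=8 is float64."""
    return size1d ** ndim * itemsize


def largest_size1d(budget_bytes, ndim=4, itemsize=8):
    """Largest SIZE1D whose dense ndim-dimensional array still fits into budget_bytes."""
    # Start from a floating-point root estimate, then correct it with exact integer maths.
    size = int((budget_bytes // itemsize) ** (1.0 / ndim))
    while array_bytes(size + 1, ndim, itemsize) <= budget_bytes:
        size += 1
    while size > 0 and array_bytes(size, ndim, itemsize) > budget_bytes:
        size -= 1
    return size


if __name__ == "__main__":
    print("SIZE1D = 1000 needs {0:,d} bytes".format(array_bytes(1000)))
    print("1 GiB budget fits SIZE1D = {0:d}".format(largest_size1d(2 ** 30)))
```

Each worker process performs its own allocation, so the per-call footprint has to be multiplied by the number of calls running concurrently.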
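map_async() just starts the processing. get(), on the other hand, has to wait until all the processes have finished and produced their results. What else did you expect to happen? If your goal is to get results as they become available, rather than waiting for all the tasks to finish, you would typically iterate over the result of imap (or imap_unordered, if you do not care about ordering, for better speed).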
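Iterating over results as they arrive might look like the sketch below. `consume_as_completed` is a hypothetical wrapper name, and `math.sqrt` is an assumed stand-in for the question's `func`; only `Pool.imap_unordered()` itself comes from multiprocessing.

```python
import math
import multiprocessing as mp


def consume_as_completed(func, inputs, processes=4, chunksize=1):
    """Collect results in completion order instead of waiting for the whole batch.

    Hypothetical wrapper for illustration around Pool.imap_unordered().
    """
    results = []
    with mp.Pool(processes=processes) as pool:
        # Each value is yielded as soon as some worker has finished it,
        # so the order is not guaranteed to match the order of the inputs.
        for value in pool.imap_unordered(func, inputs, chunksize):
            results.append(value)    # real code would process the value right here
    return results


if __name__ == "__main__":
    inputs = [0.01 + i * 0.02 for i in range(100)]
    values = consume_as_completed(math.sqrt, inputs)
    print(len(values), "results received")
```

Swapping in `pool.imap(...)` keeps the input order at the price of possibly waiting on a slow early task. Neither variant removes the process-setup and transfer overheads; they only let the caller start working on early results sooner.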
CLK:: __________________117463 [us] @ 4-JOBs ran 10 RUNS <function a_NOP_FUN
CLK:: __________________111182 [us] @ 3-JOBs ran 100 RUNS <function a_NOP_FUN
CLK:: __________________110229 [us] @ 3-JOBs ran 100 RUNS <function a_NOP_FUN
CLK:: __________________110095 [us] @ 3-JOBs ran 100 RUNS <function a_NOP_FUN
CLK:: __________________111794 [us] @ 3-JOBs ran 100 RUNS <function a_NOP_FUN
CLK:: __________________110030 [us] @ 3-JOBs ran 100 RUNS <function a_NOP_FUN
CLK:: __________________110697 [us] @ 3-JOBs ran 100 RUNS <function a_NOP_FUN
CLK:: _________________4605843 [us] @ 123-JOBs ran 100 RUNS <function a_NOP_FUN
CLK:: __________________336208 [us] @ 123-JOBs ran 100 RUNS <function a_NOP_FUN
CLK:: __________________298816 [us] @ 123-JOBs ran 100 RUNS <function a_NOP_FUN
CLK:: __________________355492 [us] @ 123-JOBs ran 100 RUNS <function a_NOP_FUN
CLK:: __________________320837 [us] @ 123-JOBs ran 100 RUNS <function a_NOP_FUN
CLK:: __________________308365 [us] @ 123-JOBs ran 100 RUNS <function a_NOP_FUN
CLK:: __________________372762 [us] @ 123-JOBs ran 100 RUNS <function a_NOP_FUN
CLK:: __________________304228 [us] @ 123-JOBs ran 100 RUNS <function a_NOP_FUN
CLK:: __________________337537 [us] @ 123-JOBs ran 100 RUNS <function a_NOP_FUN
CLK:: __________________941775 [us] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
CLK:: __________________987440 [us] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
CLK:: _________________1080024 [us] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
CLK:: _________________1108432 [us] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
CLK:: _________________7525874 [us] @ 123-JOBs ran100000 RUNS <function a_NOP_FUN
>>> HowMuchWillWePAY2RUN( a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR, 200, 1000 )
CLK:: __________________116310 [us] @ 4-JOBs ran 10 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________120054 [us] @ 4-JOBs ran 10 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________129441 [us] @ 10-JOBs ran 100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________123721 [us] @ 10-JOBs ran 100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________127126 [us] @ 10-JOBs ran 100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________124028 [us] @ 10-JOBs ran 100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________305234 [us] @ 100-JOBs ran 100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________243386 [us] @ 100-JOBs ran 100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________241410 [us] @ 100-JOBs ran 100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________267275 [us] @ 100-JOBs ran 100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________244207 [us] @ 100-JOBs ran 100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________653879 [us] @ 100-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________405149 [us] @ 100-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________351182 [us] @ 100-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________362030 [us] @ 100-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: _________________9325428 [us] @ 200-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________680429 [us] @ 200-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________533559 [us] @ 200-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: _________________1125190 [us] @ 200-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________591109 [us] @ 200-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR