Python: Why does the get() operation in multiprocessing.Pool.map_async take so long?

Tags: python, parallel-processing, multiprocessing, parallelism-amdahl

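So I am trying to parallelize some code in Python, using the .map_async() method on a multiprocessing.Pool() instance:

import multiprocessing as mp
import numpy as np

pool   = mp.Pool( processes = 4 )
inp    = np.linspace( 0.01, 1.99, 100 )
result = pool.map_async( func, inp ) #Line1 ( func is some Python function which acts on input )
output = result.get()                #Line2

I noticed that Line1 takes about a thousandth of a second, while Line2 takes about 0.3 seconds. Is there a better way to do this, or a way to get around the bottleneck caused by Line2, or am I doing something wrong? (I am rather new to this.)

Am I doing something wrong?

Do not panic, many users do the very same: they pay more than they receive. This is a common lesson, not about using some "promising" syntax constructor, but about paying the actual costs of using it.

The story is long, the effect is simple: you expected low-hanging fruit, but had to pay the immense costs of process instantiation, work-package re-distribution and result collection, all of that just to do a few rounds of func() calls.

Wow? Stop! I was told that parallelisation would speed up processing?!?

Let us be quantitative and measure the actual code-execution time instead of emotions, right?

Benchmarking is always a fair move. It helps us mortals escape from mere expectations and get ourselves into quantitative, evidence-backed knowledge. The helper used here to time the .map_async() / .get() pair:
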

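def HowMuchWillWePAY2MAP( aFun2TEST = a_NOP_FUN, PROCESSES_TO_SPAWN = 4, RUNS_TO_RUN = 1 ):
    # NOTE: a_NOP_FUN ( listed further below ) must be defined before this def is executed
    from zmq import Stopwatch; aClk = Stopwatch()       # the [us] stopwatch, imported from zmq
    pool = None
    try:
         import numpy           as np
         import multiprocessing as mp

         pool = mp.Pool( processes = PROCESSES_TO_SPAWN )
         inp  = np.linspace( 0.01, 1.99, 100 )

         aClk.start()                                   # pool start-up is NOT inside the timed section
         for i in range( RUNS_TO_RUN ):                 # was xrange() under Python 2
             result = pool.map_async( aFun2TEST, inp )  # only submits the work
             output = result.get()                      # blocks until all results have arrived
    except Exception as anExc:
         print( "EXC:: {0:}".format( repr( anExc ) ) )  # report instead of silently swallowing
    finally:
         try:
             _ = aClk.stop()                            # elapsed time in [us]
         except Exception:
             _ = -1                                     # fallback if the clock could not be stopped
         if pool is not None:
             pool.terminate()                           # do not leave worker processes behind
    pMASK = "CLK:: {0:_>24d} [us] @{1: >4d}-PROCs ran{2: >6d} RUNS {3:}"
    print( pMASK.format( _,
                         PROCESSES_TO_SPAWN,
                         RUNS_TO_RUN,
                         " ".join( repr( aFun2TEST ).split( " ")[:2] )
                         )
            )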

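AS-IS test: before moving forward, one ought to record the baseline timings first. A handy tool for doing so:

from zmq import Stopwatch; aClk = Stopwatch() # this is a handy tool to do so

This sets the span between the performance envelopes, from a pure [SEQ] sequence of calls to an un-optimised joblib.Parallel() or any other tool, should one wish to extend the experiment with other tools such as the multiprocessing.Pool() in question.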
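The [SEQ]-versus-pool baseline described above can also be recorded with nothing but the standard library. The following is only a minimal sketch: func() here is a hypothetical stand-in for the question's function, time_seq() and time_map() are names made up for this illustration, and time.perf_counter() is swapped in for the zmq Stopwatch.

```python
import multiprocessing as mp
import time


def func(x):
    """Hypothetical stand-in for the question's func(); must be a picklable top-level function."""
    return x * x


def time_seq(fun, inputs):
    """[SEQ] baseline: plain in-process calls. Returns (results, elapsed microseconds)."""
    t0 = time.perf_counter()
    out = [fun(x) for x in inputs]
    return out, (time.perf_counter() - t0) * 1e6


def time_map(fun, inputs, processes=4):
    """[MAP] variant: pool start-up, map_async(), get() and pool tear-down are all timed."""
    t0 = time.perf_counter()
    with mp.Pool(processes=processes) as pool:
        out = pool.map_async(fun, inputs).get()
    return out, (time.perf_counter() - t0) * 1e6


if __name__ == "__main__":
    # Same 100 points as np.linspace(0.01, 1.99, 100), built without numpy.
    inp = [0.01 + i * 0.02 for i in range(100)]
    seq_out, seq_us = time_seq(func, inp)
    map_out, map_us = time_map(func, inp)
    assert seq_out == map_out
    print("[SEQ] {0:>12.0f} [us]".format(seq_us))
    print("[MAP] {0:>12.0f} [us]".format(map_us))
```

For a payload as cheap as this one, the [MAP] figure is likely to be dominated by the pool's own overhead, which is the point the answer is making; the actual numbers depend on the platform.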

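Test case A: Intent:
To measure the cost of a { process | job } instantiation, we need a NOP work-package payload that spends almost nothing "there", just returns "back", and does not require paying any additional add-on costs (be it for transferring any input parameters or for returning any value).

The payload used for comparing the setup-overhead add-on costs is: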

def a_NOP_FUN( aNeverConsumedPAR ):
    """                                                 __doc__
    The intent of this FUN() is indeed to do nothing at all,
                             so as to be able to benchmark
                             all the process-instantiation
                             add-on overhead costs.
    """
    pass
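With a NOP payload such as a_NOP_FUN, whatever gets measured is pure overhead. One way to see where it goes is to time the one-off pool start-up separately from the repeated map_async() + get() rounds. This is a minimal sketch using only the standard library; nop() and split_costs() are hypothetical names for this illustration, and since the workers may still be initialising when Pool() returns, the first round can absorb part of the start-up cost.

```python
import multiprocessing as mp
import time


def nop(_unused):
    """Hypothetical NOP payload mirroring a_NOP_FUN: does nothing and returns None."""
    pass


def split_costs(processes=4, n_items=100, runs=10):
    """Return (pool_startup_us, per_run_us): the one-off cost and the average per-round cost."""
    t0 = time.perf_counter()
    pool = mp.Pool(processes=processes)
    startup_us = (time.perf_counter() - t0) * 1e6
    try:
        t1 = time.perf_counter()
        for _ in range(runs):
            # Submit the work and block until every (empty) result is back.
            pool.map_async(nop, range(n_items)).get()
        per_run_us = (time.perf_counter() - t1) * 1e6 / runs
    finally:
        pool.close()
        pool.join()
    return startup_us, per_run_us


if __name__ == "__main__":
    startup, per_run = split_costs()
    print("pool start-up : {0:>12.0f} [us]".format(startup))
    print("per map round : {0:>12.0f} [us]".format(per_run))
```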

HowMuchWillWePAY2MAP() above follows the strategy of a lightweight .map_async() method on a multiprocessing.Pool() instance.

So, the first set of pain and surprises comes straight from the actual cost of doing NOTHING in a concurrent pool of joblib.Parallel(), as the a_NOP_FUN timing logs further below show.

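If your platform becomes unable to allocate the requested memory blocks, we run into another kind of problem (a class of hidden glass ceilings that appear when trying to go parallel in a manner agnostic of physical resources). One may edit the SIZE1D scaling so that it at least fits within the platform's RAM addressing / sizing capabilities, yet the performance envelopes of real-world problem computing remain of great interest here: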

def a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR( aNeverConsumedPAR, SIZE1D = 1000 ):
    """                                                 __doc__
    The intent of this FUN() is to do nothing but
                             a MEM-allocation
                             so as to be able to benchmark
                             all the process-instantiation
                             add-on overhead costs.
    """
    import numpy as np              # yes, deferred import, libs do defer imports
    aMemALLOC = np.zeros( ( SIZE1D, #       so as to set
                            SIZE1D, #       realistic ceilings
                            SIZE1D, #       as how big the "Big Data"
                            SIZE1D  #       may indeed grow into
                            ),
                          dtype = np.float64,
                          order = 'F'
                          )         # .ALLOC + .SET
    aMemALLOC[2,3,4,5] = 8.7654321  # .SET
    aMemALLOC[3,3,4,5] = 1.2345678  # .SET

    return aMemALLOC[2:3,3,4,5]

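Calling HowMuchWillWePAY2RUN( a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR, 200, 1000 ) may yield a cost to pay anywhere between 0.1 [s] and +9 [s] (!!), just for still doing nothing, but now also without forgetting some realistic MEM-allocation add-on costs "there".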
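To put the SIZE1D scaling into numbers: a four-dimensional float64 array with SIZE1D elements per axis needs SIZE1D**4 * 8 bytes. The sketch below only does that arithmetic; bytes_needed() and largest_size1d() are hypothetical helpers for this illustration, and the 8-byte item size is the size of one np.float64 element.

```python
FLOAT64_BYTES = 8  # size of one np.float64 element


def bytes_needed(size1d, ndim=4, itemsize=FLOAT64_BYTES):
    """Bytes required for an array of shape (size1d,) * ndim."""
    return size1d ** ndim * itemsize


def largest_size1d(budget_bytes, ndim=4, itemsize=FLOAT64_BYTES):
    """Largest size1d whose (size1d,) * ndim array still fits into budget_bytes."""
    n = 0
    while bytes_needed(n + 1, ndim, itemsize) <= budget_bytes:
        n += 1
    return n


print(bytes_needed(1000) / 1e12)   # the default SIZE1D = 1000, in TB
print(bytes_needed(100) / 1e9)     # SIZE1D = 100, in GB
print(largest_size1d(2 ** 30))     # largest SIZE1D fitting into 1 GiB
```

With the default SIZE1D = 1000 the request is 8 * 10**12 bytes, i.e. about 8 TB per call, which is why the answer suggests scaling SIZE1D down to something the platform can actually allocate.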

Timing logs for a_NOP_FUN (the cost of doing nothing):

CLK:: __________________117463 [us] @   4-JOBs ran    10 RUNS <function a_NOP_FUN
CLK:: __________________111182 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________110229 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________110095 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________111794 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________110030 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________110697 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: _________________4605843 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________336208 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________298816 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________355492 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________320837 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________308365 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________372762 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________304228 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________337537 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________941775 [us] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
CLK:: __________________987440 [us] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
CLK:: _________________1080024 [us] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
CLK:: _________________1108432 [us] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
CLK:: _________________7525874 [us] @ 123-JOBs ran100000 RUNS <function a_NOP_FUN

Timing logs for a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR:

>>> HowMuchWillWePAY2RUN( a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR, 200, 1000 )
CLK:: __________________116310 [us] @   4-JOBs ran    10 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________120054 [us] @   4-JOBs ran    10 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________129441 [us] @  10-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________123721 [us] @  10-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________127126 [us] @  10-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________124028 [us] @  10-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________305234 [us] @ 100-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________243386 [us] @ 100-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________241410 [us] @ 100-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________267275 [us] @ 100-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________244207 [us] @ 100-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________653879 [us] @ 100-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________405149 [us] @ 100-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________351182 [us] @ 100-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________362030 [us] @ 100-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: _________________9325428 [us] @ 200-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________680429 [us] @ 200-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________533559 [us] @ 200-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: _________________1125190 [us] @ 200-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________591109 [us] @ 200-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR

map_async() just starts the processing. get(), on the other hand, has to wait until all processes have finished and produced their results. What else did you expect to happen? If your goal is to get results as they become available, rather than waiting for all tasks to finish, you would usually iterate over the results of imap (or imap_unordered, if you do not care about ordering, for speed).
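The suggestion to iterate over imap / imap_unordered instead of blocking in get() might look as follows. This is only a sketch: func() is a hypothetical stand-in for the question's function and consume_as_completed() is a name made up for this illustration.

```python
import multiprocessing as mp


def func(x):
    """Hypothetical stand-in for the question's func()."""
    return x * x


def consume_as_completed(inputs, processes=4):
    """Collect results in completion order instead of waiting for the whole batch."""
    results = []
    with mp.Pool(processes=processes) as pool:
        # imap_unordered() yields each result as soon as a worker delivers it;
        # use pool.imap() instead if the input order has to be preserved.
        for value in pool.imap_unordered(func, inputs):
            results.append(value)
    return results


if __name__ == "__main__":
    out = consume_as_completed(range(10))
    print(sorted(out))
```

This does not remove the pool's overhead; it only lets the caller start working on early results while later ones are still being computed.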