Python: Why does the get() operation in multiprocessing.Pool.map_async take so long?
Tags: python, parallel-processing, multiprocessing, parallelism-amdahl


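So, I am trying to parallelize some code in Python, using the .map_async() method on a multiprocessing.Pool() instance.

I noticed that Line1 (the pool.map_async() call) takes about a thousandth of a second, while Line2 (the result.get() call) takes about 0.3 seconds.

Is there a better way to do this, or a way to get around the delay caused by Line2, or am I doing something wrong?

(I am rather new to this.)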

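Am I doing something wrong?

Do not panic, many users do exactly this: they pay more than they receive. This is a common lesson, not about using some "promising" syntax constructor, but about paying the actual costs of using it.

The story is long, the effect is simple: you expected low-hanging fruit, but had to pay an immense cost of process instantiation, work-package redistribution, and result collection, all of that just to perform a few rounds of func() calls.

Wow? Stop! Parallelization was sold to me as something that would speed up processing?!?

Let's be quantitative and measure the actual code-execution time rather than emotions, right?

Benchmarking is always a fair move. It helps us mortals escape from mere expectations and get ourselves into quantitative, evidence-backed knowledge. The snippet in question was: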

import multiprocessing as mp
import numpy as np

pool   = mp.Pool( processes = 4 )
inp    = np.linspace( 0.01, 1.99, 100 )
result = pool.map_async( func, inp ) #Line1 ( func is some Python function which acts on input )
output = result.get()                #Line2
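
To see where the time goes, the two lines can be timed separately. The sketch below is an illustration, not the asker's code: `square` is a hypothetical stand-in for `func`, and `time_map_async` with its `pool_cls` parameter is a name introduced here (the parameter only exists so the same function can be tried on a `ThreadPool`, which shares the API). `map_async()` merely queues the work and returns an `AsyncResult`; `get()` blocks until every worker has finished and all results have been sent back.

```python
import multiprocessing as mp
import time
from multiprocessing.pool import ThreadPool  # same API, threads instead of processes


def square(x):
    """Hypothetical stand-in for the question's func()."""
    return x * x


def time_map_async(fn, inputs, processes=4, pool_cls=mp.Pool):
    """Return (submit_seconds, wait_seconds, output) for one map_async() round."""
    with pool_cls(processes=processes) as pool:
        t0 = time.perf_counter()
        result = pool.map_async(fn, inputs)   # Line1: only schedules the tasks
        t1 = time.perf_counter()
        output = result.get()                 # Line2: blocks until all tasks are done
        t2 = time.perf_counter()
    return t1 - t0, t2 - t1, output


if __name__ == "__main__":
    # 100 evenly spaced inputs from 0.01 to 1.99, as in the question
    inputs = [0.01 + 0.02 * i for i in range(100)]
    submit_s, wait_s, _ = time_map_async(square, inputs)
    print(f"map_async(): {submit_s:.6f} s, get(): {wait_s:.6f} s")
```

Almost all of the wall-clock time is expected to show up in `get()`, because that is where the caller waits for the dispatch, the execution, and the return trip of the results.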

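AS-IS test: before moving forward, one ought to record a pair of reference timings. A handy tool for doing so:

from zmq import Stopwatch; aClk = Stopwatch() # this is a handy tool to do so

This sets the span of the performance envelopes, from a pure-[SEQ] sequence of calls to an un-optimised joblib.Parallel() or any other tool, should one wish to extend the experiment with other tools such as the multiprocessing.Pool() mentioned above.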
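
If pyzmq is not installed, a standard-library stand-in can serve the same purpose. The sketch below is an assumption, not the answer's tool: `UsStopwatch` and `time_serial_us` are hypothetical names, built on `time.perf_counter_ns()` and reporting whole microseconds to match the [us] unit used in the listings. It is used here to record the pure-[SEQ] baseline.

```python
import time


class UsStopwatch:
    """Minimal stopwatch reporting elapsed whole microseconds (hypothetical helper)."""

    def __init__(self):
        self._t0 = None

    def start(self):
        self._t0 = time.perf_counter_ns()

    def stop(self):
        if self._t0 is None:
            raise RuntimeError("stop() called before start()")
        elapsed_us = (time.perf_counter_ns() - self._t0) // 1000
        self._t0 = None
        return elapsed_us


def time_serial_us(fn, inputs, runs=1):
    """Time `runs` plain sequential passes of fn over inputs: the [SEQ] baseline."""
    clk = UsStopwatch()
    clk.start()
    for _ in range(runs):
        _results = [fn(x) for x in inputs]
    return clk.stop()


if __name__ == "__main__":
    inputs = [0.01 + 0.02 * i for i in range(100)]
    print(time_serial_us(abs, inputs, runs=100), "[us]")
```

Comparing this number with the pooled timings shows how much of the pooled run is overhead rather than useful work.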


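Test case A. Intent:
To measure the cost of a { process | job } instantiation, we need a NOP work-package payload that spends almost nothing "there", just returns "back", and incurs no additional add-on costs (neither for transferring any input parameters nor for returning any value).

The setup-overhead add-on costs are therefore compared using this payload: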

def a_NOP_FUN( aNeverConsumedPAR ):
    """                                                 __doc__
    The intent of this FUN() is indeed to do nothing at all,
                             so as to be able to benchmark
                             all the process-instantiation
                             add-on overhead costs.
    """
    pass

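So, the first set of pain and surprises comes straight from the actual cost of doing nothing in a concurrent pool; the CLK:: listings further down record it for joblib.Parallel(). The strategy that uses the lightweight .map_async() method on a multiprocessing.Pool() instance is measured with: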

def HowMuchWillWePAY2MAP( aFun2TEST = a_NOP_FUN, PROCESSES_TO_SPAWN = 4, RUNS_TO_RUN = 1 ):
    import numpy           as np
    import multiprocessing as mp
    from zmq import Stopwatch; aClk = Stopwatch()      # .stop() returns elapsed [us]

    pool = mp.Pool( processes = PROCESSES_TO_SPAWN )   # pool start-up is NOT timed
    inp  = np.linspace( 0.01, 1.99, 100 )
    _    = -1                                          # reported if the timed section fails
    try:
        aClk.start()
        for i in range( RUNS_TO_RUN ):                 # range(): xrange() is Python 2 only
            result = pool.map_async( aFun2TEST, inp )
            output = result.get()
        _ = aClk.stop()
    except Exception as anEXC:
        print( "EXC:: {0:}".format( repr( anEXC ) ) )  # report errors instead of swallowing them
    finally:
        pool.close(); pool.join()                      # release the worker processes
    pMASK = "CLK:: {0:_>24d} [us] @{1: >4d}-PROCs ran{2: >6d} RUNS {3:}"
    print( pMASK.format( _,
                         PROCESSES_TO_SPAWN,
                         RUNS_TO_RUN,
                         " ".join( repr( aFun2TEST ).split( " ")[:2] )
                         )
            )
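
One knob worth measuring in the same spirit is the `chunksize` argument of `map_async()`, which groups several inputs into one task and so reduces the number of messages exchanged with the workers. The sketch below is an illustration under assumptions: `identity`, `time_map_async_us`, and the `pool_cls` parameter are hypothetical names introduced here, and whether a larger chunk helps depends entirely on the workload.

```python
import multiprocessing as mp
import time
from multiprocessing.pool import ThreadPool  # same API, handy for quick in-process checks


def identity(x):
    """Hypothetical near-NOP payload that still sends a value back."""
    return x


def time_map_async_us(fn, inputs, chunksize=None, processes=4, pool_cls=mp.Pool):
    """Return (elapsed_us, output) for one map_async() round."""
    with pool_cls(processes=processes) as pool:
        t0 = time.perf_counter_ns()           # timer starts after the pool was created
        output = pool.map_async(fn, inputs, chunksize=chunksize).get()
        elapsed_us = (time.perf_counter_ns() - t0) // 1000
    return elapsed_us, output


if __name__ == "__main__":
    data = list(range(10_000))
    for cs in (1, 100, 2_500):
        us, _ = time_map_async_us(identity, data, chunksize=cs)
        print(f"chunksize={cs:>5}: {us:>10} [us]")
```

Leaving `chunksize=None` lets the pool pick a value itself; passing explicit values makes the per-task messaging cost visible in the timings.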
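If your platform can no longer allocate the requested memory blocks, we run into another class of problems (a class of hidden glass ceilings, hit when trying to go parallel in a manner agnostic to physical resources). One may edit the SIZE1D scaling so that it at least fits the platform's RAM addressing / sizing capabilities, yet the performance envelopes of real-world problem computing remain of great interest here. The payload for this test does nothing but a memory allocation: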

def a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR( aNeverConsumedPAR, SIZE1D = 1000 ):
    """                                                 __doc__
    The intent of this FUN() is to do nothing but
                             a MEM-allocation
                             so as to be able to benchmark
                             all the process-instantiation
                             add-on overhead costs.
    """
    import numpy as np              # yes, deferred import, libs do defer imports
    aMemALLOC = np.zeros( ( SIZE1D, #       so as to set
                            SIZE1D, #       realistic ceilings
                            SIZE1D, #       as how big the "Big Data"
                            SIZE1D  #       may indeed grow into
                            ),
                          dtype = np.float64,
                          order = 'F'
                          )         # .ALLOC + .SET
    aMemALLOC[2,3,4,5] = 8.7654321  # .SET
    aMemALLOC[3,3,4,5] = 1.2345678  # .SET

    return aMemALLOC[2:3,3,4,5]
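
Before running this payload it helps to work out how much memory it asks for. A small sketch (the helper name `alloc_bytes` is hypothetical): a 4-D float64 array with SIZE1D = 1000 holds 1000**4 elements of 8 bytes each, that is 8 TB, which is why the SIZE1D scaling usually has to be reduced.

```python
def alloc_bytes(size1d, ndim=4, itemsize=8):
    """Bytes needed for an ndim-dimensional array of side size1d (float64 -> 8 bytes)."""
    return size1d ** ndim * itemsize


if __name__ == "__main__":
    for size1d in (10, 100, 1000):
        print(f"SIZE1D={size1d:>5}: {alloc_bytes(size1d) / 1e9:,.4f} GB")
```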
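Running HowMuchWillWePAY2RUN( a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR, 200, 1000 ) may yield a cost to pay anywhere between 0.1 [s] and 9 [s] (!!), just for still doing nothing, but now without forgetting some realistic MEM-allocation add-on costs "there".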
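
These overheads decide whether going parallel pays off at all. As a rough model (an assumption made here, not a formula from the answer): if the serial work takes T seconds, splits perfectly across n workers, and the setup, distribution, and collection overhead is a fixed o seconds, the achievable speedup is T / (o + T / n). The function names below are hypothetical.

```python
def overhead_speedup(serial_s, overhead_s, workers):
    """Speedup of a perfectly divisible job that pays a fixed parallel overhead."""
    return serial_s / (overhead_s + serial_s / workers)


def break_even_serial_s(overhead_s, workers):
    """Serial run time above which the parallel version starts to win (speedup > 1)."""
    if workers <= 1:
        raise ValueError("need more than one worker to break even")
    # speedup > 1  <=>  T > o + T / n  <=>  T > o * n / (n - 1)
    return overhead_s * workers / (workers - 1)


if __name__ == "__main__":
    # 0.1 s and 9 s are the two ends of the overhead range quoted above;
    # 4 workers as in the question
    for overhead in (0.1, 9.0):
        print(f"overhead {overhead} s -> serial work must exceed "
              f"{break_even_serial_s(overhead, 4):.3f} s")
```

Under this model a job of only a few milliseconds of serial work cannot win against an overhead of a tenth of a second, no matter how many workers are added.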


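map_async() just kicks off the processing. get(), on the other hand, has to wait for all processes to finish and produce their results. What else did you expect to happen? If your goal is to get results as they become available, rather than waiting for all tasks to complete, you would typically iterate over the results of imap (or, if you do not care about ordering, imap_unordered, for speed).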
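
A minimal sketch of that suggestion follows. The names `square` and `collect_as_completed` are hypothetical, and the `pool_cls` parameter is only there so the same function can be tried on a `ThreadPool`, which shares the API.

```python
import multiprocessing as mp
from multiprocessing.pool import ThreadPool  # same API, threads instead of processes


def square(x):
    """Hypothetical stand-in for the question's func()."""
    return x * x


def collect_as_completed(fn, inputs, processes=4, pool_cls=mp.Pool):
    """Consume results one by one, in completion order, as workers deliver them."""
    collected = []
    with pool_cls(processes=processes) as pool:
        for value in pool.imap_unordered(fn, inputs):
            collected.append(value)   # each value can be handled here as soon as it arrives
    return collected


if __name__ == "__main__":
    print(sorted(collect_as_completed(square, range(10))))
```

The total run time is not reduced by this; the difference is that the caller can start working on the first results while the remaining tasks are still running.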
 CLK:: __________________117463 [us] @   4-JOBs ran    10 RUNS <function a_NOP_FUN
 CLK:: __________________111182 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________110229 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________110095 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________111794 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________110030 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________110697 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: _________________4605843 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________336208 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________298816 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________355492 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________320837 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________308365 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________372762 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________304228 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________337537 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________941775 [us] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
 CLK:: __________________987440 [us] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
 CLK:: _________________1080024 [us] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
 CLK:: _________________1108432 [us] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
 CLK:: _________________7525874 [us] @ 123-JOBs ran100000 RUNS <function a_NOP_FUN
>>> HowMuchWillWePAY2RUN( a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR, 200, 1000 )
CLK:: __________________116310 [us] @   4-JOBs ran    10 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________120054 [us] @   4-JOBs ran    10 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________129441 [us] @  10-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________123721 [us] @  10-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________127126 [us] @  10-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________124028 [us] @  10-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________305234 [us] @ 100-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________243386 [us] @ 100-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________241410 [us] @ 100-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________267275 [us] @ 100-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________244207 [us] @ 100-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________653879 [us] @ 100-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________405149 [us] @ 100-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________351182 [us] @ 100-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________362030 [us] @ 100-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: _________________9325428 [us] @ 200-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________680429 [us] @ 200-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________533559 [us] @ 200-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: _________________1125190 [us] @ 200-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________591109 [us] @ 200-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR