Python: numpy ufuncs speed vs. for loop speed


I've read a lot of "avoid for loops with numpy" advice, so I tried it. I was using this code (simplified version). Some auxiliary data:

 In[1]: import numpy as np
        resolution = 1000                             # this parameter varies
        tim = np.linspace(-np.pi, np.pi, resolution) 
        prec = np.arange(1, resolution + 1)
        prec = 2 * prec - 1
        values = np.zeros_like(tim)
My first implementation used a for loop:

 In[2]: for i, ti in enumerate(tim):
           values[i] = np.sum(np.sin(prec * ti))
Then I got rid of the explicit for loop and implemented this:

 In[3]: values = np.sum(np.sin(tim[:, np.newaxis] * prec), axis=1)
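This solution was faster for small arrays, but when I scaled up I got this time dependence:

[plot omitted]

Am I missing something, or is this normal behavior? And if it is not, where should I dig?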

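EDIT: Following the comments, here is some additional information. The times were measured with IPython's %timeit and %%timeit, and every run was performed on a fresh kernel. My laptop is an acer aspire v7-482pg (i7, 8GB). I'm using: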

  • python 3.5.2
  • numpy 1.11.2+mkl
  • Windows 10
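
For reference, the two variants from the question can be wrapped in functions and checked against each other before any timing is done. This is a minimal sketch: the function names and their two-argument signatures are chosen here for illustration, and the small resolution is only meant to keep the check quick.

```python
import numpy as np

def loop_solution(tim, prec):
    """Explicit Python loop over tim; one vectorized sin/sum per element."""
    values = np.zeros_like(tim)
    for i, ti in enumerate(tim):
        values[i] = np.sum(np.sin(prec * ti))
    return values

def broadcast_solution(tim, prec):
    """Same computation via broadcasting; builds a len(tim) x len(prec) temporary."""
    return np.sum(np.sin(tim[:, np.newaxis] * prec), axis=1)

resolution = 200  # illustrative value; the question varies this parameter
tim = np.linspace(-np.pi, np.pi, resolution)
prec = 2 * np.arange(1, resolution + 1) - 1

# Both variants should agree up to floating-point rounding.
print(np.allclose(loop_solution(tim, prec), broadcast_solution(tim, prec)))
```

Once both functions are known to agree, each can be passed to %timeit for a range of resolutions.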

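    • This is normal and expected behaviour. The "avoid for loops with numpy" statement is simply too simplistic to apply everywhere. If you're dealing with inner loops it is (almost) always true. But for outer loops (as in your case) there are far more exceptions, especially if the alternative is broadcasting, because broadcasting speeds up the operation by using a lot more memory.

      Just to add a bit of background to that "avoid for loops with numpy" statement:

      NumPy arrays are stored as contiguous arrays of C types. A Python int is not the same as a C int! So whenever you iterate over the items of an array, each item has to be fetched from the array and converted to a Python int, then you do whatever you want with it, and finally it may have to be converted back to a C integer (this is called boxing and unboxing the value). For example, suppose you want to sum the items of an array using Python: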

      import numpy as np
      arr = np.arange(1000)
      %%timeit
      acc = 0
      for item in arr:
          acc += item
      # 1000 loops, best of 3: 478 µs per loop
      
      You are better off using numpy:

      %timeit np.sum(arr)
      # 10000 loops, best of 3: 24.2 µs per loop
      
      Even if you push the loop into Python's C code, you are still far away from the numpy performance:

      %timeit sum(arr)
      # 1000 loops, best of 3: 387 µs per loop
      
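      There may be exceptions to this rule, but they are going to be really rare, at least as long as some equivalent numpy functionality exists. So if you would otherwise iterate over single elements, you should use numpy.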
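
The boxing described above can be observed directly. The following is a minimal sketch (the variable names are illustrative): iterating over the array hands back one NumPy scalar object per element rather than a raw C integer, and the explicit loop arrives at the same total as np.sum, only at a higher cost per element.

```python
import numpy as np

arr = np.arange(1000)

# Iteration yields one boxed NumPy scalar object per element.
first = next(iter(arr))
print(type(first))

# The explicit loop and np.sum agree on the result; only the cost differs.
acc = 0
for item in arr:
    acc += item

total = np.sum(arr)
print(int(acc), int(total))
```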


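      Sometimes a plain Python loop is enough. It is not widely advertised, but numpy functions have a huge overhead compared to Python functions. For example, consider a 3-element array: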

      arr = np.arange(3)
      %timeit np.sum(arr)
      %timeit sum(arr)
      
      Which one is faster?

      Solution: the Python function performs better than the numpy solution:

      # 10000 loops, best of 3: 21.9 µs per loop  <- numpy
      # 100000 loops, best of 3: 6.27 µs per loop <- python
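
Outside IPython, the same comparison can be repeated with the standard-library timeit module for several array sizes, which shows roughly where np.sum overtakes the built-in sum. This is a sketch only: the helper name compare, the sizes and the repeat count are assumptions made for illustration, and the absolute numbers depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

def compare(n, repeats=500):
    """Return (seconds per np.sum call, seconds per built-in sum call) for n items."""
    arr = np.arange(n)
    t_numpy = timeit.timeit(lambda: np.sum(arr), number=repeats) / repeats
    t_builtin = timeit.timeit(lambda: sum(arr), number=repeats) / repeats
    return t_numpy, t_builtin

for n in (3, 30, 300, 3000):
    t_numpy, t_builtin = compare(n)
    print(f"n={n:5d}  np.sum: {t_numpy * 1e6:8.2f} µs   sum: {t_builtin * 1e6:8.2f} µs")
```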
      
      Profiling your loop solution with line_profiler and resolution = 100:

      def fun_func(tim, prec, values):
          for i, ti in enumerate(tim):
              values[i] = np.sum(np.sin(prec * ti))
      %lprun -f fun_func fun_func(tim, prec, values)
      Line #      Hits         Time  Per Hit   % Time  Line Contents
      ==============================================================
           1                                           def fun_func(tim, prec, values):
           2       101          752      7.4      5.7      for i, ti in enumerate(tim):
           3       100        12449    124.5     94.3          values[i] = np.sum(np.sin(prec * ti))
      
      95% of the time is spent inside the loop. I even split the loop body into several parts to verify this:

      def fun_func(tim, prec, values):
          for i, ti in enumerate(tim):
              x = prec * ti
              x = np.sin(x)
              x = np.sum(x)
              values[i] = x
      %lprun -f fun_func fun_func(tim, prec, values)
      Line #      Hits         Time  Per Hit   % Time  Line Contents
      ==============================================================
           1                                           def fun_func(tim, prec, values):
           2       101          609      6.0      3.5      for i, ti in enumerate(tim):
           3       100         4521     45.2     26.3          x = prec * ti
           4       100         4646     46.5     27.0          x = np.sin(x)
           5       100         6731     67.3     39.1          x = np.sum(x)
           6       100          714      7.1      4.1          values[i] = x
      
      The time consumers here are np.multiply, np.sin and np.sum, as you can easily check by comparing their time per call with the call overhead:

      arr = np.ones(1, float)
      %timeit np.sum(arr)
      # 10000 loops, best of 3: 22.6 µs per loop
      
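      So as soon as the function call overhead is small compared to the computation runtime, both solutions will have similar runtimes. Even with 100 items you are still quite close to the overhead time. The trick is knowing at which point they break even. With 1000 items the call overhead is still significant: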

      %lprun -f fun_func fun_func(tim, prec, values)
      Line #      Hits         Time  Per Hit   % Time  Line Contents
      ==============================================================
           1                                           def fun_func(tim, prec, values):
           2      1001         5864      5.9      2.4      for i, ti in enumerate(tim):
           3      1000        42817     42.8     17.2          x = prec * ti
           4      1000       119327    119.3     48.0          x = np.sin(x)
           5      1000        73313     73.3     29.5          x = np.sum(x)
           6      1000         7287      7.3      2.9          values[i] = x
      
      But with resolution = 5000 the overhead is quite low compared to the runtime:

      Line #      Hits         Time  Per Hit   % Time  Line Contents
      ==============================================================
           1                                           def fun_func(tim, prec, values):
           2      5001        29412      5.9      0.9      for i, ti in enumerate(tim):
           3      5000       388827     77.8     11.6          x = prec * ti
           4      5000      2442460    488.5     73.2          x = np.sin(x)
           5      5000       441337     88.3     13.2          x = np.sum(x)
           6      5000        36187      7.2      1.1          values[i] = x
      
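      When you spend 500 µs in each np.sin call, you no longer care about the 20 µs overhead.

      A word of caution: line_profiler probably adds some overhead per line, and maybe per function call as well, so the point at which the function call overhead becomes negligible may be lower.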
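
To sidestep the profiler's own overhead, the per-call cost can also be estimated with timeit alone. The sketch below is illustrative (the helper name per_call_us and the chosen sizes are assumptions): it treats the time for a one-element array as an approximation of the fixed cost of a call and prints it next to the time for larger inputs.

```python
import timeit
import numpy as np

def per_call_us(func, arr, repeats=1000):
    """Average wall-clock time of func(arr) in microseconds."""
    return timeit.timeit(lambda: func(arr), number=repeats) / repeats * 1e6

# A one-element array approximates the fixed cost of a single call.
overhead = per_call_us(np.sin, np.ones(1))

for n in (100, 1000, 5000):
    total = per_call_us(np.sin, np.ones(n))
    print(f"n={n:5d}  np.sin: {total:8.2f} µs  (fixed cost ~ {overhead:.2f} µs)")
```

Where the printed total is many times the fixed cost, the loop and the broadcast version should spend about the same time in the numpy functions themselves.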

      Your broadcast solution

      I started by profiling the first solution; let's do the same for the second one:

      def fun_func(tim, prec, values):
          x = tim[:, np.newaxis]
          x = x * prec
          x = np.sin(x)
          x = np.sum(x, axis=1)
          return x
      
      Again using line_profiler with resolution = 100:

      %lprun -f fun_func fun_func(tim, prec, values)
      Line #      Hits         Time  Per Hit   % Time  Line Contents
      ==============================================================
           1                                           def fun_func(tim, prec, values):
           2         1           27     27.0      0.5      x = tim[:, np.newaxis]
           3         1          638    638.0     12.9      x = x * prec
           4         1         3963   3963.0     79.9      x = np.sin(x)
           5         1          326    326.0      6.6      x = np.sum(x, axis=1)
           6         1            4      4.0      0.1      return x
      
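      This already exceeds the overhead time by a lot, and so we end up a factor of 10 faster than the loop.

      I also did the profiling for resolution = 1000: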

      Line #      Hits         Time  Per Hit   % Time  Line Contents
      ==============================================================
           1                                           def fun_func(tim, prec, values):
           2         1           28     28.0      0.0      x = tim[:, np.newaxis]
           3         1        17716  17716.0     14.6      x = x * prec
           4         1        91174  91174.0     75.3      x = np.sin(x)
           5         1        12140  12140.0     10.0      x = np.sum(x, axis=1)
           6         1           10     10.0      0.0      return x
      
      And for resolution = 5000:

      Line #      Hits         Time  Per Hit   % Time  Line Contents
      ==============================================================
           1                                           def fun_func(tim, prec, values):
           2         1           34     34.0      0.0      x = tim[:, np.newaxis]
           3         1       333685 333685.0     11.1      x = x * prec
           4         1      2391812 2391812.0    79.6      x = np.sin(x)
           5         1       280832 280832.0      9.3      x = np.sum(x, axis=1)
           6         1           14     14.0      0.0      return x
      
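      The broadcast version is faster at size 1000, but as we saw above, the call overhead in the loop solution was still non-negligible there. For resolution = 5000, however, the time spent in each step is almost identical (some steps are a bit slower, others a bit faster, but overall very similar).

      Another effect is that the actual broadcasting during the multiplication becomes significant. Even with numpy's very smart implementation, it still involves some additional computation. For resolution = 10000 you can see that the broadcast multiplication starts to take up more "% Time" relative to the loop solution: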

      Line #      Hits         Time  Per Hit   % Time  Line Contents
      ==============================================================
           1                                           def broadcast_solution(tim, prec, values):
           2         1           37     37.0      0.0      x = tim[:, np.newaxis]
           3         1      1783345 1783345.0    13.9      x = x * prec
           4         1      9879333 9879333.0    77.1      x = np.sin(x)
           5         1      1153789 1153789.0     9.0      x = np.sum(x, axis=1)
           6         1           11     11.0      0.0      return x
      
      
      Line #      Hits         Time  Per Hit   % Time  Line Contents
      ==============================================================
           8                                           def loop_solution(tim, prec, values):
           9     10001        62502      6.2      0.5      for i, ti in enumerate(tim):
          10     10000      1287698    128.8     10.5          x = prec * ti
          11     10000      9758633    975.9     79.7          x = np.sin(x)
          12     10000      1058995    105.9      8.6          x = np.sum(x)
          13     10000        75760      7.6      0.6          values[i] = x
      

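      But there is another thing besides the actual time spent: memory consumption. Your loop solution requires O(n) memory, because you only ever process n elements at a time. The broadcast solution, however, requires O(n*n) memory. With resolution = 20000 you will probably have to wait a while for the loop, but it would still only need 8 bytes/element * 20000 elements ~= 160 kB, whereas the broadcast version needs ~3 GB. And this neglects the constant factor (such as temporary, i.e. intermediate, arrays)! If you go even further, you will run out of memory very quickly.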
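
The memory figures above follow from simple arithmetic, sketched here for float64 data. The helper names are made up for this illustration, and the estimate counts a single temporary only, so like the text it ignores the constant factor from additional intermediate arrays.

```python
import numpy as np

def loop_bytes(n, dtype=np.float64):
    """Bytes for one length-n temporary, as used in each iteration of the loop."""
    return n * np.dtype(dtype).itemsize

def broadcast_bytes(n, dtype=np.float64):
    """Bytes for one n-by-n temporary, as created by the broadcast product."""
    return n * n * np.dtype(dtype).itemsize

n = 20000
print(loop_bytes(n))        # 160000 bytes, i.e. 160 kB
print(broadcast_bytes(n))   # 3200000000 bytes, i.e. about 3 GB
```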


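      Time to summarize the points again:

      • If you write a Python loop over the single items of a numpy array, you're doing it wrong.
      • If you loop over subarrays of a numpy array, make sure the function call overhead in each iteration is negligible compared to the time spent in the function.
      • If you broadcast numpy arrays, make sure you don't run out of memory.

      But the most important point about optimization is still:

      • Only optimize the code if it's too slow! And if it is too slow, optimize only after profiling it.

      • Don't blindly trust simplified statements, and never optimize without profiling.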
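
One way to balance the second and third points is to broadcast a block of rows at a time. This is not from the original answer. It is a sketch of a middle ground, and the function name chunked_solution and the default chunk size are assumptions that would need profiling on the target machine. Memory stays at roughly chunk * n elements per temporary, while the number of Python-level iterations drops to about n / chunk.

```python
import numpy as np

def chunked_solution(tim, prec, chunk=256):
    """Sum sin(t * prec) over prec for every t, broadcasting `chunk` rows at a time."""
    values = np.empty_like(tim)
    for start in range(0, len(tim), chunk):
        # Temporary of shape (<= chunk, len(prec)) instead of (len(tim), len(prec)).
        block = tim[start:start + chunk, np.newaxis] * prec
        values[start:start + chunk] = np.sin(block).sum(axis=1)
    return values

resolution = 1000
tim = np.linspace(-np.pi, np.pi, resolution)
prec = 2 * np.arange(1, resolution + 1) - 1

# Compare against the full broadcast from the question.
full = np.sum(np.sin(tim[:, np.newaxis] * prec), axis=1)
print(np.allclose(chunked_solution(tim, prec), full))
```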


      One final thought: