Improving the performance of a for loop in Python (possibly using numpy or numba)


I want to improve the performance of the for loop in this function:

import numpy as np
import random

def play_game(row, n=1000000):
    """Play the game! This game is a kind of random walk.

    Arguments:
        row (int[]): row index to use in the p matrix for each step in the
                     walk. The length of this array is the same as n.

        n (int): number of steps in the random walk
    """
    p = np.array([[ 0.499,  0.499,  0.499],
                  [ 0.099,  0.749,  0.749]])
    X0 = 100
    Y0 = X0 % 3
    X = np.zeros(n)
    tempX = X0
    Y = Y0

    for j in range(n):
        tempX = X[j] = tempX + 2 * (random.random() < p.item(row.item(j), Y)) - 1
        Y = tempX % 3

    return np.r_[X0, X]

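The difficulty is that the value of Y is computed at each step from the value of X, and that Y is then used in the next step to update X.
I wonder whether there is some trick that would make a big difference. Using Numba is fair game (I tried it, but without much success). However, I do not want to use Cython.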
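A quick look shows that there is a data dependency between iterations in the function. There are different kinds of data dependency, and the one here is an indexing dependency: which data is selected at any iteration depends on what earlier iterations computed. That dependency seemed hard to trace across iterations, so this post is not a truly vectorized solution. Instead, we precompute as many of the values used in the loop as possible. The basic idea is to do the least possible work inside the loop.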
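To make the indexing dependency concrete, here is a minimal sketch contrasting a walk whose step probability is fixed (so it can be built with `np.cumsum`) with a walk where the probability is looked up from the current value, as in the question. The single-row `p_row`, the seed, and the variable names are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
rand = rng.random(n)
X0 = 100

# No dependency: every step uses the same probability, so all steps can be
# computed at once and accumulated with cumsum.
steps = 2 * (rand < 0.499) - 1
walk_independent = X0 + np.cumsum(steps)

# Index dependency: the column of p_row used at step j depends on the value
# reached at step j-1, so the steps have to be resolved one after another.
p_row = np.array([0.499, 0.419, 0.639])  # illustrative single row of p
walk_dependent = np.empty(n, dtype=np.int64)
current = X0
for j in range(n):
    current += 2 * int(rand[j] < p_row[current % 3]) - 1
    walk_dependent[j] = current
```

In the second walk, the comparison at step j cannot be evaluated until `current` is known, which is why a plain cumulative sum does not apply.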
Here is a brief outline of how the precomputation works and why it leads to a more efficient solution:

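Given the relatively small shape of p, from which row elements are extracted based on the input row, you can pre-select all of those rows from p with p[row].
In each iteration you compute a random number. You can replace this with a random array that is set up before the loop, so the random values are precomputed as well.
From the values precomputed so far, you get the column indices for all rows in p. Note that these column indices form a large ndarray containing every possible column index; inside the loop, only one of them is chosen per iteration, based on that iteration's state. Using the per-iteration column index, you increment or decrement the running value (starting at X0) to get the per-iteration output.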
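The three steps above can be sketched on a tiny input to show the shapes involved. The four-element `row` and `randarr` values are made up for illustration; `p` uses the values from the answer's timing runs.

```python
import numpy as np

p = np.array([[0.499, 0.419, 0.639],
              [0.099, 0.749, 0.319]])
row = np.array([0, 1, 1, 0])                   # hypothetical row indices, n = 4
randarr = np.array([0.45, 0.50, 0.20, 0.70])   # hypothetical random draws

# Step 1: pre-select the row of p used at each step.
p_rows = p[row]                                # shape (4, 3)

# Step 2: compare each precomputed draw against all three columns at once.
# randarr[:, None] has shape (4, 1) and broadcasts against (4, 3).
signvals = 2 * (randarr[:, None] < p_rows) - 1   # shape (4, 3), entries +1 or -1

# Step 3: for each possible current Y (0, 1, 2), the Y that would follow.
col_idx = (signvals + np.arange(3)) % 3          # shape (4, 3)

print(signvals)
print(col_idx)
```

Each row of `signvals` holds the step that would be taken for Y = 0, 1, 2. The loop then only has to pick one entry per row, which is the part that cannot be precomputed.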

The implementation would look like this:
randarr = np.random.rand(n)
p = np.array([[ 0.499,  0.419,  0.639],
              [ 0.099,  0.749,  0.319]])

def play_game_partvect(row,n,randarr,p):

    X0 = 100
    Y0 = X0 % 3

    signvals = 2*(randarr[:,None] < p[row]) - 1
    col_idx = (signvals + np.arange(3)) % 3

    Y = Y0
    currval = X0
    out = np.empty(n+1)
    out[0] = X0
    for j in range(n):
        currval = currval + signvals[j,Y]
        out[j+1] = currval
        Y = col_idx[j,Y]

    return out

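In the timings listed at the end (n = 1000 and n = 10000), this gives a speedup of roughly 7.5x or more over the original loop, which is not bad.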
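If you are on Python 2, using xrange() instead of range() may help a little. I am using Python 3. It is faster not to use col_idx and to compute Y = currval % 3 inside the loop instead. Also, inside the for loop, .item() is faster than [] subscripting, because the returned object is a Python scalar rather than a numpy scalar, and arithmetic on Python scalars is faster.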
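A minimal sketch of the variant described above, assuming the same `row`, `n`, `randarr` and `p` inputs as `play_game_partvect`. The function name `play_game_partvect_mod` is hypothetical, and whether it is actually faster depends on the machine and the NumPy version, so it is worth timing locally.

```python
import numpy as np

def play_game_partvect_mod(row, n, randarr, p):
    """Hypothetical variant: precompute the signs, derive Y from the running value."""
    X0 = 100
    signvals = 2 * (randarr[:, None] < p[row]) - 1   # shape (n, 3)
    currval = X0
    Y = X0 % 3
    out = np.empty(n + 1)
    out[0] = X0
    for j in range(n):
        # .item() returns a plain Python int, so currval and Y stay Python ints
        currval += signvals.item(j, Y)
        out[j + 1] = currval
        Y = currval % 3
    return out

# .item() versus [] subscripting on a small array
a = np.array([[1, -1, 1]])
print(type(a[0, 1]))       # a NumPy integer scalar type
print(type(a.item(0, 1)))  # <class 'int'>
```

The sign precomputation is kept from `play_game_partvect`; only the lookup of the next Y is replaced by a modulus on the running value.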
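# Original loop, adapted to take randarr and p as arguments so that its
# output can be compared with play_game_partvect in the session below.
def play_game(row,n,randarr,p):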
    X0 = 100
    Y0 = X0 % 3
    X = np.zeros(n)
    tempX = X0
    Y = Y0
    for j in range(n):
        tempX = X[j] = tempX + 2 * (randarr[j] < p.item(row.item(j), Y)) - 1
        Y = tempX % 3
    return np.r_[X0, X]

In [2]: # Inputs
   ...: n = 1000
   ...: row = np.random.randint(0,2,(n))
   ...: randarr = np.random.rand(n)
   ...: p = np.array([[ 0.499,  0.419,  0.639],
   ...:               [ 0.099,  0.749,  0.319]])
   ...: 

In [3]: np.allclose(play_game_partvect(row,n,randarr,p),play_game(row,n,randarr,p))
Out[3]: True

In [4]: %timeit play_game(row,n,randarr,p)
100 loops, best of 3: 11.6 ms per loop

In [5]: %timeit play_game_partvect(row,n,randarr,p)
1000 loops, best of 3: 1.51 ms per loop

In [6]: # Inputs
   ...: n = 10000
   ...: row = np.random.randint(0,2,(n))
   ...: randarr = np.random.rand(n)
   ...: p = np.array([[ 0.499,  0.419,  0.639],
   ...:               [ 0.099,  0.749,  0.319]])
   ...: 

In [7]: np.allclose(play_game_partvect(row,n,randarr,p),play_game(row,n,randarr,p))
Out[7]: True

In [8]: %timeit play_game(row,n,randarr,p)
10 loops, best of 3: 116 ms per loop

In [9]: %timeit play_game_partvect(row,n,randarr,p)
100 loops, best of 3: 14.8 ms per loop