Python: Using index arrays with iteration in numpy


python numpy


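I have many numpy arrays of about 50K elements each. I want to compare them using only certain positions (10% of them on average), and performance matters. This looks like a good use case for index arrays. I can write the following code:

def equal_1(array1, array2, index):
    return (array1[index] == array2[index]).all()

This is fast in practice, but it iterates over all of the indexes once for each array.

I could also use another approach:

def equal_2(array1, array2, index):
    for i in index:
        if array1[i] != array2[i]:
            return False
    return True

This only iterates over the arrays until a difference is found.

I benchmarked both approaches for my use case.

For arrays that are equal, or whose difference is near the end, the index-array function is about 30 times faster. When there is a difference near the beginning of the arrays, the second function is about 30 times faster.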
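A minimal sketch of how such a benchmark could be set up, for readers who want to reproduce the trade-off. The array size and 10% sampling follow the description above; the fixed seed, the placement of the mismatches, and the repeat count are assumptions made only for illustration, and the timings will vary by machine.

```python
import timeit
import numpy as np

def equal_1(array1, array2, index):
    # Vectorized: compares every selected position, no early exit.
    return (array1[index] == array2[index]).all()

def equal_2(array1, array2, index):
    # Lazy: stops at the first mismatch, but loops in Python.
    for i in index:
        if array1[i] != array2[i]:
            return False
    return True

n = 50_000
rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
index = np.sort(rng.choice(n, size=n // 10, replace=False))

base = np.arange(n)
early = base.copy()
early[index[0]] = -1   # mismatch at the first compared position
late = base.copy()
late[index[-1]] = -1   # mismatch at the last compared position

cases = {"equal": base.copy(), "early difference": early, "late difference": late}
results = {}
repeats = 20
for name, other in cases.items():
    t1 = timeit.timeit(lambda: equal_1(base, other, index), number=repeats)
    t2 = timeit.timeit(lambda: equal_2(base, other, index), number=repeats)
    results[name] = (bool(equal_1(base, other, index)), equal_2(base, other, index))
    print(f"{name:>16}: equal_1 {t1 / repeats * 1e6:8.1f} µs"
          f"   equal_2 {t2 / repeats * 1e6:8.1f} µs")
```

Both functions should agree on every case; only their running time should differ depending on where the first mismatch sits.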

Is there a way to get the best of both worlds (numpy speed + second function laziness)?

For your purposes, you may want to use the just-in-time compiler @jit from numba.

import numpy as np
from numba import jit

a1 = np.arange(50000)
a2 = np.arange(50000)
# set some values so that the comparison evaluates to False
a2[40000:45000] = 1
indices = np.random.choice(np.arange(50000), replace=False, size=5000)
indices.sort()

def equal_1(array1, array2, index):
    return (array1[index] == array2[index]).all()

def equal_2(array1, array2, index):
    for i in index:
        if array1[i] != array2[i]:
            return False
    return True

@jit  # just add this decorator to your function
def equal_3(array1, array2, index):
    for i in index:
        if array1[i] != array2[i]:
            return False
    return True

Testing:

In [44]: %%timeit -n10 -r1
    ...: equal_1(a1,a2,indices)
    ...:
10 loops, best of 1: 72.6 µs per loop

In [45]: %%timeit -n10 -r1
    ...: equal_2(a1,a2,indices)
    ...:
10 loops, best of 1: 657 µs per loop

In [46]: %%timeit -n10 -r1
    ...: equal_3(a1,a2,indices)
    ...:
10 loops, best of 1: 7.65 µs per loop

Simply adding @jit gives roughly a 100x speedup over the plain Python loop.
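One practical detail when timing a jitted function: numba compiles it on the first call, so that call is much slower than later ones. The sketch below illustrates this with a warm-up call. It uses njit, numba's shorthand for jit(nopython=True), rather than the bare @jit above, and the no-op fallback decorator is an assumption added only so the snippet still runs where numba is not installed.

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch only: without numba, run the plain Python loop.
    def njit(func):
        return func

@njit
def equal_3(array1, array2, index):
    for i in index:
        if array1[i] != array2[i]:
            return False
    return True

a1 = np.arange(50000)
a2 = a1.copy()
indices = np.arange(0, 50000, 10)

start = time.perf_counter()
first = equal_3(a1, a2, indices)    # with numba, this call includes compilation
first_call = time.perf_counter() - start

start = time.perf_counter()
second = equal_3(a1, a2, indices)   # reuses the already compiled version
second_call = time.perf_counter() - start

print(f"first call:  {first_call * 1e6:.1f} µs")
print(f"second call: {second_call * 1e6:.1f} µs")
```

When benchmarking, it is usually fairer to call the function once before timing it, so that compilation is not counted.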

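What does index look like? Is it an array itself?

As @pshep123 mentioned: for collections, you should use plural variable names to make the code clearer.

You need to take advantage of "short-circuit" evaluation at the compiled-code level. Numpy does this for things like min/max and np.nan.

@pshep123, index will be a set of indexes valid for array1 and array2. In my typical use case, it contains between 5K and 10K elements.

For the record, in my tests, going from equal_1 to equal_3 with the numba jit version gave a 3 to 4 times speedup. Not bad at all.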
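The short-circuit idea can also be approximated in plain numpy, without numba, by comparing the selected positions in blocks and stopping at the first block that differs. This is not from the answer above; it is a rough sketch, and the function name equal_chunked and the default block size of 1024 are hypothetical choices that would need tuning for a real workload.

```python
import numpy as np

def equal_chunked(array1, array2, index, chunk=1024):
    # Each block is compared with vectorized numpy code; the Python loop
    # only runs once per block and exits at the first block with a mismatch.
    for start in range(0, len(index), chunk):
        block = index[start:start + chunk]
        if not np.array_equal(array1[block], array2[block]):
            return False
    return True

a1 = np.arange(50000)
a2 = np.arange(50000)
a2[40000:45000] = 1  # same modification as in the answer's setup
rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
indices = np.sort(rng.choice(50000, size=5000, replace=False))

same = equal_chunked(a1, a1.copy(), indices)
different = equal_chunked(a1, a2, indices)
print(same, different)
```

A small block size exits sooner on early differences but pays more Python overhead per element; a large one behaves more like equal_1.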