An elegant way to perform a parallel operation on two arrays of unequal length in Python


python arrays performance numpy


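I want to write a function in numba that runs a mathematical operation on two arrays and adapts when the two arrays do not have the same number of elements.

For example: say I want a function that adds each element of array a to the corresponding element of array b. There are three possible cases:

1) a and b have the same number of items: do c[ii] = a[ii] + b[ii] throughout
2) a has more items than b: do c[ii] = a[ii] + b[ii] up to the upper limit of b, then finish with c[ii] = a[ii] + b[-1]
3) a has fewer items than b: do c[ii] = a[ii] + b[ii] up to the upper limit of a, then finish with c[ii] = a[-1] + b[ii]

To do this I wrote the code below. It works well and is fast when handling millions of values, but I can clearly see three nearly identical blocks of code, which feels very wasteful. The if/else running inside the loop also feels bad.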

import numpy as np
from numba import jit, float64

@jit(float64[:](float64[:], float64[:]), nopython=True)
def add(a, b):

    # Both shapes are equal: add between a[i] and b[i]
    if a.shape[0] == b.shape[0]:
        c = np.empty(a.shape)

        for i in range(a.shape[0]):
            c[i] = a[i] + b[i]

        return c

    # a has more entries than b: add between a[i] and b[i] until b.shape[0]-1 is reached.
    # finish the loop with add between a[i] and b[-1]
    elif a.shape[0] > b.shape[0]:
        c = np.empty(a.shape)
        i_ = b.shape[0]-1 # upper limit of b's shape

        for i in range(a.shape[0]):
            if i < b.shape[0]:
                c[i] = a[i] + b[i]
            else:
                c[i] = a[i] + b[i_]

        return c

    # b has more entries than a: add between a[i] and b[i] until a.shape[0]-1 is reached.
    # finish the loop with add between a[-1] and b[i]    
    else:
        c = np.empty(b.shape)
        i_ = a.shape[0]-1 # upper limit of a's shape

        for i in range(b.shape[0]):
            if i < a.shape[0]:
                c[i] = a[i] + b[i]
            else:
                c[i] = a[i_] + b[i]

        return c
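As a compact restatement of the three cases, here is a minimal plain NumPy sketch (not numba, and not the code from the question). The name add_padded is hypothetical, and the sketch assumes both inputs are non-empty 1-D arrays. It extends the shorter array with its own last value and then adds element by element, which is the behaviour the three cases describe.

```python
import numpy as np

def add_padded(a, b):
    """Add a and b, extending the shorter array with its own last value."""
    n = max(a.shape[0], b.shape[0])
    # mode="edge" repeats the last element; a pad width of 0 leaves the data unchanged
    a_ext = np.pad(a, (0, n - a.shape[0]), mode="edge")
    b_ext = np.pad(b, (0, n - b.shape[0]), mode="edge")
    return a_ext + b_ext

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0])

result_longer_a = add_padded(a, b)  # case 2: b is extended with b[-1]
result_longer_b = add_padded(b, a)  # case 3: the first argument is the shorter one
result_equal = add_padded(a, a)     # case 1: no padding happens
```

This allocates two temporary arrays, so it is meant as a readable reference to check a compiled version against, not as a faster replacement.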



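I'm new to numba and to JIT compilation of Python code, so this may already be the most efficient way to do what I want. Still, if there is a more elegant way to do this without sacrificing speed, I'd love to know how.

"but I can clearly see three nearly identical blocks of code, which feels very wasteful"

Yes, you repeat yourself a lot in the code. On the other hand, it is easy to see what each case does.

You could do it with just two loops:
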
import numpy as np
import numba as nb

@nb.njit(nb.float64[:](nb.float64[:], nb.float64[:]))
def add2(a, b):
    size1, size2 = a.shape[0], b.shape[0]
    maxsize, minsize = max(size1, size2), min(size1, size2)
    c = np.empty(maxsize)

    # Calculate the elements which are present in a and b
    for idx in range(minsize):
        c[idx] = a[idx] + b[idx]

    # Check which array is longer and which fillvalue should be applied
    if size1 > size2:
        missing = a
        filler = b[-1]
    else:
        missing = b
        filler = a[-1]

    # Calculate the elements after a or b ended. In case they have equal lengths
    # the range is of length 0 so it won't enter.
    for idx in range(minsize, maxsize):
        c[idx] = missing[idx] + filler

    return c

Much less repetition, but perhaps not as clear.
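The same two-phase idea (handle the overlap first, then the tail of the longer array) can be sketched with NumPy slices instead of explicit loops. This is only an illustration under assumptions: add_two_phase is a hypothetical name, the inputs are assumed to be non-empty 1-D float arrays, and it has not been benchmarked against the compiled version.

```python
import numpy as np

def add_two_phase(a, b):
    """Two-phase add: overlapping part first, then the tail of the longer array."""
    size1, size2 = a.shape[0], b.shape[0]
    minsize, maxsize = min(size1, size2), max(size1, size2)
    c = np.empty(maxsize)

    # Phase 1: elements present in both a and b
    c[:minsize] = a[:minsize] + b[:minsize]

    # Phase 2: tail of the longer array plus the last value of the shorter one.
    # With equal lengths neither branch runs, because there is no tail.
    if size1 > size2:
        c[minsize:] = a[minsize:] + b[-1]
    elif size2 > size1:
        c[minsize:] = b[minsize:] + a[-1]
    return c

short = np.array([1.0, 2.0])
long_ = np.array([10.0, 20.0, 30.0, 40.0])

tail_from_first = add_two_phase(long_, short)   # first argument is longer
tail_from_second = add_two_phase(short, long_)  # second argument is longer
no_tail = add_two_phase(short, short)           # equal lengths
```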
"Also, the if/else running inside the loop feels bad"

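In fact it is not as bad as it looks, because branch prediction makes this if very cheap. The condition is True as long as both arrays still have elements, and it switches to False only when one array is exhausted (and stays False from then on). That is easy for your computer to predict, so the check will be very cheap (almost free).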
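To make the "easy to predict" point concrete, this small pure-Python sketch counts how often the outcome of the condition i < len_b changes during the loop. The function name count_branch_flips is hypothetical, and the sketch only illustrates the pattern of the condition; it does not measure what the CPU's branch predictor actually does.

```python
def count_branch_flips(len_a, len_b):
    """Count how often the outcome of `i < len_b` changes for i in range(len_a)."""
    flips = 0
    previous = None
    for i in range(len_a):
        outcome = i < len_b
        if previous is not None and outcome != previous:
            flips += 1
        previous = outcome
    return flips

# a much longer than b: the condition is True ten times, then False until the end
flips_unequal = count_branch_flips(100_000, 10)

# equal lengths: the condition never changes
flips_equal = count_branch_flips(1_000, 1_000)
```

The condition changes at most once per call, however long the arrays are, which is the kind of pattern the answer describes as cheap.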

Overnight I realized that I could also clamp the indices on the fly:

import numpy as np
from numba import njit, float64

@njit(float64[:](float64[:], float64[:]))
def add_clamped(a, b):
    # Find the maximum indices to use for clipping purposes
    max_a, max_b = a.shape[0]-1, b.shape[0]-1
    maxsize = max(a.shape[0], b.shape[0])
    c = np.empty(maxsize)

    # Run through the arrays and clip indices on the fly
    for idx in range(maxsize):
        idx_a = min(idx, max_a)
        idx_b = min(idx, max_b)

        # Do some crazy expensive math here
        c[idx] = a[idx_a] + b[idx_b]

    return c

As a test, I compared the algorithms on arrays of over 10 million entries, with the following results:

add_original:  0.01952 seconds
add_MSeifert:  0.02058 seconds
add_clamped:   0.02562 seconds

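So it is not as fast as @MSeifert's answer, but it keeps the code to a single loop and keeps all the core math in one place (which helps when the operation is more complex than adding two arrays).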
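The "core math in one place" point can be illustrated with a plain NumPy sketch of the same clamping idea, where the operation is passed in as a parameter. The name combine_clamped and its op parameter are hypothetical, the sketch assumes non-empty 1-D arrays, and it builds temporary index arrays, so no claim is made about its speed relative to the compiled loops.

```python
import numpy as np

def combine_clamped(a, b, op=np.add):
    """Apply op element-wise, clamping each index to the last valid position."""
    n = max(a.shape[0], b.shape[0])
    idx = np.arange(n)
    # Indices past the end of an array are clamped to its last element
    idx_a = np.minimum(idx, a.shape[0] - 1)
    idx_b = np.minimum(idx, b.shape[0] - 1)
    # The math lives in exactly one place
    return op(a[idx_a], b[idx_b])

a = np.array([3.0, 6.0, 9.0])
b = np.array([4.0, 8.0, 12.0, 16.0, 20.0])

sums = combine_clamped(a, b)                  # a is extended with a[-1] = 9.0
hypotenuses = combine_clamped(a, b, np.hypot)  # same clamping, different math
```

Swapping the operation only touches the op argument; the length handling stays unchanged.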

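if/else inside loops is normal in compiled-style code. It is only bad in pure Python.

Thank you very much for the advice! Last night it occurred to me that I could also clamp the indices on the fly. I added the code to my question, let me know what you think.

@Fnord You should probably add that code as another answer. It is a clever idea too, but I would do the max outside the loop and only the min inside the loop!

You also made me realize that the max is not needed at all.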