An elegant way in Python to perform a parallel operation on two arrays of unequal length

I want to write a function in numba that runs a math operation over two arrays and adapts when the two arrays do not have the same number of elements.

For example, say I want a function that adds each element of array a to the corresponding element of array b. There are 3 possible cases:

1) a and b have the same number of items: do c[ii] = a[ii] + b[ii]

2) a has more items than b: do c[ii] = a[ii] + b[ii] up to b's upper limit, then finish with c[ii] = a[ii] + b[-1]

3) a has fewer items than b: do c[ii] = a[ii] + b[ii] up to a's upper limit, then finish with c[ii] = a[-1] + b[ii]
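
For reference, the three cases above can be stated in a few lines of plain Python. This is only a sketch of the intended behaviour, not the numba code from this post: add_reference is a hypothetical name, and it assumes both inputs are non-empty sequences.

```python
def add_reference(a, b):
    """Element-wise a + b; the shorter input is extended with its last value."""
    n = max(len(a), len(b))
    # Pad each input on the right with its own last element up to length n.
    # When an input already has length n, the padding is an empty list.
    a_pad = list(a) + [a[-1]] * (n - len(a))
    b_pad = list(b) + [b[-1]] * (n - len(b))
    return [x + y for x, y in zip(a_pad, b_pad)]


print(add_reference([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))    # case 1: [11.0, 22.0, 33.0]
print(add_reference([1.0, 2.0, 3.0, 4.0], [10.0, 20.0]))     # case 2: [11.0, 22.0, 23.0, 24.0]
print(add_reference([1.0, 2.0], [10.0, 20.0, 30.0, 40.0]))   # case 3: [11.0, 22.0, 32.0, 42.0]
```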

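To that end I wrote the code below. It works well and is fast when handling millions of values, but I can clearly see three nearly identical blocks of code, which feel very wasteful. The if/else running inside the loop also feels awful.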

import numpy as np
from numba import jit, float64

@jit(float64[:](float64[:], float64[:]), nopython=True)
def add(a, b):

    # Both shapes are equal: add between a[i] and b[i]
    if a.shape[0] == b.shape[0]:
        c = np.empty(a.shape)

        for i in range(a.shape[0]):
            c[i] = a[i] + b[i]

        return c

    # a has more entries than b: add between a[i] and b[i] until b.shape[0]-1 is reached.
    # finish the loop with add between a[i] and b[-1]
    elif a.shape[0] > b.shape[0]:
        c = np.empty(a.shape)
        i_ = b.shape[0]-1 # upper limit of b's shape

        for i in range(a.shape[0]):
            if i < b.shape[0]:
                c[i] = a[i] + b[i]
            else:
                c[i] = a[i] + b[i_]

        return c

    # b has more entries than a: add between a[i] and b[i] until a.shape[0]-1 is reached.
    # finish the loop with add between a[-1] and b[i]    
    else:
        c = np.empty(b.shape)
        i_ = a.shape[0]-1 # upper limit of a's shape

        for i in range(b.shape[0]):
            if i < a.shape[0]:
                c[i] = a[i] + b[i]
            else:
                c[i] = a[i_] + b[i]

        return c
I'm new to numba and jit compilation of Python code, so this may well already be the "most efficient way" to do what I want.

However, if there is a more elegant way to do this without sacrificing speed, I'd love to know how.

"but I can clearly see three nearly identical blocks of code, which feel very wasteful"

Yes, you repeat yourself a lot in that code. On the other hand, it is easy to see what each case does.

You could simply use two loops instead:

import numpy as np
import numba as nb

@nb.njit(nb.float64[:](nb.float64[:], nb.float64[:]))
def add2(a, b):
    size1, size2 = a.shape[0], b.shape[0]
    maxsize, minsize = max(size1, size2), min(size1, size2)
    c = np.empty(maxsize)

    # Calculate the elements which are present in a and b
    for idx in range(minsize):
        c[idx] = a[idx] + b[idx]

    # Check which array is longer and which fillvalue should be applied
    if size1 > size2:
        missing = a
        filler = b[-1]
    else:
        missing = b
        filler = a[-1]

    # Calculate the elements after a or b ended. In case they have equal lengths
    # the range is of length 0 so it won't enter.
    for idx in range(minsize, maxsize):
        c[idx] = missing[idx] + filler

    return c
There is much less repetition, but it may not be as clear.
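
The same two-phase structure (the overlapping part first, then the tail of the longer array plus the last value of the shorter one) can also be written with NumPy slices. This is only an illustrative sketch, not the code from this answer: add_two_phase is a hypothetical name, it assumes non-empty 1-D float arrays, and it makes no claim about speed relative to the numba versions.

```python
import numpy as np


def add_two_phase(a, b):
    size_a, size_b = a.shape[0], b.shape[0]
    minsize, maxsize = min(size_a, size_b), max(size_a, size_b)
    c = np.empty(maxsize)

    # Phase 1: positions that exist in both arrays.
    c[:minsize] = a[:minsize] + b[:minsize]

    # Phase 2: tail of the longer array plus the last value of the shorter one.
    # For equal lengths both slices are empty and nothing is written.
    if size_a > size_b:
        c[minsize:] = a[minsize:] + b[-1]
    else:
        c[minsize:] = b[minsize:] + a[-1]

    return c
```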

"Also, the if/else running inside the loop feels awful"

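Actually, it isn't as bad as it looks, because branch prediction makes this if very cheap. The condition is True as long as both arrays still have elements, and it only switches to False once one of the arrays is exhausted (and stays False from then on). That is easy for your computer to predict, so the check will be very cheap (almost free).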

Overnight I realized that I could simply clamp the indices on the fly:

import numpy as np
from numba import njit, float64

@njit(float64[:](float64[:], float64[:]))
def add_clamped(a,b):
    # Find the maximum indices to use for clipping purposes
    max_a, max_b = a.shape[0]-1, b.shape[0]-1
    maxsize = max(a.shape[0], b.shape[0])
    c = np.empty(maxsize)    

    # Run through the arrays and clamp the indices on the fly
    for idx in range(maxsize):
        idx_a = min(idx, max_a)
        idx_b = min(idx, max_b)

        # Do some crazy expensive math here
        c[idx] = a[idx_a] + b[idx_b]    

    return c
As a test, I compared the algorithms on more than 10 million entries, with the following results:

add_original:  0.01952 seconds
add_MSeifert:  0.02058 seconds
add_clamped:   0.02562 seconds

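So it is not as fast as @MSeifert's answer, but it keeps the code to a single loop and keeps all of the core math in one place, which matters when the operation is more complex than adding two arrays.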
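
To illustrate the "core math in one place" point, here is a minimal sketch in which the per-element operation lives in its own small function and the clamped loop only handles the indexing. The names core_op and apply_clamped, and the example formula, are hypothetical and not part of the original code. The sketch also falls back to plain Python when numba is not installed (an assumption made only so the example runs anywhere), and it assumes non-empty 1-D float arrays.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch only: without numba, run as plain Python.
    def njit(func):
        return func


@njit
def core_op(x, y):
    # Stand-in for the "expensive math"; change it here and nowhere else.
    return x * x + y


@njit
def apply_clamped(a, b):
    max_a, max_b = a.shape[0] - 1, b.shape[0] - 1
    maxsize = max(a.shape[0], b.shape[0])
    c = np.empty(maxsize)

    for idx in range(maxsize):
        # Past the end of an array, keep reusing its last element.
        c[idx] = core_op(a[min(idx, max_a)], b[min(idx, max_b)])

    return c
```

With a = [1.0, 2.0, 3.0] and b = [10.0, 20.0], the last element of b is reused for the third position, so the result is [11.0, 24.0, 29.0].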

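if/else in loops is normal in compiled C-style code. It is only bad in a pure Python environment.

Thanks a lot for your suggestion! It occurred to me last night that I could also clamp the indices on the fly. I added the code to my question, let me know what you think.

@Fnord You should probably add that code as another answer. It's a clever idea too, but I would do the max outside the loop and only the min inside the loop!

You also made me realize that there is no need for max at all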