Python: an elegant way to perform a parallel operation on two arrays of unequal length
Tags: python, arrays, performance, numpy, numba
I want to write a function in numba that runs a math operation over two arrays and adjusts when the two arrays do not have the same number of elements.
For example: suppose I want a function that adds each element of array a to the corresponding element of array b. There are three possible cases:
1) a and b have the same number of items: do c[ii] = a[ii] + b[ii]
2) a has more items than b: do c[ii] = a[ii] + b[ii] up to the upper limit of b, then finish with c[ii] = a[ii] + b[-1]
3) a has fewer items than b: do c[ii] = a[ii] + b[ii] up to the upper limit of a, then finish with c[ii] = a[-1] + b[ii]
To do this I wrote the code below. It works well and is fast when processing millions of values, but I can clearly see three nearly identical code blocks, which feels very wasteful. The if/else running inside the loop also feels bad.
import numpy as np
from numba import jit, float64

@jit(float64[:](float64[:], float64[:]), nopython=True)
def add(a, b):
    # Both shapes are equal: add a[i] and b[i]
    if a.shape[0] == b.shape[0]:
        c = np.empty(a.shape)
        for i in range(a.shape[0]):
            c[i] = a[i] + b[i]
        return c

    # a has more entries than b: add a[i] and b[i] until b.shape[0]-1 is reached,
    # then finish the loop by adding a[i] and b[-1]
    elif a.shape[0] > b.shape[0]:
        c = np.empty(a.shape)
        i_ = b.shape[0] - 1  # last valid index of b
        for i in range(a.shape[0]):
            if i < b.shape[0]:
                c[i] = a[i] + b[i]
            else:
                c[i] = a[i] + b[i_]
        return c

    # b has more entries than a: add a[i] and b[i] until a.shape[0]-1 is reached,
    # then finish the loop by adding a[-1] and b[i]
    else:
        c = np.empty(b.shape)
        i_ = a.shape[0] - 1  # last valid index of a
        for i in range(b.shape[0]):
            if i < a.shape[0]:
                c[i] = a[i] + b[i]
            else:
                c[i] = a[i_] + b[i]
        return c
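I'm new to numba and to JIT compilation of Python code, so this may already be the "most efficient way" to do what I want.
However, if there is a more elegant way to do this without sacrificing speed, I'd love to know how.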
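For checking any faster implementation against the three cases described in the question, a small NumPy-only reference can help. The following is a minimal sketch, not part of the original post: the name add_reference is hypothetical, and it assumes both inputs are non-empty 1-D float arrays. It relies on np.pad with mode="edge", which repeats an array's last value.

```python
import numpy as np

def add_reference(a, b):
    """Add two 1-D arrays, repeating the last value of the shorter one.

    Hypothetical helper for correctness checks; assumes non-empty inputs.
    """
    n = max(a.shape[0], b.shape[0])
    # mode="edge" pads with the array's own last element; a pad width of 0
    # leaves the longer array unchanged.
    a_padded = np.pad(a, (0, n - a.shape[0]), mode="edge")
    b_padded = np.pad(b, (0, n - b.shape[0]), mode="edge")
    return a_padded + b_padded

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0])
result = add_reference(a, b)  # b is treated as [10, 20, 20, 20]
```

This allocates two temporary arrays, so it is meant as a readable baseline rather than a replacement for the compiled loop.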
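"but I can clearly see three nearly identical code blocks, which feels very wasteful"
Yes, you repeat yourself a lot in that code. On the other hand, it is easy to see what each case does.
You could do it with just two loops: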
import numpy as np
import numba as nb

@nb.njit(nb.float64[:](nb.float64[:], nb.float64[:]))
def add2(a, b):
    size1, size2 = a.shape[0], b.shape[0]
    maxsize, minsize = max(size1, size2), min(size1, size2)
    c = np.empty(maxsize)

    # Calculate the elements which are present in both a and b
    for idx in range(minsize):
        c[idx] = a[idx] + b[idx]

    # Check which array is longer and which fill value should be applied
    if size1 > size2:
        missing = a      # the longer array, still holding unprocessed elements
        filler = b[-1]   # last value of the shorter array
    else:
        missing = b
        filler = a[-1]

    # Calculate the elements after a or b ended. If they have equal lengths
    # the range has length 0, so the loop body never runs.
    for idx in range(minsize, maxsize):
        c[idx] = missing[idx] + filler
    return c
That is a lot less repetition, but it is perhaps not as clear.
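The same two-phase idea (first the overlapping part, then the tail of the longer array) can also be written with NumPy slices, which may make the structure easier to see. This is only an illustrative sketch: add_two_phase is a hypothetical name, it is plain NumPy rather than numba, and it assumes non-empty 1-D inputs.

```python
import numpy as np

def add_two_phase(a, b):
    """Slice-based version of the two-loop approach (hypothetical helper)."""
    size_a, size_b = a.shape[0], b.shape[0]
    minsize, maxsize = min(size_a, size_b), max(size_a, size_b)
    c = np.empty(maxsize)

    # Phase 1: positions that exist in both arrays
    c[:minsize] = a[:minsize] + b[:minsize]

    # Phase 2: the tail of the longer array plus the last value of the shorter one.
    # For equal lengths neither branch runs.
    if size_a > size_b:
        c[minsize:] = a[minsize:] + b[-1]
    elif size_b > size_a:
        c[minsize:] = b[minsize:] + a[-1]
    return c

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
result = add_two_phase(a, b)  # the tail uses a[-1] == 3.0 as the filler
```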
"Also, the if/else running inside the loop feels bad."
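Actually, it is not as bad as it looks, because branch prediction makes this if very cheap. The condition is True as long as both arrays still have elements, and it only switches to False (and stays False from then on) once one array is exhausted. That pattern is easy for the CPU to predict, so the check is very cheap (almost free).
Overnight I realized that I could also clip the indices on the fly: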
import numpy as np
from numba import njit, float64

@njit(float64[:](float64[:], float64[:]))
def add_clamped(a, b):
    # Find the maximum indices to use for clipping purposes
    max_a, max_b = a.shape[0] - 1, b.shape[0] - 1
    maxsize = max(a.shape[0], b.shape[0])
    c = np.empty(maxsize)

    # Run through the arrays and clip the indices on the fly
    for idx in range(maxsize):
        idx_a = min(idx, max_a)
        idx_b = min(idx, max_b)
        # Do some crazy expensive math here
        c[idx] = a[idx_a] + b[idx_b]
    return c
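For comparison, the index-clipping idea can be expressed without an explicit loop by building the clipped index arrays up front. This is a rough sketch in plain NumPy, not the poster's code: add_clamped_np is a hypothetical name, and it assumes non-empty 1-D inputs. It allocates temporary index arrays, so it is not expected to beat the compiled loop.

```python
import numpy as np

def add_clamped_np(a, b):
    """Vectorized index clamping (hypothetical NumPy-only variant)."""
    n = max(a.shape[0], b.shape[0])
    idx = np.arange(n)
    # Indices past the end of an array are clamped to its last valid index
    idx_a = np.minimum(idx, a.shape[0] - 1)
    idx_b = np.minimum(idx, b.shape[0] - 1)
    return a[idx_a] + b[idx_b]

a = np.array([1.0, 2.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
result = add_clamped_np(a, b)  # a is read at indices [0, 1, 1, 1]
```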
As a test, I compared the algorithms on more than 10 million records, with the following results:
add_original: 0.01952 seconds
add_MSeifert: 0.02058 seconds
add_clamped: 0.02562 seconds
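So it is not as fast as @MSeifert's answer, but it keeps the code to a single loop and keeps all the core math in one place (which matters when doing something more complex than adding two arrays).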
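The post does not show how the timings were taken. One possible way to collect comparable numbers is sketched below with the standard-library timeit module. The helper best_time, the stand-in function add_padded, and the array sizes are all assumptions made for illustration. The warm-up call matters for numba functions that compile on first call, so that compilation time is not counted.

```python
import timeit
import numpy as np

def best_time(func, a, b, repeats=5):
    """Return the best wall-clock time in seconds of func(a, b) over several runs."""
    func(a, b)  # warm-up call, so any first-call JIT compilation is excluded
    return min(timeit.repeat(lambda: func(a, b), number=1, repeat=repeats))

def add_padded(a, b):
    # Stand-in for the function under test (hypothetical, NumPy only);
    # swap in add, add2 or add_clamped to compare them.
    n = max(a.shape[0], b.shape[0])
    return (np.pad(a, (0, n - a.shape[0]), mode="edge")
            + np.pad(b, (0, n - b.shape[0]), mode="edge"))

rng = np.random.default_rng(0)
a = rng.random(100_000)  # sizes chosen arbitrarily for the sketch
b = rng.random(60_000)
print(f"add_padded: {best_time(add_padded, a, b):.5f} seconds")
```

Taking the minimum over several repeats reduces noise from other processes; absolute numbers will differ between machines.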
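An if/else inside a loop is normal in compiled C-style code. It is only bad in pure Python.
Thank you very much for the suggestions! Last night it occurred to me that I could also clip the indices on the fly. I added the code to my question; let me know what you think.
@Fnord You should probably add that code as another answer. It is a clever idea too, but I would do the max outside the loop and only do the min inside the loop!
You also made me realize that the max is not needed at all.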