Python Numba - How to fill a 2D array in parallel

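I have a function that operates on a 2D matrix of float64(x, y). Basic concept: for each combination of rows (number of rows choose 2), count the number of positive values after subtraction (row1 - row2). In a 2D matrix of int64(y, y), store this count at index [row1, row2] if it is above a certain threshold, and at index [row2, row1] if it is below.

I have implemented this and decorated it with @njit(parallel=False), which works fine. @njit(parallel=True) seems to give no speedup. To speed the whole thing up I looked at @guvectorize, which works as well. However, I cannot figure out how to use @guvectorize with parallel true in this case either.

I also looked at a solution that uses @vectorize, but I could not transfer it to my problem, so I am now asking for help :)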

Basic jit and guvectorize implementations

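import numpy as np
from numba import jit, guvectorize, prange
import timeit

@jit(parallel=False)
def check_pairs_sg(raw_data):
    # 2D array to be filled
    result = np.full((len(raw_data), len(raw_data)), -1)

    # Iterate over all possible gene combinations
    for r1 in range(0, len(raw_data)):
        for r2 in range(r1+1, len(raw_data)):
            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])
            num_pos = len(np.where(diff > 0)[0])

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1,r2] = num_pos
            else:
                result[r2,r1] = num_pos

    return result

@jit(parallel=True)
def check_pairs_multi(raw_data):
    # 2D array to be filled
    result = np.full((len(raw_data), len(raw_data)), -1)

    # Iterate over all possible gene combinations
    for r1 in range(0, len(raw_data)):
        for r2 in prange(r1+1, len(raw_data)):
            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])
            num_pos = len(np.where(diff > 0)[0])

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1,r2] = num_pos
            else:
                result[r2,r1] = num_pos

    return result

@guvectorize(["void(float64[:,:], int64[:,:])"],
             "(n,m)->(m,m)", target='cpu')
def check_pairs_guvec_sg(raw_data, result):
    for r1 in range(0, len(result)):
        for r2 in range(r1+1, len(result)):
            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])
            num_pos = len(np.where(diff > 0)[0])

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1,r2] = num_pos
            else:
                result[r2,r1] = num_pos

@guvectorize(["void(float64[:,:], int64[:,:])"],
             "(n,m)->(m,m)", target='parallel')
def check_pairs_guvec_multi(raw_data, result):
    for r1 in range(0, len(result)):
        for r2 in range(r1+1, len(result)):
            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])
            num_pos = len(np.where(diff > 0)[0])

            # Arbitrary check to illustrate
            if num_pos >= 5:
                result[r1,r2] = num_pos
            else:
                result[r2,r1] = num_pos

if __name__ == "__main__":
    np.random.seed(404)
    a = np.random.random((512,512)).astype(np.float64)
    res = np.full((len(a), len(a)), -1)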

and timing them with

%timeit check_pairs_sg(a)
%timeit check_pairs_multi(a)
%timeit check_pairs_guvec_sg(a, res)
%timeit check_pairs_guvec_multi(a, res)

results in:

614 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
507 ms ± 6.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
622 ms ± 3.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
671 ms ± 4.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

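I am racking my brain over how to implement this as a @vectorize function or a proper parallel @guvectorize function, so that the resulting 2D array is really filled in parallel.

I guess this is a first step before I try to extend it further to the GPU.

Any help is highly appreciated.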

Think of other compiled languages when writing code. For example, consider a more or less equivalent implementation of the lines

diff = np.subtract(raw_data[:, r1], raw_data[:, r2])
num_pos = len(np.where(diff > 0)[0])

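in C++.

Pseudo code

Allocate an array diff, loop over raw_data[i*size_dim_1+r1] (the loop index is i)
Allocate a boolean array, loop over the whole array diff and check whether diff[i]>0
Loop over the boolean array, get the indices where b_arr==True and save them to a vector via vector::push_back()
Check the size of the vector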
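The steps above can be sketched in plain Python to make the hidden work visible. This is only an illustration of what the two NumPy lines imply, not what Numba literally generates; the helper names count_pos_with_temporaries and count_pos_direct are made up for this sketch.

```python
import numpy as np

def count_pos_with_temporaries(raw_data, r1, r2):
    """Mimic the two NumPy lines step by step (hypothetical helper)."""
    n = raw_data.shape[0]
    # 1) Allocate diff and fill it by walking down two columns (strided access).
    diff = np.empty(n, dtype=raw_data.dtype)
    for i in range(n):
        diff[i] = raw_data[i, r1] - raw_data[i, r2]
    # 2) Allocate a boolean array and test every element of diff.
    mask = np.empty(n, dtype=np.bool_)
    for i in range(n):
        mask[i] = diff[i] > 0
    # 3) Collect the indices where the mask is True.
    indices = []
    for i in range(n):
        if mask[i]:
            indices.append(i)
    # 4) Only the number of indices is actually needed.
    return len(indices)

def count_pos_direct(raw_data, r1, r2):
    """Same count with a single loop and no temporary arrays."""
    num_pos = 0
    for i in range(raw_data.shape[0]):
        if raw_data[i, r1] > raw_data[i, r2]:
            num_pos += 1
    return num_pos

rng = np.random.default_rng(0)
data = rng.random((20, 4))
numpy_count = len(np.where(np.subtract(data[:, 0], data[:, 1]) > 0)[0])
print(count_pos_with_temporaries(data, 0, 1) == numpy_count)
print(count_pos_direct(data, 0, 1) == numpy_count)
```

Three allocations and three passes over the data collapse into one pass and one integer counter, which is the simplification applied in the optimized code.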

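The main problems in your code are:

Creating temporary arrays for simple operations
Non-contiguous memory access

Optimized code

Removing temporary arrays and simplification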

import numpy as np
import numba as nb

@nb.njit(parallel=False)
def check_pairs_simp(raw_data):
    # 2D array to be filled: one row and one column per column of raw_data
    result = np.full((raw_data.shape[1],raw_data.shape[1]), -1)
    
    # Iterate over all possible gene combinations
    for r1 in range(0, raw_data.shape[1]):
        for r2 in range(r1+1, raw_data.shape[1]):
            num_pos=0
            for i in range(raw_data.shape[0]):
                if (raw_data[i,r1]>raw_data[i,r2]):
                    num_pos+=1
            
            # Arbitrary check to illustrate
            if num_pos >= 5: 
               result[r1,r2] = num_pos
            else:
               result[r2,r1] = num_pos
    
    return result

Removing temporary arrays and simplification + contiguous memory access

@nb.njit(parallel=False)
def check_pairs_simp_rev(raw_data_in):
    #Create a transposed array not just a view 
    raw_data=np.ascontiguousarray(raw_data_in.T)
    
    # 2D array to be filled
    result = np.full((raw_data.shape[0],raw_data.shape[0]), -1)
    
    # Iterate over all possible gene combinations
    for r1 in range(0, raw_data.shape[0]):
        for r2 in range(r1+1, raw_data.shape[0]):
            num_pos=0
            for i in range(raw_data.shape[1]):
                if (raw_data[r1,i]>raw_data[r2,i]):
                    num_pos+=1
            
            # Arbitrary check to illustrate
            if num_pos >= 5: 
               result[r1,r2] = num_pos
            else:
               result[r2,r1] = num_pos
    
    return result
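
To see why the transposed copy matters, it can help to inspect the strides of a column of the original array and of a row of the copy. A minimal sketch using only NumPy, assuming a C-ordered float64 input of the same shape as in the question; the variable names are illustrative.

```python
import numpy as np

a = np.zeros((512, 512), dtype=np.float64)  # C-ordered by default

# A column of the original array: consecutive elements are one full row apart.
col = a[:, 0]
col_is_contiguous = bool(col.flags['C_CONTIGUOUS'])  # False
col_strides = col.strides                            # (4096,) bytes = 512 * 8

# a.T alone is only a view with swapped strides, so nothing moves in memory.
view_is_contiguous = bool(a.T.flags['C_CONTIGUOUS'])  # False

# np.ascontiguousarray makes a real copy, so each former column becomes a
# row whose elements sit next to each other in memory.
a_t = np.ascontiguousarray(a.T)
row = a_t[0]
row_is_contiguous = bool(row.flags['C_CONTIGUOUS'])  # True
row_strides = row.strides                            # (8,) bytes = one float64

print(col_strides, row_strides)
```

With a stride of 4096 bytes every element of a column lands on a different cache line, while the rows of the copy are read sequentially.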

Removing temporary arrays and simplification + contiguous memory access + parallelization

@nb.njit(parallel=True,fastmath=True)
def check_pairs_simp_rev_p(raw_data_in):
    #Create a transposed array not just a view 
    raw_data=np.ascontiguousarray(raw_data_in.T)
    
    # 2D array to be filled
    result = np.full((raw_data.shape[0],raw_data.shape[0]), -1)
    
    # Iterate over all possible gene combinations
    for r1 in nb.prange(0, raw_data.shape[0]):
        for r2 in range(r1+1, raw_data.shape[0]):
            num_pos=0
            for i in range(raw_data.shape[1]):
                if (raw_data[r1,i]>raw_data[r2,i]):
                    num_pos+=1
            
            # Arbitrary check to illustrate
            if num_pos >= 5: 
               result[r1,r2] = num_pos
            else:
               result[r2,r1] = num_pos
    
    return result

Timings

%timeit check_pairs_sg(a)
488 ms ± 8.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit check_pairs_simp(a)
186 ms ± 3.83 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit check_pairs_simp_rev(a)
12.1 ms ± 226 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit check_pairs_simp_rev_p(a)
5.43 ms ± 49.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
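
Before trusting timings like these, it may be worth checking that the faster versions still return the same matrix. The following is a minimal sketch, not part of the original answer: check_pairs_reference is a hypothetical NumPy-only helper that builds a boolean array of shape (m, m, n), so it is only suitable for small inputs, and check_pairs_loops is a plain-Python copy of the loop logic used purely for comparison.

```python
import numpy as np

def check_pairs_reference(raw_data, threshold=5):
    """NumPy-only reference (hypothetical helper); needs O(m*m*n) memory."""
    cols = np.ascontiguousarray(raw_data.T)  # shape (m, n): one row per input column
    m = cols.shape[0]
    # counts[r1, r2] = number of rows i with raw_data[i, r1] > raw_data[i, r2]
    counts = (cols[:, None, :] > cols[None, :, :]).sum(axis=2)
    result = np.full((m, m), -1, dtype=np.int64)
    r1, r2 = np.triu_indices(m, k=1)  # every pair with r1 < r2
    c = counts[r1, r2]
    high = c >= threshold
    result[r1[high], r2[high]] = c[high]      # at or above the threshold
    result[r2[~high], r1[~high]] = c[~high]   # below the threshold
    return result

def check_pairs_loops(raw_data, threshold=5):
    """Plain-Python copy of the loop logic, used only for comparison."""
    cols = np.ascontiguousarray(raw_data.T)
    m, n = cols.shape
    result = np.full((m, m), -1, dtype=np.int64)
    for r1 in range(m):
        for r2 in range(r1 + 1, m):
            num_pos = 0
            for i in range(n):
                if cols[r1, i] > cols[r2, i]:
                    num_pos += 1
            if num_pos >= threshold:
                result[r1, r2] = num_pos
            else:
                result[r2, r1] = num_pos
    return result

rng = np.random.default_rng(404)
small = rng.random((12, 6))  # 12 rows, 6 columns
results_match = np.array_equal(check_pairs_reference(small),
                               check_pairs_loops(small))
print(results_match)
```

The same comparison with np.array_equal could be applied to the output of the compiled functions on a small input.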

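"I guess this is a first step before I try to extend it further to the GPU." Are you sure you cannot get rid of the nested for loops first, before turning to numba? It is not clear to me how I should run this to try, since you have two if __name__ == '__main__' guards.

I could use itertools to generate one combination tuple after the other and have only one loop, but how would that help? PS: I removed the second main.

Well, it looks like you could also use np.roll() to offset by 1 instead of the inner loop, but I cannot say for sure, since I do not know what the function is doing. In fact, I am pretty sure most of it can be vectorized, removing some of the for loops and the if checks as well. Jumping to numba and then to the GPU before making sure a numpy approach is applicable misses the point.

Instead of if (raw_data[r1,i]-raw_data[r2,i]) > 0 you could also use if raw_data[r1,i] > raw_data[r2,i]

@mseifer Thanks for your comment. This gives another 10%.

Thank you very much for your insights. Your answer was not only helpful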