Problem with a break statement in a for loop in a Python CUDA kernel
I have recently been playing with CUDA/Numba code. I have an MxN matrix, say `cumul_a`, in which each row is a cumulative probability distribution. I want to draw samples from these cumulative distributions by mapping samples drawn from a uniform random distribution. Put simply, suppose the sample drawn from the uniform distribution is 0.3. The CUDA kernel should pick a row of `cumul_a` and compare each element of that row against 0.3, starting from the row's first element. As soon as it finds a value greater than 0.3, the kernel should store that element's index in the output argument and break out of the for loop. I cannot get this seemingly simple kernel to work. Is the break statement causing a problem inside the kernel? A minimal working example is provided below.
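For reference, the lookup described above is the standard inverse-transform sampling step; on the CPU the per-row scan is equivalent to a single `np.searchsorted` call. A minimal sketch with made-up numbers (the row and sample below are illustrative, not from the MWE that follows):

```python
import numpy as np

# Illustrative single-row cumulative distribution and one uniform sample.
cumul_row = np.array([0.1, 0.35, 0.6, 0.85, 1.0], dtype=np.float32)
u = 0.3

# Index of the first cumulative value that reaches u
# (what the kernel's linear scan is meant to find).
idx = int(np.searchsorted(cumul_row, u))
print(idx)  # 0.3 falls between 0.1 and 0.35, so idx is 1
```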
from __future__ import division
from __future__ import print_function
import numpy as np
from numba import vectorize, cuda, jit
np.set_printoptions(precision=4, suppress=True)
# Number of rows
M = 10
# Number of columns
N = 20
# ======= 1-D GRIDS =======
# Set the number of threads in a block
threadsperblock_1d = 4
# Calculate the number of thread blocks in the grid
blockspergrid_1d = int(np.ceil(M / threadsperblock_1d))  # np.int was removed in NumPy 1.24
# ======= 1-D GRIDS =======
@cuda.jit('void(float32[:, :], float32[:], int32[:])')
def get_randomchoice(cumul_a, random_nos, output):
    x = cuda.grid(1)
    if x < cumul_a.shape[0]:
        for y in range(cumul_a.shape[1]):
            if random_nos[x] > cumul_a[x, y]:
                output[x] = y
                break  # return
if __name__ == '__main__':
    # Prepare the matrix where each row is a cumulative probability distribution
    A = np.random.rand(M, N).astype(np.float32)
    A = np.divide(A, np.sum(A, axis=1, keepdims=True))
    cumul_A = np.cumsum(A, axis=1)
    # Assert that cumul_A is indeed cumulative
    assert np.allclose(cumul_A[:, -1], np.ones(M))
    # Draw values from a uniform distribution
    RandValues = np.random.rand(M).astype(np.float32)
    # Output array in numpy
    Y = np.zeros(M, dtype=np.int32)
    for iStep in range(M):
        Y[iStep] = np.argwhere(RandValues[iStep] <= cumul_A[iStep])[0]
    print('From numpy:\n{}'.format(Y))
    # Transfer to GPU
    cumul_A_gpu = cuda.to_device(cumul_A)
    RandValues_gpu = cuda.to_device(RandValues)
    # Return array from GPU
    random_idx_gpu = cuda.device_array(M, dtype=np.int32)
    get_randomchoice[blockspergrid_1d, threadsperblock_1d](cumul_A_gpu, RandValues_gpu, random_idx_gpu)
    random_idx = random_idx_gpu.copy_to_host()
    print('From cuda:\n{}'.format(random_idx))
Any help would be much appreciated.

It turned out to be a false alarm! There was a small glitch in the code: the line `if random_nos[x] > cumul_a[x, y]:` should be `if random_nos[x] <= cumul_a[x, y]:`, matching the NumPy reference `RandValues[iStep] <= cumul_A[iStep]`. The break statement itself was not the problem.
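To sanity-check the fix without a GPU, the corrected comparison can be replayed in plain NumPy. A sketch with freshly generated data (`rng`, `out`, and `ref` are illustrative names, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 10, 20
A = rng.random((M, N)).astype(np.float32)
A /= A.sum(axis=1, keepdims=True)      # rows sum to 1
cumul_A = np.cumsum(A, axis=1)         # row-wise cumulative distributions
u = rng.random(M).astype(np.float32)   # one uniform sample per row

# The lookup written the way the corrected kernel performs it.
out = np.zeros(M, dtype=np.int32)
for i in range(M):
    for j in range(N):
        if u[i] <= cumul_A[i, j]:      # corrected comparison (was `>`)
            out[i] = j
            break

# NumPy reference: first index where the cumulative value reaches u[i].
ref = np.array([np.searchsorted(cumul_A[i], u[i]) for i in range(M)])
print(np.array_equal(ref, out))
```

With the original `>` comparison, every thread writes index 0 whenever the sample exceeds the first cumulative value and then breaks immediately, which is why the kernel's output disagreed with the NumPy loop.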