Replacing blockIdx.x with a counter (parallel-processing, cuda, atomic, nvidia)

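I originally used blockIdx.x in my code, but I want to remove it and instead use a global value inside my blocks in place of blockIdx.x. My code is large and it hangs when I run it with large input sizes, so I thought this would help. I increment the counter atomically, but the code still hangs when I run it. Can someone look at my code and see whether I am doing something wrong?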

__device__ int counter = 0;

__global__ void kernel(int * ginput, int * goutput)
{
  const int tid = threadIdx.x;
  const int id = threadIdx.x + blockIdx.x * blockDim.x;
  int myval = ginput[id];

  if (tid == 0) {
    atomicAdd(&counter, 1);
  }

  __syncthreads();
  if (counter == 0) {
    goutput[tid] = ...;
  }
  if (counter > 0) {
   ...
  }

}

If I use blockIdx.x instead of counter in the code it works, but I want to replace it with the counter.

If you want counter to replace your use of blockIdx.x (i.e., you want each block to have a unique value read from counter), then something like this should work:

__device__ int counter = 0;

__global__ void kernel(int * ginput, int * goutput)
{
  const int tid = threadIdx.x;
  const int id = threadIdx.x + blockIdx.x * blockDim.x;
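  __shared__ int my_block_id;  // one copy per block, shared by all of its threads

  // Thread 0 claims a unique value; atomicAdd returns the counter's old value.
  if (tid == 0) {
    my_block_id = atomicAdd(&counter, 1);
  }

  // Wait until my_block_id has been written before any thread reads it.
  __syncthreads();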
  if (my_block_id == 0) {
    goutput[tid] = ...;
  }
  if (my_block_id > 0) {
   ...
  }

}
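For reference, here is a minimal self-contained sketch of how the kernel above might be driven from the host. Several things in it are assumptions made for illustration and are not in the original code: the extra length parameter n, the placeholder work ginput[id] + my_block_id, the helper name run_demo, and the grid sizes. It also computes id from my_block_id rather than blockIdx.x, and it resets counter with cudaMemcpyToSymbol before each launch, because a __device__ variable keeps its value from one launch to the next.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Global counter that hands out block ids in the order blocks start running.
__device__ int counter = 0;

// Each block claims one id from `counter` and uses it in place of blockIdx.x.
// The parameter n and the arithmetic below are placeholders for illustration.
__global__ void kernel(const int *ginput, int *goutput, int n)
{
    __shared__ int my_block_id;

    if (threadIdx.x == 0) {
        my_block_id = atomicAdd(&counter, 1);  // returns the value before the increment
    }
    __syncthreads();

    const int id = threadIdx.x + my_block_id * blockDim.x;
    if (id < n) {
        goutput[id] = ginput[id] + my_block_id;
    }
}

// Hypothetical host helper: launches the kernel once and returns true if
// every output element has the value expected from a unique block id.
bool run_demo(int num_blocks, int threads_per_block)
{
    const int n = num_blocks * threads_per_block;
    const size_t bytes = n * sizeof(int);
    std::vector<int> h_in(n), h_out(n, -1);
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    // The counter survives across launches, so reset it every time.
    const int zero = 0;
    cudaMemcpyToSymbol(counter, &zero, sizeof(int));

    kernel<<<num_blocks, threads_per_block>>>(d_in, d_out, n);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    // Ids 0..num_blocks-1 are each claimed once, so element i was handled
    // by the block whose claimed id is i / threads_per_block.
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != h_in[i] + i / threads_per_block) return false;
    }
    return true;
}

int main()
{
    // Run twice to exercise the counter reset between launches.
    const bool ok = run_demo(64, 128) && run_demo(64, 128);
    printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

Note that the ids are handed out in whatever order the blocks happen to start, so a block's my_block_id will generally not equal its blockIdx.x. Without the reset, a second launch would start handing out ids at 64 in this sketch and the index check would skip every element.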

Your approach will be troublesome, because if you do something like this:

if (counter > 5) ....

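you may be reading a freshly updated value of counter from global memory, and any number of blocks may have updated that value in the meantime, so the behavior will be unpredictable.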
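The difference can be made visible with a small experiment. The sketch below is illustrative only: the kernel name compare_reads, the helper check_reads, and the grid size are hypothetical. Each block records the value it claimed from atomicAdd and the value it sees when it reads counter again afterwards. The later read is done with atomicAdd(&counter, 0) so that it is a well-defined read of the current value; it is still a read of shared state that other blocks keep changing.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__device__ int counter = 0;

// Thread 0 of each block stores its claimed value and a later re-read.
__global__ void compare_reads(int *claimed, int *reread)
{
    if (threadIdx.x == 0) {
        const int mine  = atomicAdd(&counter, 1);  // private to this block
        const int later = atomicAdd(&counter, 0);  // current shared value
        claimed[blockIdx.x] = mine;
        reread[blockIdx.x]  = later;
    }
}

// Hypothetical host helper. Returns true if the claimed values are exactly
// 0..num_blocks-1, each once, and every re-read lies in
// [claimed + 1, num_blocks]. *num_moved counts the blocks whose re-read
// already included increments made by other blocks.
bool check_reads(int num_blocks, int *num_moved)
{
    const size_t bytes = num_blocks * sizeof(int);
    std::vector<int> h_claimed(num_blocks), h_reread(num_blocks);

    int *d_claimed = nullptr, *d_reread = nullptr;
    if (cudaMalloc(&d_claimed, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_reread, bytes) != cudaSuccess) { cudaFree(d_claimed); return false; }

    const int zero = 0;
    cudaMemcpyToSymbol(counter, &zero, sizeof(int));

    compare_reads<<<num_blocks, 32>>>(d_claimed, d_reread);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        cudaMemcpy(h_claimed.data(), d_claimed, bytes, cudaMemcpyDeviceToHost);
        err = cudaMemcpy(h_reread.data(), d_reread, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_claimed);
    cudaFree(d_reread);
    if (err != cudaSuccess) return false;

    std::vector<int> seen(num_blocks, 0);
    *num_moved = 0;
    for (int b = 0; b < num_blocks; ++b) {
        const int mine = h_claimed[b], later = h_reread[b];
        if (mine < 0 || mine >= num_blocks || seen[mine]++) return false;
        if (later < mine + 1 || later > num_blocks) return false;
        if (later != mine + 1) ++(*num_moved);
    }
    return true;
}

int main()
{
    int moved = 0;
    const bool ok = check_reads(1024, &moved);
    printf("claimed ids unique: %s, blocks whose re-read had moved on: %d\n",
           ok ? "yes" : "no", moved);
    return ok ? 0 : 1;
}
```

The claimed values always form a clean set of unique ids, while the number of blocks whose re-read had moved on depends on scheduling and can change from run to run. That run-to-run variation is what makes a test such as counter > 5 unreliable.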