Cuda: Is there a blockIdx equivalent for thread groups or thread block tiles?

I've started using cooperative groups and often find myself wishing there were a way to replace the second line of:

thread_block_tile<32> tile = tiled_partition<32>(this_thread_block());
int tileId = this_thread_block().thread_rank()/tile.size();

This produces:

Hello from tile4 rank 0: 0
Hello from tile4 rank 0: 4
Hello from tile4 rank 0: 8
Hello from tile4 rank 0: 12

This seems to match my assumption.

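So I have two questions:

Does my assumption hold for the proposed way of computing tileId?

Is there a simpler way to achieve the desired behavior that I've missed?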

Example use case:

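#include <cooperative_groups.h>
using namespace cooperative_groups;

__device__
int someFkt(thread_block_tile<16> tile, int* data)
{
   // some stuff that works best using 16 threads
   return 0; // placeholder so the function returns a value
}

__global__
void some_kernel(int* data)
{
   thread_block_tile<16> tile = tiled_partition<16>(this_thread_block());
   int tileId = this_thread_block().thread_rank()/tile.size();
   int result = someFkt(tile,data+tileId*tile.size());
   (void)result; // placeholder: the result is not used further here
}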

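Correct: if the tile size is 32, tileId ranges from 0 up to, but not including, this_thread_block().size()/32, and it is indeed the same for all threads in the same tile. These tileIds are also the same in every block, so each block has tiles numbered 0, 1, and so on; the numbering restarts in each block.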

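Only thread_block exposes its indices. thread_block provides the following additional block-specific functionality:

dim3 group_index();  // 3-dimensional block index within the grid

dim3 thread_index(); // 3-dimensional thread index within the block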

Not sure whether this is a typo in your example use case:

int tileId = this_thread_block().thread_rank()/32;

should presumably be:

int tileId = this_thread_block().thread_rank()/16;

There was a typo (16 instead of 32); thanks for your help.

Is the .size() in the correction a typo?

@generic_opto_guy That was a typo, thanks. I fixed it.