In CUDA, how can I detect that not all threads in a block have called __syncthreads()?

cuda synchronization

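I ran into a strange, hard-to-reproduce problem in CUDA that in the end turned out to involve undefined behaviour. I want thread 0 to set some values in shared memory that all threads should then use: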

__shared__ bool p;
p = false;
if (threadIdx.x == 0) p = true;
__syncthreads();
assert(p);

Now the assert(p); fails seemingly at random as I shuffle the code around and comment parts out to find the problem.

I was effectively using this construct in the following undefined-behaviour context:

#include <assert.h>

__global__ void test() {
    if (threadIdx.x == 0) __syncthreads(); // call __syncthreads in thread 0 only: this is a very bad idea
    // everything below may exhibit undefined behaviour


    // If the above __syncthreads runs only in thread 0, this will fail for all threads not in the first warp
    __shared__ bool p;
    p = false;
    if (threadIdx.x == 0) p = true;
    __syncthreads();
    assert(p);
}

int main() {
    test<<<1, 32 + 1>>>(); // nothing happens if you have only one warp, so we use one more thread
    cudaDeviceSynchronize();
    return 0;
}

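The earlier __syncthreads() that is reached by only one thread was of course hidden inside some functions, so it was hard to find. On my setup (sm_50, GTX 980) this kernel does not deadlock as advertised. It just runs through, and the assert fails for all threads outside the first warp.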

TL;DR

Is there a standard way to detect that not all threads in a block have called __syncthreads()? Maybe there is a debugger setting I am missing.

I could build my own (very slow) checked__syncthreads() that detects the situation using atomics and global memory, but I would prefer a standard solution.
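For illustration, such a hand-rolled check could look roughly like the sketch below. It is only a crude heuristic under stated assumptions: checked_syncthreads and arrivals_consistent are hypothetical names (not CUDA APIs), a single block is launched, and the kernel must still terminate for the host to read the counter. The check is necessary but not sufficient, since mismatched barriers can still add up to a multiple of the block size.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper, not a CUDA API: record the arrival in global memory,
// then run the real barrier.
__device__ void checked_syncthreads(unsigned int* arrivals) {
    atomicAdd(arrivals, 1u);
    __syncthreads();
}

// Host-side check: if every thread reached every barrier, the total number of
// arrivals is a multiple of the block size.
bool arrivals_consistent(unsigned int arrivals, unsigned int threadsPerBlock) {
    return threadsPerBlock != 0 && arrivals % threadsPerBlock == 0;
}

__global__ void test(unsigned int* arrivals, bool buggy) {
    // Divergent barrier: undefined behaviour, only taken when requested.
    if (buggy && threadIdx.x == 0) checked_syncthreads(arrivals);

    __shared__ bool p;
    if (threadIdx.x == 0) p = true;
    checked_syncthreads(arrivals);
    if (!p) printf("thread %d saw p == false\n", threadIdx.x);
}

int main(int argc, char**) {
    const unsigned int threads = 32 + 1;  // two warps, as in the question
    const bool buggy = argc > 1;          // any argument enables the UB path

    unsigned int* d_arrivals = nullptr;
    cudaMalloc(&d_arrivals, sizeof(unsigned int));
    cudaMemset(d_arrivals, 0, sizeof(unsigned int));

    test<<<1, threads>>>(d_arrivals, buggy);
    cudaError_t err = cudaDeviceSynchronize();

    unsigned int arrivals = 0;
    cudaMemcpy(&arrivals, d_arrivals, sizeof(arrivals), cudaMemcpyDeviceToHost);
    cudaFree(d_arrivals);

    printf("status: %s, arrivals: %u, consistent: %s\n",
           cudaGetErrorString(err), arrivals,
           arrivals_consistent(arrivals, threads) ? "yes" : "no");
    return 0;
}
```

Run without arguments, every thread reaches the one barrier. Passing any argument exercises the divergent path, which is undefined behaviour and may hang on some devices instead of reporting an inconsistent count.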

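The original code contains a data race between threads. Thread 0 may run ahead and execute p = true, while another thread that has not advanced yet later reaches the p = false line and overwrites the result.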

For this particular example, the simplest fix is to let only thread 0 write to p, something like:

__shared__ bool p;
if (threadIdx.x == 0) p = true; 
__syncthreads();
assert(p);
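The fix can be tried in isolation with a small program like the sketch below. The names broadcast_flag, seen and all_set are made up for illustration. Instead of asserting on the device, each thread records the value of p it observed so the host can inspect the result.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void broadcast_flag(int* seen) {
    __shared__ bool p;
    if (threadIdx.x == 0) p = true;  // single writer, so no write-write race
    __syncthreads();                 // reached unconditionally by every thread
    seen[threadIdx.x] = p ? 1 : 0;   // record what this thread observed
}

// Host helper: true if every entry is 1.
bool all_set(const int* values, int n) {
    for (int i = 0; i < n; ++i)
        if (values[i] != 1) return false;
    return true;
}

int main() {
    const int threads = 32 + 1;  // two warps, as in the question
    int h_seen[threads] = {0};

    int* d_seen = nullptr;
    cudaMalloc(&d_seen, sizeof(h_seen));
    cudaMemset(d_seen, 0, sizeof(h_seen));

    broadcast_flag<<<1, threads>>>(d_seen);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(h_seen, d_seen, sizeof(h_seen), cudaMemcpyDeviceToHost);
    cudaFree(d_seen);

    printf("all threads saw p == true: %s\n",
           all_set(h_seen, threads) ? "yes" : "no");
    return 0;
}
```

If p also needs a default value, one option is to let thread 0 write that as well. Another is to keep the store from all threads and place a second __syncthreads() between it and the override by thread 0.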

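Try reading the synccheck section of the cuda-memcheck documentation.

I was not aware of that, thanks. It makes me wonder why I do not see the problem without the initial, UB-invoking __syncthreads() call. It probably changes the scheduling in a way without which the race you mention does not show up.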
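As a rough sketch of how the synccheck tool might be tried, the program below hides a conditional barrier in a helper function, similar to the situation described in the question. The names hidden_helper, launch and the file name synctest.cu are assumptions for illustration. The build and run commands are given as comments at the end.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for a barrier buried in a helper function. The barrier is
// divergent unless callers >= blockDim.x.
__device__ void hidden_helper(int callers) {
    if (threadIdx.x < callers) __syncthreads();
}

__global__ void test(int callers) {
    hidden_helper(callers);
}

// Launch one block of 33 threads and wait for it to finish.
cudaError_t launch(int callers) {
    test<<<1, 32 + 1>>>(callers);
    return cudaDeviceSynchronize();
}

int main() {
    // callers = 1: only thread 0 reaches the barrier (undefined behaviour).
    cudaError_t err = launch(1);
    printf("%s\n", cudaGetErrorString(err));
    return 0;
}

// Build with line information and run under the checker:
//   nvcc -lineinfo -o synctest synctest.cu
//   cuda-memcheck --tool synccheck ./synctest
// On newer toolkits, where compute-sanitizer replaces cuda-memcheck:
//   compute-sanitizer --tool synccheck ./synctest
```

Calling launch with a value of at least 33 makes every thread reach the barrier, which gives a well-defined baseline run to compare against.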