CUDA: understanding the results of the nvprof events "l2_subp0_write_sector_misses" and "l2_subp1_write_sector_misses"


After reading the linked post, I was able to understand "l2_subp0_read_sector_misses" and "l2_subp1_read_sector_misses". Now I have a similar question about the events "l2_subp0_write_sector_misses" and "l2_subp1_write_sector_misses".

Let's start with the same example (vector addition) used in the linked post.

Kernel code:

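__global__ void AddVectors(const float* A, const float* B, float* C, int N)
{
    // Each block owns a contiguous chunk of N * blockDim.x elements.
    int blockStartIndex  = blockIdx.x * blockDim.x * N;
    // Thread t starts at offset t within its block's chunk...
    int threadStartIndex = blockStartIndex + threadIdx.x;
    // ...and handles N elements, striding by blockDim.x, so neighboring
    // threads access neighboring elements (coalesced loads and stores).
    int threadEndIndex   = threadStartIndex + ( N * blockDim.x );
    int i;

    for( i=threadStartIndex; i<threadEndIndex; i+=blockDim.x ){
        C[i] = A[i] + B[i];
    }
}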
… found that the L2 cache is a write-through cache, so all write accesses to the L2 cache are reported as L2 misses.