C++ CUDA kernel does not return anything


c++ cuda


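I'm using CUDA Toolkit 8 with Visual Studio Community 2015. When I try the simple vector addition from NVIDIA's PDF manual (minus the error checking, since I don't have the *.h file it needs), it always comes back with undefined values, meaning the output array is never filled. When I pre-fill the output array with 0s, zeros are all I get back.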
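Dropping the error checking is what lets the failure go unreported: every CUDA runtime call returns a `cudaError_t`, and `cudaGetErrorString` (declared via `cuda_runtime.h`) turns it into text, so no extra header is required. The following is a minimal sketch of the same vector addition with checks added. `CHECK_CUDA` and `count_mismatches` are illustrative helpers written for this example, not functions from the CUDA SDK.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 10

// Illustrative helper macro (not an SDK API): evaluates a CUDA runtime call
// and, if it fails, prints the error text with file/line and exits.
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: CUDA error: %s\n", __FILE__,     \
                         __LINE__, cudaGetErrorString(err_));             \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// One block per element: block i computes c[i].
__global__ void add(const int *a, const int *b, int *c) {
    int tid = blockIdx.x;
    if (tid < N)
        c[tid] = a[tid] + b[tid];
}

// Illustrative host-side check: counts elements where c[i] != a[i] + b[i].
static int count_mismatches(const int *a, const int *b, const int *c, int n) {
    int bad = 0;
    for (int i = 0; i < n; i++)
        if (c[i] != a[i] + b[i])
            bad++;
    return bad;
}

int main() {
    int a[N], b[N], c[N] = {0};
    int *dev_a, *dev_b, *dev_c;
    for (int i = 0; i < N; i++) {
        a[i] = -i;
        b[i] = i * i;
    }
    // With a broken driver/toolkit setup, the first call that creates the
    // CUDA context (here cudaMalloc) reports the error instead of failing silently.
    CHECK_CUDA(cudaMalloc((void **)&dev_a, N * sizeof(int)));
    CHECK_CUDA(cudaMalloc((void **)&dev_b, N * sizeof(int)));
    CHECK_CUDA(cudaMalloc((void **)&dev_c, N * sizeof(int)));
    CHECK_CUDA(cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice));
    CHECK_CUDA(cudaMemcpy(dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice));

    add<<<N, 1>>>(dev_a, dev_b, dev_c);
    // A kernel launch returns no status: query launch errors explicitly,
    // then synchronize to surface errors raised while the kernel runs.
    CHECK_CUDA(cudaGetLastError());
    CHECK_CUDA(cudaDeviceSynchronize());

    CHECK_CUDA(cudaMemcpy(c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost));
    for (int i = 0; i < N; i++)
        std::printf("%d + %d = %d\n", a[i], b[i], c[i]);

    int bad = count_mismatches(a, b, c, N);
    if (bad != 0) {
        std::fprintf(stderr, "%d wrong results\n", bad);
        return 1;
    }

    CHECK_CUDA(cudaFree(dev_a));
    CHECK_CUDA(cudaFree(dev_b));
    CHECK_CUDA(cudaFree(dev_c));
    return 0;
}
```

With these checks, a setup like the one described would stop at the first failing call with a message instead of printing an unfilled array.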

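Others have had this problem, and some attributed it to compiling for the wrong compute capability. However, I'm using an NVIDIA GTX 750 Ti, which should be compute capability 5.0. I have tried compiling for compute capability 2.0 (the minimum my SDK supports) and for 5.0.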
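Rather than relying on the card's model name, the runtime can report each device's compute capability directly. This is a small sketch that prints it along with a matching `nvcc -gencode` option. `cc_to_sm` is a hypothetical helper for this example. If `cudaGetDeviceCount` itself fails, the problem lies in the driver or installation rather than the compile target.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: maps a compute capability (major, minor) to the
// two-digit number used in nvcc arch/code names, e.g. (5, 0) -> 50.
static int cc_to_sm(int major, int minor) { return major * 10 + minor; }

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        // A driver/installation problem typically shows up here,
        // before any kernel is launched.
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                         dev, cudaGetErrorString(err));
            return 1;
        }
        int sm = cc_to_sm(prop.major, prop.minor);
        std::printf("Device %d: %s, compute capability %d.%d\n", dev,
                    prop.name, prop.major, prop.minor);
        std::printf("  e.g. nvcc -gencode arch=compute_%d,code=sm_%d\n", sm, sm);
    }
    return 0;
}
```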

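I also can't get any of the precompiled samples to work. For example, vectoradd.exe says "Failed to allocate device vector A (error code initialization error)", and oceanfft.exe says "Error unable to find GLSL vertex and fragment shaders!", which makes no sense because GLSL vertex and fragment shaders are very basic functionality.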

My driver version is 361.43, and other applications, such as Blender Cycles in CUDA mode and Stellarium, run perfectly.
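One way to check whether the installed display driver matches the toolkit is to compare what the driver supports with the runtime version the program was built against. The sketch below assumes the CUDA 8 era rule that the driver must support at least the runtime's version (CUDA 11 and later relax this within a major version). `split_version` is a hypothetical helper. CUDA encodes versions as 1000 * major + 10 * minor, so 8000 means CUDA 8.0.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: splits a CUDA version number (1000 * major + 10 * minor,
// e.g. 8000 for CUDA 8.0) into its major and minor parts.
static void split_version(int v, int *major, int *minor) {
    *major = v / 1000;
    *minor = (v % 1000) / 10;
}

int main() {
    int driver = 0, runtime = 0;
    // Newest CUDA version the installed driver supports (0 if no driver is installed).
    cudaDriverGetVersion(&driver);
    // CUDA runtime version this program was built against.
    cudaRuntimeGetVersion(&runtime);

    int dmaj, dmin, rmaj, rmin;
    split_version(driver, &dmaj, &dmin);
    split_version(runtime, &rmaj, &rmin);
    std::printf("Driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
                dmaj, dmin, rmaj, rmin);

    // Assumption (pre-CUDA 11 rule): an older driver cannot run a newer runtime.
    if (driver < runtime) {
        std::printf("Driver is older than the runtime; install a newer driver, "
                    "for example the one bundled with the toolkit.\n");
        return 1;
    }
    return 0;
}
```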

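This was evidently caused by using a driver version that is incompatible with the CUDA 8 toolkit. Installing the driver distributed with the CUDA 8 toolkit solved the problem.

[Answer collected from the comments and added as a community wiki entry to take the question off the unanswered queue for the CUDA tag]

Comments:

I can confirm the code works. Your CUDA installation appears to be broken. 361.43 is not the driver that ships with the CUDA 8 Windows installer, and the bundled driver is probably what you should be using until things are working. My suggestion is to reinstall CUDA 8 following the installation instructions.

OK, I'm now downloading the latest CUDA 8.0.61 toolkit to try it, and this time I will choose to install the driver. I already had a display driver, so I must have skipped the toolkit's driver because I didn't think it was necessary to install it over my current one. Thanks a lot for the driver tip.

I installed the 8.0.61 CUDA toolkit and now have the bundled 376.51 display driver. The precompiled samples work now, except for oceanfft.exe. I'll reboot and see whether it and Visual Studio work.

Thanks a lot, @RobertCrovella! oceanfft.exe and Visual Studio now work perfectly. If you post your suggestion as an answer, I'll accept it.

The code from the question, which the answer confirms works:
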
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>
#include <iostream>
#include <algorithm>
#define N 10

__global__ void add(int *a, int *b, int *c) {
    int tid = blockIdx.x; // handle the data at this index
    if (tid < N)
        c[tid] = a[tid] + b[tid];
}

int main(void) {
    int a[N], b[N], c[N];
    int *dev_a, *dev_b, *dev_c;
    // allocate the memory on the GPU
    cudaMalloc((void**)&dev_a, N * sizeof(int));
    cudaMalloc((void**)&dev_b, N * sizeof(int));
    cudaMalloc((void**)&dev_c, N * sizeof(int));
    // fill the arrays 'a' and 'b' on the CPU
    for (int i = 0; i<N; i++) {
        a[i] = -i;
        b[i] = i * i;
    }
    // copy the arrays 'a' and 'b' to the GPU
    cudaMemcpy(dev_a, a, N * sizeof(int),cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, N * sizeof(int),cudaMemcpyHostToDevice);
    // launch N blocks of one thread each; block i computes c[i]
    add<<<N, 1>>>(dev_a, dev_b, dev_c);
    // copy the array 'c' back from the GPU to the CPU
    cudaMemcpy(c, dev_c, N * sizeof(int),cudaMemcpyDeviceToHost);
    // display the results
    for (int i = 0; i<N; i++) {
        printf("%d + %d = %d\n", a[i], b[i], c[i]);
    }
    // free the memory allocated on the GPU
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return 0;
}