In CUDA, why isn't the result copied from the device to the host?


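I'm working through the examples in the book "CUDA by Example". The code below doesn't give me an answer, even though it should work. What's wrong?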

I'd appreciate your help and answers.

The output I get is:

Calculation done on GPU yields the answer: &d
Press enter to stop.

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <iostream>
#include <stdio.h>

using namespace std;

__global__ void add_integers_cuda(int a, int b, int *c)
{
    *c = a + b;
}

int main(void)
{
    int c;
    int *dev_ptr;

    // Allocate sizeof(int) bytes of contiguous memory on the GPU and store the address of the first byte in dev_ptr.
    cudaMalloc((void **)&dev_ptr, sizeof(int));

    // Call the kernel.
    add_integers_cuda <<<1,1>>>(2,7,dev_ptr);

    cudaMemcpy(&c, dev_ptr, sizeof(int), cudaMemcpyDeviceToHost);

    printf("Calculation done on GPU yields the answer: &d\n",c );

    cudaFree(dev_ptr);

    printf("Press enter to stop.");
    cin.ignore(255, '\n');

    return 0;

}


&d is not a valid printf format specifier:

printf("Calculation done on GPU yields the answer: &d\n",c );

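With &d, printf simply prints those two characters literally and ignores the argument c, so you won't get the expected output. Use %d instead: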

printf("Calculation done on GPU yields the answer: %d\n",c );

Of course, this problem has nothing to do with CUDA.

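If you are just learning and running into trouble, you may also want to run your CUDA code with cuda-memcheck and/or use proper CUDA error checking. However, neither of these would have pointed out the error above.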
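One common way to add CUDA error checking, sketched here assuming a CUDA toolkit and a GPU are available, is to wrap each runtime call in a macro that compares the returned cudaError_t with cudaSuccess. CHECK_CUDA and add_on_gpu are hypothetical names chosen for this sketch. Applied to the question's program, every check passes, which shows why this kind of checking would not have caught the &d mistake.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro: print the error and exit if a CUDA runtime call
// does not return cudaSuccess.
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

__global__ void add_integers_cuda(int a, int b, int *c)
{
    *c = a + b;
}

// Hypothetical helper: runs the question's kernel with every runtime call checked.
int add_on_gpu(int a, int b)
{
    int c = 0;
    int *dev_ptr = nullptr;
    CHECK_CUDA(cudaMalloc((void **)&dev_ptr, sizeof(int)));
    add_integers_cuda<<<1, 1>>>(a, b, dev_ptr);
    CHECK_CUDA(cudaGetLastError());  // reports errors in launching the kernel
    // cudaMemcpy waits for the kernel, so errors raised while it ran surface here.
    CHECK_CUDA(cudaMemcpy(&c, dev_ptr, sizeof(int), cudaMemcpyDeviceToHost));
    CHECK_CUDA(cudaFree(dev_ptr));
    return c;
}

int main(void)
{
    // All checks pass here; the original bug was in the printf format string,
    // which the CUDA runtime never sees.
    std::printf("Calculation done on GPU yields the answer: %d\n", add_on_gpu(2, 7));
    return 0;
}
```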
Thanks! That was a really silly mistake! Thanks for pointing it out. You're right, it has nothing to do with CUDA! :)