How to execute the same function on the CPU and the GPU with JCuda in Java


I am working with the code from the JCuda documentation. Currently it only adds vectors on the GPU. How can I reuse the add function on the CPU host? I know that I have to change the __global__ qualifier, and I suspect I also need a different nvcc option.

My goal is to run the same function on the GPU and on the CPU and compare the execution times. I already know how to measure them.
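For the CPU half of the comparison, a minimal sketch is to port the body of the add kernel to a plain Java loop and time it with System.nanoTime. The class and method names (VectorAddCpu, addHost, timeAddNanos) are hypothetical, and this is a separate Java implementation, not the CUDA function itself running on the host.

```java
import java.util.Random;

public final class VectorAddCpu {

    // Same arithmetic as the CUDA "add" kernel; on the CPU the grid of
    // threads becomes an ordinary loop over i.
    static void addHost(int n, float[] a, float[] b, float[] sum) {
        for (int i = 0; i < n; i++) {
            sum[i] = a[i] + b[i];
        }
    }

    // Wall-clock time of a single call, in nanoseconds.
    static long timeAddNanos(int n, float[] a, float[] b, float[] sum) {
        long start = System.nanoTime();
        addHost(n, a, b, sum);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 1 << 20;
        float[] a = new float[n];
        float[] b = new float[n];
        float[] sum = new float[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) {
            a[i] = rnd.nextFloat();
            b[i] = rnd.nextFloat();
        }
        // A few untimed runs so the JIT compiles addHost before measuring
        for (int warmup = 0; warmup < 5; warmup++) {
            addHost(n, a, b, sum);
        }
        long ns = timeAddNanos(n, a, b, sum);
        System.out.printf("CPU add of %d elements: %.3f ms%n", n, ns / 1e6);
    }
}
```

When timing the GPU side for comparison, note that cuLaunchKernel returns asynchronously, so the JCuda code should call cuCtxSynchronize() before stopping the timer; whether to include the host/device copies in the measurement is a separate choice.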

The .cu file is compiled with nvcc -ptx file.cu -o file.ptx.


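Because of the API that JCuda uses to interact with CUDA, you cannot do this in JCuda, and probably never will be able to. JCuda loads kernels from a PTX module through the driver API (cuModuleLoad, cuModuleGetFunction), and that module contains only device code, so there is no host version of add that could be called from Java.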

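Although CUDA can now launch host functions into streams, JCuda does not currently expose that API, and it does not work the way device code does. The same limitation applies to PyCUDA and other frameworks built on the driver API.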

You would probably need to use JNI or a similar mechanism to retrieve the host function from a native library and call it that way.
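The Java side of such a JNI binding could look roughly like the sketch below. Everything here is an assumption for illustration: the library name "vectoradd", the class VectorAddJni and the native method addHost are hypothetical, and the native library itself (for example a C/C++ file that calls the shared addition logic, built as a shared library) is not shown. When the library cannot be loaded, the sketch falls back to a plain Java loop so it still runs.

```java
public final class VectorAddJni {

    // Hypothetical native library (libvectoradd.so / vectoradd.dll)
    private static final String LIB_NAME = "vectoradd";
    private static final boolean LOADED = tryLoad();

    private static boolean tryLoad() {
        try {
            System.loadLibrary(LIB_NAME);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    // Would be implemented natively as
    // JNIEXPORT void JNICALL Java_VectorAddJni_addHost(JNIEnv*, jclass, jint,
    //                                                  jfloatArray, jfloatArray, jfloatArray)
    static native void addHost(int n, float[] a, float[] b, float[] sum);

    // Calls the native host function if it is available, otherwise a Java loop
    static void add(int n, float[] a, float[] b, float[] sum) {
        if (LOADED) {
            addHost(n, a, b, sum);
        } else {
            for (int i = 0; i < n; i++) {
                sum[i] = a[i] + b[i];
            }
        }
    }

    static boolean isNativeLoaded() {
        return LOADED;
    }
}
```

Declaring the native method is harmless even without the library; an UnsatisfiedLinkError is only thrown if addHost is actually invoked, which the LOADED check prevents.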

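The kernel from the question:

// extern "C" disables C++ name mangling so the driver API can find "add" by name
extern "C"
__global__ void add(int n, float *a, float *b, float *sum)
{
    // Global index of this thread across the whole grid
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Guard: the grid may contain more threads than elements
    if (i < n)
    {
        sum[i] = a[i] + b[i];
    }
}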
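
The Java host code from the question:

import static jcuda.driver.JCudaDriver.*;

import jcuda.Pointer;
import jcuda.driver.*;

public class JCudaVectorAdd {
    public static void main(String[] args) {
        // Throw CudaException on errors instead of returning error codes
        setExceptionsEnabled(true);

        // Initialize the driver and create a context on device 0
        cuInit(0);
        CUdevice device = new CUdevice();
        cuDeviceGet(device, 0);
        CUcontext context = new CUcontext();
        cuCtxCreate(context, 0, device);

        // Load the PTX produced by nvcc -ptx and look up the kernel by name
        CUmodule module = new CUmodule();
        cuModuleLoad(module, "kernels/JCudaVectorAdd.ptx");

        CUfunction function = new CUfunction();
        cuModuleGetFunction(function, module, "add");
        // ...
        // One pointer per kernel argument, in the order of the add signature
        Pointer kernelParameters = Pointer.to(
                Pointer.to(new int[]{numElements}),
                Pointer.to(deviceInputA),
                Pointer.to(deviceInputB),
                Pointer.to(deviceOutput)
        );
        // ...
    }
}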