Python: How do I pass a 2D array to a kernel in PyCUDA?
I found an answer, but it does not make clear whether I should reshape the array. Do I need to reshape a 2D array to 1D before passing it to a PyCUDA kernel?
There is no need to reshape a 2D gpuarray in order to pass it to a CUDA kernel.
As I said in the answer you linked to, a 2D numpy or PyCUDA array is just an allocation of pitched linear memory, stored in row-major order by default. Both have two members that tell you everything you need to access the array: shape and strides. For example:
In [8]: X = np.arange(0, 15, dtype=np.int32).reshape((5, 3))
In [9]: print(X.shape)
(5, 3)
In [10]: print(X.strides)
(12, 4)
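To make the role of the two members concrete, here is a minimal NumPy-only sketch (no GPU needed) of how strides map an index pair to a byte offset. The helper name byte_offset is hypothetical and not part of NumPy or PyCUDA, and the dtype is pinned to int32 on the assumption that a 4-byte element is what produced the strides shown above.

```python
import numpy as np

# Pin the dtype so the item size (4 bytes) does not depend on the platform.
X = np.arange(0, 15, dtype=np.int32).reshape((5, 3))

def byte_offset(arr, i, j):
    """Byte offset of element (i, j) from the start of arr's buffer (hypothetical helper)."""
    return i * arr.strides[0] + j * arr.strides[1]

# Raw bytes of the array in row-major storage order.
raw = X.tobytes()

i, j = 3, 2
off = byte_offset(X, i, j)

# Read one int32 straight out of the raw bytes at that offset.
value = np.frombuffer(raw, dtype=X.dtype, count=1, offset=off)[0]
print(off, value, X[i, j])
```

The value read from the raw bytes matches X[i, j], which is the same arithmetic a kernel has to perform when it is handed only a pointer and a row pitch.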
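The shape is self-explanatory; the strides give the pitch of the storage in bytes. The best practice for kernel code is to treat the pointer supplied by PyCUDA as if it were a pitched allocation (the kind cudaMallocPitch returns in CUDA C), and to treat the first element of strides as the byte pitch of the rows in memory. A trivial example might look like this: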
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void diag_kernel(float *dest, int stride, int N)
{
    const int tid = threadIdx.x + blockDim.x * blockIdx.x;
    if (tid < N) {
        // Step tid rows down using the byte pitch, then tid floats across.
        float* p = (float*)((char*)dest + tid*stride) + tid;
        *p = 1.0f;
    }
}
""")
diag_kernel = mod.get_function("diag_kernel")

a = np.zeros((10, 10), dtype=np.float32)
a_N = np.int32(a.shape[0])           # number of rows
a_stride = np.int32(a.strides[0])    # row pitch in bytes
a_bytes = a.size * a.dtype.itemsize  # total allocation size in bytes

a_gpu = drv.mem_alloc(a_bytes)
drv.memcpy_htod(a_gpu, a)
# A single block of 32 threads is enough to cover all 10 rows.
diag_kernel(a_gpu, a_stride, a_N, block=(32, 1, 1))
drv.memcpy_dtoh(a, a_gpu)
print(a)
$ cuda-memcheck python ./gpuarray.py
========= CUDA-MEMCHECK
[[ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]]
========= ERROR SUMMARY: 0 errors
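As a rough illustration of why the kernel indexes by byte pitch rather than by row width, the sketch below emulates the same pointer arithmetic on the CPU with NumPy, using rows that are deliberately padded. It is not PyCUDA code: the 32-byte pitch, the buffer layout, and the write_diag helper are all assumptions made up for the illustration.

```python
import numpy as np

rows, cols = 4, 4
itemsize = np.dtype(np.float32).itemsize  # 4 bytes
pitch = 32  # hypothetical row pitch in bytes, larger than cols * itemsize (16)

# Flat byte buffer standing in for a pitched device allocation.
buf = np.zeros(rows * pitch, dtype=np.uint8)

# 2D float32 view of the buffer whose rows start every `pitch` bytes.
a = np.ndarray((rows, cols), dtype=np.float32, buffer=buf,
               strides=(pitch, itemsize))

def write_diag(buffer, stride, n):
    """CPU stand-in for diag_kernel: one loop iteration per thread id."""
    for tid in range(n):
        # Same arithmetic as the kernel: tid rows down (in bytes),
        # then tid floats across.
        offset = tid * stride + tid * itemsize
        buffer[offset:offset + itemsize].view(np.float32)[0] = 1.0

write_diag(buf, a.strides[0], rows)
print(a)
```

Because every row offset comes from the stride and not from the number of columns, the diagonal lands in the right place even though each row is followed by 16 bytes of padding; indexing with tid * cols instead would write into the wrong elements.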