Python - Interpreting pitch, width, height and depth of a 3D array in memory

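I am using CUDA with 3D textures in Python (via PyCUDA). There is a function called `Memcpy3D`, which has the same members as `Memcpy2D` plus a few extra ones. It asks you to describe things such as `width_in_bytes`, `src_pitch`, `src_height`, `height` and `copy_depth`. This is what I am struggling with (in 3D), along with how it relates to C- or F-style indexing. For example, if I simply change the order from F to C in the working example below, it stops working, and I don't know why.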

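First, I understand pitch to be the number of bytes of memory it takes to move one index along `threadIdx.x` (i.e. the x direction, or one column). So for a C-ordered float32 array of shape (3,2,4), to move one value in x I would expect to move 4 values in memory. So my pitch is 4*32 bits.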

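I understand `height` to be the number of rows (3 in this case).

I understand `width` to be the number of columns (2 in this case).

I understand `depth` to be the number of z-slices (4 in this case).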

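I understand `width_in_bytes` to be the width of a row in x, including the z elements behind it, i.e. the row slice (0,:,:). This is the number of memory addresses it takes to traverse one element in the y direction.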

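So when I change the order from F to C in the code below, and adapt the code to change the height/width values accordingly, it still doesn't work. It just produces a logic error, which makes me think I am not understanding the concepts of pitch, width, height and depth correctly.

Please educate me.

Below is a full working script that copies an array to the GPU as a texture and copies the contents back.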

```python
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy as np

w = 2
h = 3
d = 4
shape = (w, h, d)

# F order: axis 0 (x, length w) varies fastest in memory
a = np.arange(24).reshape(*shape, order='F').astype('float32')
print(a.shape, a.strides)
print(a)

descr = drv.ArrayDescriptor3D()
descr.width = w
descr.height = h
descr.depth = d
descr.format = drv.dtype_to_array_format(a.dtype)
descr.num_channels = 1
descr.flags = 0

ary = drv.Array(descr)

copy = drv.Memcpy3D()
copy.set_src_host(a)
copy.set_dst_array(ary)
copy.width_in_bytes = copy.src_pitch = a.strides[1]  # bytes per row: w float32 values
copy.src_height = copy.height = h                     # rows per z-slice
copy.depth = d

copy()

mod = SourceModule("""
    texture<float, 3, cudaReadModeElementType> mtx_tex;

    __global__ void copy_texture(float *dest)
    {
      int x = threadIdx.x;
      int y = threadIdx.y;
      int z = threadIdx.z;
      int dx = blockDim.x;
      int dy = blockDim.y;
      int i = (z*dy + y)*dx + x;
      dest[i] = tex3D(mtx_tex, x, y, z);
    }
""")

copy_texture = mod.get_function("copy_texture")
mtx_tex = mod.get_texref("mtx_tex")

mtx_tex.set_array(ary)

dest = np.zeros(shape, dtype=np.float32, order="F")
copy_texture(drv.Out(dest), block=shape, texrefs=[mtx_tex])

print(dest)
```

I'm not sure I fully understand the problem in your code, but I'll try to clarify.
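
In CUDA, `width` (`x`) is the fastest-varying dimension, `height` (`y`) is the middle dimension, and `depth` (`z`) is the slowest-varying dimension. `pitch` is the stride, in bytes, needed to step between values along the `y` dimension.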
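
To make these definitions concrete, here is a small pure-Python sketch. The helper `byte_offset` is hypothetical (it is not part of CUDA or PyCUDA), and it assumes the usual pitched layout in which one row occupies `pitch` bytes and one z-slice occupies `pitch * height` bytes, which is how I read `src_pitch` and `src_height` in a 3D copy.

```python
def byte_offset(x, y, z, *, itemsize, pitch, height):
    """Byte offset of element (x, y, z) in a pitched 3D block (hypothetical helper).

    x varies fastest (width), then y (height), then z (depth).
    """
    slice_pitch = pitch * height      # bytes from one z-slice to the next
    return z * slice_pitch + y * pitch + x * itemsize


# float32 volume with width=4, height=2, depth=3 and no row padding
itemsize = 4
width, height, depth = 4, 2, 3
pitch = width * itemsize              # 16 bytes; equals width_in_bytes when unpadded

print(byte_offset(1, 0, 0, itemsize=itemsize, pitch=pitch, height=height))  # 4  (one step in x)
print(byte_offset(0, 1, 0, itemsize=itemsize, pitch=pitch, height=height))  # 16 (one step in y)
print(byte_offset(0, 0, 1, itemsize=itemsize, pitch=pitch, height=height))  # 32 (one step in z)
```

With no row padding, `pitch` equals `width_in_bytes`; a padded allocation would only change the `y` and `z` steps, never the `x` step.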

In NumPy, an array defined as `np.empty(shape=(3,2,4), dtype=np.float32, order="C")` has `strides=(32,16,4)`, which corresponds to `width=4`, `height=2`, `depth=3` and `pitch=16`.
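
The same mapping can be written down as a small helper. `extents_from_c_array` and its dictionary keys are illustrative names, not a PyCUDA API, and the sketch assumes a C-contiguous host array with no row padding, so `pitch` and `width_in_bytes` coincide.

```python
import numpy as np


def extents_from_c_array(a):
    """Map a C-contiguous array of shape (depth, height, width) to CUDA-style extents.

    Hypothetical helper: keys mirror the names used in the text, not a library API.
    """
    if a.ndim != 3 or not a.flags.c_contiguous:
        raise ValueError("expected a C-contiguous 3D array")
    depth, height, width = a.shape          # last axis varies fastest -> CUDA x / width
    return {
        "width": width,
        "height": height,
        "depth": depth,
        "pitch": a.strides[1],              # bytes between consecutive rows (y steps)
        "width_in_bytes": width * a.itemsize,
    }


a = np.empty(shape=(3, 2, 4), dtype=np.float32, order="C")
print(a.strides)                  # (32, 16, 4)
print(extents_from_c_array(a))    # {'width': 4, 'height': 2, 'depth': 3, 'pitch': 16, 'width_in_bytes': 16}
```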

Using `"F"` ordering in NumPy means the order of the dimensions in memory is reversed.
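
A quick NumPy check of that reversal, using the sizes from your script (`w=2`, `h=3`, `d=4`). An F-ordered array of shape `(w, h, d)` and a C-ordered array of shape `(d, h, w)` hold the same values in the same memory order, and `strides[1]` is `w * 4 = 8` bytes in both, which is why `copy.width_in_bytes = copy.src_pitch = a.strides[1]` is right for either. If you switch to `order="C"` but keep the shape `(w, h, d)`, the fastest-varying axis has length `d`, so (as far as I can tell) the texture `width` and the pitch would both have to describe that axis instead.

```python
import numpy as np

w, h, d = 2, 3, 4
data = np.arange(24, dtype=np.float32)

f_arr = data.reshape((w, h, d), order="F")   # the question's original layout
c_arr = data.reshape((d, h, w), order="C")   # axes reversed, C order
bad = data.reshape((w, h, d), order="C")     # C order without reversing the shape

print(f_arr.strides)   # (4, 8, 24)
print(c_arr.strides)   # (24, 8, 4)
print(bad.strides)     # (48, 16, 4)

# Element [x, y, z] of the F array is element [z, y, x] of the C array ...
print(np.array_equal(f_arr, c_arr.transpose(2, 1, 0)))                   # True
# ... and both store their elements in the same order in memory.
print(np.array_equal(f_arr.ravel(order="K"), c_arr.ravel(order="K")))    # True
```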

Your code seems to work if I make the following changes:

```python
#shape = (w, h, d)
shape = (d, h, w)

#a = np.arange(24).reshape(*shape,order='F').astype('float32')
a = np.arange(24).reshape(*shape,order='C').astype('float32')

...

#dest = np.zeros(shape, dtype=np.float32, order="F")
dest = np.zeros(shape, dtype=np.float32, order="C")

#copy_texture(drv.Out(dest), block=shape, texrefs=[mtx_tex])
copy_texture(drv.Out(dest), block=(w,h,d), texrefs=[mtx_tex])
```
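
If you don't have a GPU at hand, the index bookkeeping of the fixed version can be checked on the CPU. This is a rough NumPy emulation, not PyCUDA: it assumes that the 3D copy reads `width_in_bytes` bytes per row, starts each row `src_pitch` bytes after the previous one and each slice `src_pitch * src_height` bytes after the previous one, and it replaces `tex3D` with plain indexing.

```python
import numpy as np

# CPU stand-in for the fixed script's GPU round trip (no PyCUDA involved).
w, h, d = 2, 3, 4
a = np.arange(24).reshape((d, h, w), order="C").astype("float32")

# Emulated 3D copy: src_pitch = a.strides[1], src_height = h,
# width_in_bytes = w * 4, so each row is w float32 values.
buf = a.tobytes()
pitch = a.strides[1]
tex = np.empty((d, h, w), dtype=np.float32)      # texture indexed tex[z, y, x]
for z in range(d):
    for y in range(h):
        start = (z * h + y) * pitch              # slice step = pitch * src_height
        tex[z, y, :] = np.frombuffer(buf, dtype=np.float32, count=w, offset=start)

# Emulated kernel launched with block=(w, h, d): one "thread" per (x, y, z).
dest = np.zeros((d, h, w), dtype=np.float32, order="C")
flat = dest.reshape(-1)                          # view into dest's C-ordered memory
dx, dy = w, h                                    # blockDim.x, blockDim.y
for z in range(d):
    for y in range(h):
        for x in range(w):
            i = (z * dy + y) * dx + x            # same linear index as the kernel
            flat[i] = tex[z, y, x]               # stands in for tex3D(mtx_tex, x, y, z)

print(np.array_equal(dest, a))   # True
```

The kernel's linear index `(z*dy + y)*dx + x` is exactly C-order flattening of a `(d, h, w)` array, which is why `dest` also has to be C-ordered with that shape.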