Python: passing a C struct to a function using ctypes


python c cuda


I am trying to query the CUDA devices without adding a pycuda dependency. Here is what I have so far:

import ctypes

cudart = ctypes.cdll.LoadLibrary('libcudart.so')

numDevices = ctypes.c_int()
cudart.cudaGetDeviceCount(ctypes.byref(numDevices))
print('There are', numDevices.value, 'devices.')

for x in range(numDevices.value):
    properties = None # XXX What goes here?
    cudart.cudaGetDeviceProperties(ctypes.byref(properties), x)
    print(properties)

The problem is that I cannot create an empty struct to pass to cudaGetDeviceProperties(). I would like to do something like this:

properties = cudart.cudaDeviceProp

But this raises an error:

AttributeError: /usr/local/cuda/lib64/libcudart.so: undefined symbol: cudaDeviceProp

Here is the relevant documentation.

(Edit)

Thanks to @mhawke, I got this working. For anyone else who wants to do this, I will save you the trouble of typing out the class yourself:

class CudaDeviceProp(ctypes.Structure):
    _fields_ = [ 
            ('name', ctypes.c_char * 256),
            ('totalGlobalMem', ctypes.c_size_t),
            ('sharedMemPerBlock', ctypes.c_size_t),
            ('regsPerBlock', ctypes.c_int),
            ('warpSize', ctypes.c_int),
            ('memPitch', ctypes.c_size_t),
            ('maxThreadsPerBlock', ctypes.c_int),
            ('maxThreadsDim', ctypes.c_int * 3), 
            ('maxGridSize', ctypes.c_int * 3), 
            ('clockRate', ctypes.c_int),
            ('totalConstMem', ctypes.c_size_t),
            ('major', ctypes.c_int),
            ('minor', ctypes.c_int),
            ('textureAlignment', ctypes.c_size_t),
            ('texturePitchAlignment', ctypes.c_size_t),
            ('deviceOverlap', ctypes.c_int),
            ('multiProcessorCount', ctypes.c_int),
            ('kernelExecTimeoutEnabled', ctypes.c_int),
            ('integrated', ctypes.c_int),
            ('canMapHostMemory', ctypes.c_int),
            ('computeMode', ctypes.c_int),
            ('maxTexture1D', ctypes.c_int),
            ('maxTexture1DMipmap', ctypes.c_int),
            ('maxTexture1DLinear', ctypes.c_int),
            ('maxTexture2D', ctypes.c_int * 2), 
            ('maxTexture2DMipmap', ctypes.c_int * 2), 
            ('maxTexture2DLinear', ctypes.c_int * 3), 
            ('maxTexture2DGather', ctypes.c_int * 2), 
            ('maxTexture3D', ctypes.c_int * 3), 
            ('maxTexture3DAlt', ctypes.c_int * 3), 
            ('maxTextureCubemap', ctypes.c_int),
            ('maxTexture1DLayered', ctypes.c_int * 2), 
            ('maxTexture2DLayered', ctypes.c_int * 3), 
            ('maxTextureCubemapLayered', ctypes.c_int * 2), 
            ('maxSurface1D', ctypes.c_int),
            ('maxSurface2D', ctypes.c_int * 2), 
            ('maxSurface3D', ctypes.c_int * 3), 
            ('maxSurface1DLayered', ctypes.c_int * 2), 
            ('maxSurface2DLayered', ctypes.c_int * 3), 
            ('maxSurfaceCubemap', ctypes.c_int),
            ('maxSurfaceCubemapLayered', ctypes.c_int * 2), 
            ('surfaceAlignment', ctypes.c_size_t),
            ('concurrentKernels', ctypes.c_int),
            ('ECCEnabled', ctypes.c_int),
            ('pciBusID', ctypes.c_int),
            ('pciDeviceID', ctypes.c_int),
            ('pciDomainID', ctypes.c_int),
            ('tccDriver', ctypes.c_int),
            ('asyncEngineCount', ctypes.c_int),
            ('unifiedAddressing', ctypes.c_int),
            ('memoryClockRate', ctypes.c_int),
            ('memoryBusWidth', ctypes.c_int),
            ('l2CacheSize', ctypes.c_int),
            ('maxThreadsPerMultiProcessor', ctypes.c_int),
            ('streamPrioritiesSupported', ctypes.c_int),
            ('globalL1CacheSupported', ctypes.c_int),
            ('localL1CacheSupported', ctypes.c_int),
            ('sharedMemPerMultiprocessor', ctypes.c_size_t),
            ('regsPerMultiprocessor', ctypes.c_int),
            ('managedMemSupported', ctypes.c_int),
            ('isMultiGpuBoard', ctypes.c_int),
            ('multiGpuBoardGroupID', ctypes.c_int),
            ('singleToDoublePrecisionPerfRatio', ctypes.c_int),
            ('pageableMemoryAccess', ctypes.c_int),
            ('concurrentManagedAccess', ctypes.c_int),
            ]
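Once such a struct has been filled in, its fields read back as ordinary Python values. The following is a minimal sketch of that, runnable without CUDA: DevicePropHead is a hypothetical, trimmed-down stand-in with only three of the fields above, and the values are assigned by hand in place of a real cudaGetDeviceProperties call.

```python
import ctypes

# Hypothetical stand-in holding only three fields of the class above.
# It is far too small to pass to cudaGetDeviceProperties.
class DevicePropHead(ctypes.Structure):
    _fields_ = [
        ('name', ctypes.c_char * 256),
        ('totalGlobalMem', ctypes.c_size_t),
        ('maxThreadsDim', ctypes.c_int * 3),
    ]

props = DevicePropHead()

# Made-up values standing in for what the CUDA runtime would write.
props.name = b'Hypothetical GPU'
props.totalGlobalMem = 2 * 1024 ** 3
props.maxThreadsDim = (ctypes.c_int * 3)(1024, 1024, 64)

# A c_char array field reads back as bytes, cut at the first NUL byte.
device_name = props.name.decode()
# Scalar fields read back as plain Python ints.
global_mem = props.totalGlobalMem
# Array fields are ctypes arrays; list() makes the values visible.
max_threads_dim = list(props.maxThreadsDim)

print(device_name, global_mem, max_threads_dim)
```

Printing the struct object itself only shows its default object repr, so accessing individual fields like this is usually more useful.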

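You need to define a subclass of ctypes.Structure that specifies all the fields in the cudaDeviceProp struct. You can then pass an instance of the structure to the function. Note that you need to fill in all the fields in the correct order. Some of them are arrays, so those need to be declared correctly.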

import ctypes

class CudaDeviceProp(ctypes.Structure):
    _fields_ = [('ECCEnabled', ctypes.c_int),
                ('asyncEngineCount', ctypes.c_int),
                ('canMapHostMemory', ctypes.c_int),
                ('clockRate', ctypes.c_int),
                ('computeMode', ctypes.c_int),
                ('concurrentKernels', ctypes.c_int),
                ...
                ('totalGlobalMem', ctypes.c_size_t),
                ('unifiedAddressing', ctypes.c_int),
                ('warpSize', ctypes.c_int)]

properties = CudaDeviceProp()
cudart.cudaGetDeviceProperties(ctypes.byref(properties), 0)
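To illustrate why field order matters: ctypes lays fields out in memory in the order they appear in _fields_, so the list has to follow the order in the C header of the installed CUDA version rather than, say, alphabetical order. The sketch below needs no CUDA. DeclaredOrder and SortedOrder are hypothetical classes that use only three fields, and the values are made up.

```python
import ctypes

# Three fields in the same order as the full class in the question.
class DeclaredOrder(ctypes.Structure):
    _fields_ = [
        ('name', ctypes.c_char * 256),
        ('totalGlobalMem', ctypes.c_size_t),
        ('sharedMemPerBlock', ctypes.c_size_t),
    ]

# The same fields sorted alphabetically: same names, different layout.
class SortedOrder(ctypes.Structure):
    _fields_ = [
        ('name', ctypes.c_char * 256),
        ('sharedMemPerBlock', ctypes.c_size_t),
        ('totalGlobalMem', ctypes.c_size_t),
    ]

# Each field descriptor exposes its byte offset within the struct.
print(DeclaredOrder.totalGlobalMem.offset, SortedOrder.totalGlobalMem.offset)

# Fill a struct using one layout (made-up values) ...
filled = DeclaredOrder()
filled.totalGlobalMem = 2 * 1024 ** 3
filled.sharedMemPerBlock = 49152

# ... and reinterpret the same bytes using the other layout.
misread = SortedOrder.from_buffer_copy(bytes(filled))

# totalGlobalMem now reports the value stored for sharedMemPerBlock.
print(misread.totalGlobalMem)
```

A library writing into a mis-ordered struct would produce the same kind of silently wrong values. Comparing ctypes.sizeof() of the Python class with sizeof(cudaDeviceProp) from the C side is one cheap sanity check, since a struct that is too small would let the library write past the end of the buffer.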