Xcode CUDA 5.5 samples compile fine on OS X 10.9 but fail immediately at run time


This is on a MacBookPro7,1 with a GeForce 320M (compute capability 1.2). Previously, with OS X 10.7.8, Xcode 4.x and CUDA 5.0, CUDA code compiled and ran fine.

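Then I updated to OS X 10.9.2, Xcode 5.1 and CUDA 5.5. At first, deviceQuery failed. I read elsewhere that 5.5.28 (the driver shipped with CUDA 5.5) does not support compute capability 1.x (sm_10), but that 5.5.43 does. After updating the CUDA driver to the newer 5.5.47 (GPU driver version 8.24.11 310.90.9b01), deviceQuery does pass, with the following output: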

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce 320M"
  CUDA Driver Version / Runtime Version          5.5 / 5.5
  CUDA Capability Major/Minor version number:    1.2
  Total amount of global memory:                 253 MBytes (265027584 bytes)
  ( 6) Multiprocessors, (  8) CUDA Cores/MP:     48 CUDA Cores
  GPU Clock rate:                                950 MHz (0.95 GHz)
  Memory Clock rate:                             1064 Mhz
  Memory Bus Width:                              128-bit
  Maximum Texture Dimension Size (x,y,z)         1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(8192), 512 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(8192, 8192), 512 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           512
  Max dimension size of a thread block (x,y,z): (512, 512, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 1)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = GeForce 320M
Result = PASS

Error code 2 is cudaErrorMemoryAllocation, but I suspect it is somehow masking a failed CUDA initialization.

$ ./simpleCUBLAS 
GPU Device 0: "GeForce 320M" with compute capability 1.2

simpleCUBLAS test running..
!!!! CUBLAS initialization error

The actual error code is CUBLAS_STATUS_NOT_INITIALIZED, returned by the call to cublasCreate().

Has anyone run into this before and found a fix? Thanks in advance.

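My guess is that you are running out of memory. The display manager is using your GPU, which has only 256 MB of RAM. The combined memory footprint of the OS X 10.9 display manager and the CUDA 5.5 runtime may leave you with almost no free memory. I suggest writing and running a small test program like this: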

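#include <iostream>
#include <cuda_runtime.h>  // runtime API declarations (implicit under nvcc, explicit here)

int main(void)
{
    size_t mfree, mtotal;

    // Select device 0, then ask the runtime how much device memory is free.
    cudaSetDevice(0);
    cudaMemGetInfo(&mfree, &mtotal);

    std::cout << mfree << " bytes of " << mtotal << " available." << std::endl;

    // Tear down the context; the process exit status is the CUDA error code.
    return cudaDeviceReset();
}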

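I can try your suggestion. Unfortunately, cudaMemGetInfo also returns error code 2 (cudaErrorMemoryAllocation). But thanks for the idea; maybe there are other incremental diagnostics I can try.

That is strange. I suggest trying the code I have edited into my answer. It does not even establish a context and only uses the driver API. I would consider contacting Nvidia support about this.

Thanks. I tried your new snippet and it ran without errors. Then I added a line to create a context manually by calling cuCtxCreate. Surprise! That returns error code 2. It seems there is no memory available to create a context.
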
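// Driver API only: reports total device memory without creating a context.
#include <iostream>
#include <cuda.h>

int main(void)
{
    CUdevice d;
    size_t b;

    cuInit(0);                // initialize the driver API
    cuDeviceGet(&d, 0);       // handle for device 0
    cuDeviceTotalMem(&b, d);  // total memory on that device, in bytes

    std::cout << "Total memory = " << b << std::endl;

    return 0;
}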