Xcode CUDA 5.5 samples compile fine on OS X 10.9 but fail immediately at run time
This is on a MacBookPro7,1 with a GeForce 320M (compute capability 1.2). Previously, with OS X 10.7.8, Xcode 4.x, and CUDA 5.0, CUDA code compiled and ran fine. I then updated to OS X 10.9.2, Xcode 5.1, and CUDA 5.5. At first, deviceQuery failed. I read elsewhere that 5.5.28 (the driver shipped with CUDA 5.5) does not support compute capability 1.x (sm_10), but that 5.5.43 does. After updating the CUDA driver to the newer 5.5.47 (GPU driver version 8.24.11 310.90.9b01), deviceQuery does pass, with the following output:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce 320M"
CUDA Driver Version / Runtime Version 5.5 / 5.5
CUDA Capability Major/Minor version number: 1.2
Total amount of global memory: 253 MBytes (265027584 bytes)
( 6) Multiprocessors, ( 8) CUDA Cores/MP: 48 CUDA Cores
GPU Clock rate: 950 MHz (0.95 GHz)
Memory Clock rate: 1064 Mhz
Memory Bus Width: 128-bit
Maximum Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
Maximum Layered 1D Texture Size, (num) layers 1D=(8192), 512 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(8192, 8192), 512 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 512
Max dimension size of a thread block (x,y,z): (512, 512, 64)
Max dimension size of a grid size (x,y,z): (65535, 65535, 1)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = GeForce 320M
Result = PASS
Error code 2 is cudaErrorMemoryAllocation, but I suspect it is somehow masking a failed CUDA initialization.
$ ./simpleCUBLAS
GPU Device 0: "GeForce 320M" with compute capability 1.2
simpleCUBLAS test running..
!!!! CUBLAS initialization error
The actual error code is CUBLAS_STATUS_NOT_INITIALIZED, returned by the call to cublasCreate().
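Has anyone run into this before and found a fix? Thanks in advance.

My guess is that you are running out of memory. The display manager is using your GPU, which has only 256 MB of RAM. The combined memory footprint of the OS X 10.9 display manager and the CUDA 5.5 runtime may leave you with almost no free memory. I suggest writing and running a small test program like this: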
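#include <iostream>
#include <cuda_runtime.h>

int main(void)
{
    size_t mfree, mtotal;

    // Select device 0; the runtime creates its context lazily on first use
    cudaSetDevice(0);
    // Query free and total device memory, in bytes
    cudaMemGetInfo(&mfree, &mtotal);
    std::cout << mfree << " bytes of " << mtotal << " available." << std::endl;
    return cudaDeviceReset();
}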
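The raw byte counts printed by the test above are easier to judge against a 256 MB budget when shown in MiB and as a percentage. A minimal sketch, assuming a hypothetical helper named describe_memory that is not part of CUDA: it only formats the two values that cudaMemGetInfo fills in and makes no CUDA calls itself.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not a CUDA function): formats the free and total
// byte counts, as reported by cudaMemGetInfo, in MiB and as a percentage.
std::string describe_memory(std::size_t free_bytes, std::size_t total_bytes)
{
    const double mib = 1024.0 * 1024.0;
    // Guard against division by zero if the query reported no memory at all.
    const double pct = total_bytes ? 100.0 * free_bytes / total_bytes : 0.0;
    char buf[96];
    std::snprintf(buf, sizeof buf, "%.1f MiB free of %.1f MiB (%.1f%%)",
                  free_bytes / mib, total_bytes / mib, pct);
    return std::string(buf);
}
```

In the test program it could replace the std::cout line, for example as std::cout << describe_memory(mfree, mtotal) << std::endl;. A very small percentage would support the out-of-memory guess.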
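I tried your suggestion. Unfortunately, cudaMemGetInfo also returns error code 2 (cudaErrorMemoryAllocation). Thanks for the idea, though. Perhaps there are other incremental diagnostics I can try.

That is strange. I suggest trying the code I have edited into my post. It does not even establish a context and uses only the driver API. I would consider contacting NVIDIA support about this.

Thanks. I tried your new snippet and it ran without errors. I then added one line to create a context manually by calling cuCtxCreate. Surprise! That call returns error code 2. It seems there is no memory available to create a context.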
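#include <iostream>
#include <cuda.h>

int main(void)
{
    CUdevice d;
    size_t b;

    // Initialize the driver API; no context is created here
    cuInit(0);
    // Get a handle to device 0 and query its total memory, in bytes
    cuDeviceGet(&d, 0);
    cuDeviceTotalMem(&b, d);
    std::cout << "Total memory = " << b << std::endl;
    return 0;
}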
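The comment thread mentions adding a cuCtxCreate call to this snippet, but the modified program was not posted. A minimal sketch of what such a test might look like, with every return code checked so that the first failing call is reported. The helper ok is hypothetical, and the three-argument form of cuCtxCreate is the one used by CUDA 5.5-era toolkits; much newer toolkits may declare it differently.

```cpp
#include <iostream>
#include <cuda.h>

// Hypothetical helper: report the numeric CUresult of a failed driver call.
static bool ok(CUresult r, const char *what)
{
    if (r == CUDA_SUCCESS) return true;
    std::cerr << what << " failed with error code "
              << static_cast<int>(r) << std::endl;
    return false;
}

int main(void)
{
    CUdevice d;
    CUcontext ctx;
    size_t b, mfree, mtotal;

    if (!ok(cuInit(0), "cuInit")) return 1;
    if (!ok(cuDeviceGet(&d, 0), "cuDeviceGet")) return 1;
    if (!ok(cuDeviceTotalMem(&b, d), "cuDeviceTotalMem")) return 1;
    std::cout << "Total memory = " << b << std::endl;

    // Assumed form of the added line: create a context explicitly.
    // This is the step reported in the comments as returning error code 2.
    if (!ok(cuCtxCreate(&ctx, 0, d), "cuCtxCreate")) return 1;

    // Only reachable if a context exists: report free and total memory.
    if (ok(cuMemGetInfo(&mfree, &mtotal), "cuMemGetInfo"))
        std::cout << mfree << " bytes of " << mtotal << " available." << std::endl;

    cuCtxDestroy(ctx);
    return 0;
}
```

In the driver API, error code 2 is CUDA_ERROR_OUT_OF_MEMORY, which matches the runtime API's cudaErrorMemoryAllocation, so a failure at cuCtxCreate would be consistent with the out-of-memory explanation above.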