CUDA: how to allocate all available global memory on a GeForce GTX 690 device?


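I need to allocate all available memory using CUDA. With a Tesla C2050, a Quadro 600, and a GeForce GTX 560 Ti I do it as follows: first, I allocate 0 bytes of global memory on the device. Then I query the device's free memory with the cudaMemGetInfo function and allocate that amount. This works on the devices listed above, but it does not work on the GeForce GTX 690.

Can somebody help me: what mechanism can I use to allocate the memory on a GeForce GTX 690 device, or is there an example of this operation?

It looks like this: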

cudaSetDevice(deviceIndex);

int (*reservedMemory);

cudaMalloc(&reservedMemory, 0);

size_t freeMemory, totalMemory;

cudaMemGetInfo(&freeMemory, &totalMemory);

cudaMalloc(&reservedMemory, freeMemory);
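On the GeForce GTX 690, one of the two existing streaming multiprocessors operates on 2147483648 bytes of memory, but I can allocate only 1341915136 of the 2050109440 bytes of free global memory. On the Quadro 600, the one existing streaming multiprocessor operates on 1073414144 bytes of memory, and I can allocate all 859803648 bytes of free global memory.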
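
To put these figures in proportion, the short C++ sketch below computes what share of the reported free memory each allocation covered. The names `allocatedShare` and `printShares` are made up for this illustration; the byte counts are the ones quoted above.

```cpp
#include <cstdio>

// Hypothetical helper: fraction of the reported free memory that was
// actually allocated. Returns 0 when no free memory was reported.
double allocatedShare(unsigned long long allocatedBytes, unsigned long long freeBytes)
{
    return freeBytes == 0 ? 0.0
                          : static_cast<double>(allocatedBytes) / static_cast<double>(freeBytes);
}

// Prints the share for both devices, using the byte counts from the question.
void printShares()
{
    std::printf("GTX 690:    %.1f%%\n", 100.0 * allocatedShare(1341915136ULL, 2050109440ULL));
    std::printf("Quadro 600: %.1f%%\n", 100.0 * allocatedShare(859803648ULL, 859803648ULL));
}
```

On the GTX 690 the successful request covers roughly 65% of the reported free memory, while on the Quadro 600 it covers all of it.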

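Example for the Quadro 600 (shows the compile, link, and execution process):

Example for the GeForce GTX 690 (shows the compile, link, and execution process):

The examples are archived and available at:

(7z archive, 78.5 KB, 80434 bytes) (zip archive, 163 KB, 167457 bytes)

This topic is a clone of the topics of the same name in the "GeForce Lounge" and "CUDA Programming and Performance" forums.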

I could re-run your example and got the same results.

I tried to approach the problem from the other side and allocated blocks of decreasing size:

#include <cuda_runtime.h>
#include <iostream>

int main()
{
    int *reservedMemory = nullptr;
    size_t const NBlockSize = size_t(1300) * 1024 * 1024;  // initial block size: 1300 MB
    size_t freeMemory = 0, totalMemory = 0;
    cudaError_t nErr = cudaSuccess;
    size_t nTotalAlloc = 0;
    // Keep allocating until the first failure; the blocks are deliberately never freed.
    while( nErr == cudaSuccess )
    {
        cudaMemGetInfo(&freeMemory, &totalMemory);
        std::cout << "===========================================================" << std::endl;
        std::cout << "Free/Total(kB): " << freeMemory/1024 << "/" << totalMemory/1024 << std::endl;

        // Halve the request until it fits into the reported free memory.
        size_t nAllocSize = NBlockSize;
        while( nAllocSize > freeMemory )
            nAllocSize /= 2;
        if( nAllocSize == 0 )  // nothing left to request: avoid looping on 0-byte allocations
            break;

        nErr = cudaMalloc(&reservedMemory, nAllocSize );
        if( nErr == cudaSuccess )
            nTotalAlloc += nAllocSize;
        std::cout << "AllocSize(kB): " << nAllocSize/1024 << ", error: " << cudaGetErrorString(nErr) << std::endl;
    }
    std::cout << "TotalAlloc/Total (kB): " << nTotalAlloc/1024 << "/" << totalMemory/1024 << std::endl;
    return 0;
}
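
The loop above gathers many blocks. If the goal is instead the largest single block the driver will grant, a binary search over the request size is one option. The sketch below is not from the original answer: `findLargestAllocation`, `tryAlloc`, and `granularity` are names invented here, and the allocator is passed in as a callable so the search logic can be tried without a GPU. It also assumes that whenever a request of some size succeeds, every smaller request would succeed too, which a real driver does not guarantee.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper: returns the largest size in [0, upperBound], rounded
// down to a multiple of `granularity`, for which tryAlloc(size) reports success.
// Assumes success is monotonic in the request size. Returns 0 if nothing succeeds.
std::size_t findLargestAllocation(const std::function<bool(std::size_t)>& tryAlloc,
                                  std::size_t upperBound,
                                  std::size_t granularity = 1024 * 1024)
{
    if (granularity == 0) return 0;
    std::size_t lo = 0;                         // largest multiple known to succeed (0 = none yet)
    std::size_t hi = upperBound / granularity;  // largest multiple worth trying
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo + 1) / 2;  // round up so every step makes progress
        if (tryAlloc(mid * granularity))
            lo = mid;        // mid succeeded: the answer is mid or larger
        else
            hi = mid - 1;    // mid failed: the answer is smaller
    }
    return lo * granularity;
}
```

With CUDA, `tryAlloc` could be a lambda that calls cudaMalloc and, on success, immediately calls cudaFree and returns true; `upperBound` would be the free value reported by cudaMemGetInfo, and the size found would then be allocated once more for real use. After a failed cudaMalloc it is probably worth calling cudaGetLastError so that the error does not carry over to later calls.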

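I recently remembered the "page-locked" memory mechanism in CUDA. I tested it but did not get satisfactory performance: computation using this mechanism is about ten times slower, and on Windows with the GeForce GTX 690 it is only a very limited form of memory reservation. I had assumed that copying the data to the device for later computation and writing it back would happen automatically, but in fact the device's memory is not involved.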

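You are obviously running this on a Windows host. Were the other devices you mention also run on a Windows host? Also, the build process shown compiles for a compute 2.0 target. If you compile for a compute 3.0 target, does the GTX 690 behave the same way?

So you can allocate 2 GB on one of the GTX 690's sub-GPUs and about 1.3 GB on the other? If so, my first guess would be that this sub-GPU is used as the display card, which takes up some memory (although 700 MB would be a bit much).

Thanks to talonmies and GeorgT. talonmies: first, yes, I use Windows as the OS platform for this device; second, in the build process shown above, the compute 2.0 target is used for the Quadro 600 device and the compute 3.0 target for the GeForce GTX 690 device. GeorgT: no, I can allocate about 1.3*8^9.98 bytes of global memory on each of the two existing streaming multiprocessors of the GeForce GTX 690 device.

I have a GTX 660 with 2 GB of memory. Compiled with sm_30 and compute_30. The memory sizes in the output are obtained by integer division.

I tested this on Ubuntu 11.10 (amd64) with CUDA 5.0 installed. The result is that the algorithm described earlier, which first allocates zero bytes to initialize the device and then allocates the rest of the available memory, does not work. On the GeForce GTX 690 we can reserve all of the available memory except for one megabyte.

I wonder whether the problem also exists on the GeForce GTX Titan under Windows.
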
Build and run log for the GeForce GTX 690:

J:\Gdmt> nvcc -arch=compute_30 -code=sm_30 -c ./Gdmt.cu -o ./Gdmt.obj
Gdmt.cu
tmpxft_000011f0_00000000-5_Gdmt.cudafe1.gpu
tmpxft_000011f0_00000000-10_Gdmt.cudafe2.gpu
Gdmt.cu
tmpxft_000011f0_00000000-5_Gdmt.cudafe1.cpp
tmpxft_000011f0_00000000-15_Gdmt.ii

J:\Gdmt> nvcc ./Gdmt.obj -o ./Gdmt.exe

J:\Gdmt> nvcc -arch=compute_30 -code=sm_30 -c ./Gdmt_additional.cu -o ./Gdmt_additional.obj
Gdmt_additional.cu
tmpxft_00001164_00000000-5_Gdmt_additional.cudafe1.gpu
tmpxft_00001164_00000000-10_Gdmt_additional.cudafe2.gpu
Gdmt_additional.cu
tmpxft_00001164_00000000-5_Gdmt_additional.cudafe1.cpp
tmpxft_00001164_00000000-15_Gdmt_additional.ii

J:\Gdmt> nvcc ./Gdmt_additional.obj -o ./Gdmt_additional.exe

J:\Gdmt> Gdmt.exe
Total amount of memory: 2147483648 Bytes;
Memory to reserve: 2050109440 Bytes;
Warning, memory allocation process is not succeeded!
^C
J:\Gdmt> Gdmt_additional.exe
Allocation is succeeded on 1341915136 bytes of reserved memory.
^C

Output of the decreasing-block-size test for several values of NBlockSize:

D:\Buildx64\Test\GMDT\Debug>Gdmt.exe
NBlockSize(MB): 1000
===========================================================
Free/Total(kB): 1797120/2097152
AllocSize(kB): 1024000, percentage of freememory: 0.569801, error: no error
===========================================================
Free/Total(kB): 773120/2097152
AllocSize(kB): 512000, percentage of freememory: 0.662252, error: no error
===========================================================
Free/Total(kB): 261120/2097152
AllocSize(kB): 256000, percentage of freememory: 0.980392, error: no error
===========================================================
Free/Total(kB): 5128/2097152
AllocSize(kB): 4000, percentage of freememory: 0.780031, error: no error
===========================================================
Free/Total(kB): 1032/2097152
AllocSize(kB): 1000, percentage of freememory: 0.968992, error: no error
===========================================================
Free/Total(kB): 8/2097152
AllocSize(kB): 7, percentage of freememory: 0.976563, error: out of memory
TotalAlloc/Total (kB): 1797000/2097152


D:\Buildx64\Test\GMDT\Debug>Gdmt.exe
NBlockSize(MB): 1200
===========================================================
Free/Total(kB): 1796864/2097152
AllocSize(kB): 1228800, percentage of freememory: 0.683858, error: no error
===========================================================
Free/Total(kB): 568072/2097152
AllocSize(kB): 307200, percentage of freememory: 0.540777, error: no error
===========================================================
Free/Total(kB): 260872/2097152
AllocSize(kB): 153600, percentage of freememory: 0.588795, error: no error
===========================================================
Free/Total(kB): 107272/2097152
AllocSize(kB): 76800, percentage of freememory: 0.715937, error: no error
===========================================================
Free/Total(kB): 30472/2097152
AllocSize(kB): 19200, percentage of freememory: 0.630087, error: no error
===========================================================
Free/Total(kB): 11272/2097152
AllocSize(kB): 9600, percentage of freememory: 0.851668, error: no error
===========================================================
Free/Total(kB): 1672/2097152
AllocSize(kB): 1200, percentage of freememory: 0.717703, error: no error
===========================================================
Free/Total(kB): 392/2097152
AllocSize(kB): 300, percentage of freememory: 0.765306, error: out of memory
TotalAlloc/Total (kB): 1796400/2097152

D:\Buildx64\Test\GMDT\Debug>Gdmt.exe
NBlockSize(MB): 800
===========================================================
Free/Total(kB): 1844448/2097152
AllocSize(kB): 819200, percentage of freememory: 0.444144, error: no error
===========================================================
Free/Total(kB): 1025248/2097152
AllocSize(kB): 819200, percentage of freememory: 0.799026, error: out of memory
TotalAlloc/Total (kB): 819200/2097152

D:\Buildx64\Test\GMDT\Debug>Gdmt.exe
NBlockSize(MB): 700
===========================================================
Free/Total(kB): 1835528/2097152
AllocSize(kB): 716800, percentage of freememory: 0.390514, error: no error
===========================================================
Free/Total(kB): 1118740/2097152
AllocSize(kB): 716800, percentage of freememory: 0.640721, error: no error
===========================================================
Free/Total(kB): 401940/2097152
AllocSize(kB): 358400, percentage of freememory: 0.891675, error: no error
===========================================================
Free/Total(kB): 43540/2097152
AllocSize(kB): 22400, percentage of freememory: 0.514469, error: no error
===========================================================
Free/Total(kB): 21140/2097152
AllocSize(kB): 11200, percentage of freememory: 0.529801, error: no error
===========================================================
Free/Total(kB): 9876/2097152
AllocSize(kB): 5600, percentage of freememory: 0.567031, error: no error
===========================================================
Free/Total(kB): 4244/2097152
AllocSize(kB): 2800, percentage of freememory: 0.659755, error: no error
===========================================================
Free/Total(kB): 1428/2097152
AllocSize(kB): 1400, percentage of freememory: 0.980392, error: no error
===========================================================
Free/Total(kB): 20/2097152
AllocSize(kB): 10, percentage of freememory: 0.546875, error: out of memory
TotalAlloc/Total (kB): 1835400/2097152