Almost no free GTX 1080 Ti memory available when running on a tensorflow-gpu device

tensorflow cuda

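I am testing a recently purchased ASUS ROG STRIX 1080 Ti (11 GB) card with a simple Python test program (matmul.py). The virtual environment (venv) is set up as follows: Ubuntu 16.04, tensorflow-gpu==1.5.0, Python 3.6.6, CUDA 9.0, cuDNN 7.2.1.

A CUDA out-of-memory error occurs.

The strangest part is this: totalMemory: 10.91GiB freeMemory: 61.44MiB

I am not sure whether this is caused by the environment setup or by the 1080 Ti itself. Any advice would be appreciated.

Terminal output: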

(venv) xx@xxxxxx:~/xx$ python matmul.py gpu 1500
2018-10-01 09:05:12.459203: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-01 09:05:12.514203: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-10-01 09:05:12.514445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 61.44MiB
2018-10-01 09:05:12.514471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-10-01 09:05:12.651207: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 11.44M (11993088 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
......

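A Python process may be stuck on the GPU and still holding its memory. Always check the running processes with nvidia-smi and kill them manually if necessary.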
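The check described above can be scripted. The following is only a minimal sketch, assuming nvidia-smi is on the PATH and supports the --query-compute-apps query. The helper names parse_compute_apps and list_gpu_processes are hypothetical and not part of any library.

```python
import subprocess

# Ask nvidia-smi for one CSV row per compute process: pid, name, memory in MiB.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text):
    """Parse rows of the form 'pid, process_name, used_memory' (hypothetical helper)."""
    apps = []
    for line in text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        pid, mem = fields[0], fields[-1]
        # Rejoin the middle fields in case the process name contains commas.
        name = ",".join(fields[1:-1])
        apps.append({
            "pid": int(pid),
            "name": name,
            # nvidia-smi may report a non-numeric value such as [N/A].
            "used_mib": int(mem) if mem.isdigit() else None,
        })
    return apps


def list_gpu_processes():
    """Run nvidia-smi and return the parsed process list (hypothetical helper)."""
    result = subprocess.run(
        QUERY, check=True, stdout=subprocess.PIPE, universal_newlines=True)
    return parse_compute_apps(result.stdout)

# Example use on a machine with an NVIDIA driver installed:
#     for app in list_gpu_processes():
#         print(app)
```

If a stale Python process shows up in the list with most of the 11 GB attributed to it, terminating it (for example with kill followed by its PID) should release the memory without a reboot.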

I solved this problem by limiting memory usage:

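import tensorflow as tf  # TensorFlow 1.x API (tf.ConfigProto, tf.Session)

def gpu_config(upper=0.8):
    # Fall back to another device when an op has no GPU kernel; do not log placements.
    config = tf.ConfigProto(
        allow_soft_placement=True, log_device_placement=False)
    # Allocate GPU memory on demand instead of reserving it all at startup.
    config.gpu_options.allow_growth = True
    config.gpu_options.allocator_type = 'BFC'

    # Cap this process at the given fraction of the total GPU memory.
    config.gpu_options.per_process_gpu_memory_fraction = upper
    print("GPU memory upper bound:", upper)
    return config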

Then you can do:

config = gpu_config()
with tf.Session(config=config) as sess:
    ....
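A memory cap like this may not be enough on its own when another process already holds the memory, as appears to be the case in the question. The following sketch just redoes the arithmetic from the log (totalMemory 10.91 GiB, freeMemory 61.44 MiB) against the 0.8 fraction. The function name summarize is hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def summarize(total_gib, free_mib, fraction):
    """Compare a per-process memory cap with what is actually free (hypothetical helper)."""
    total = total_gib * GIB
    free = free_mib * MIB
    cap = fraction * total
    return {
        "free_share": free / total,      # share of the card that is still free
        "cap_mib": cap / MIB,            # cap implied by the fraction, in MiB
        "cap_fits_in_free": cap <= free, # can the cap be satisfied right now?
    }


# Values taken from the log in the question and the 0.8 fraction used above.
report = summarize(total_gib=10.91, free_mib=61.44, fraction=0.8)
print(report)
```

With these inputs only about 0.55% of the card is free, while the cap allows roughly 8937 MiB, so the failed allocation of 11.44M in the log is consistent with the memory being held elsewhere rather than with the cap being too low.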

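After rebooting, I was able to run the sample code from tensorflow.org with no memory problems.

Before I ran the TensorFlow sample code to check the 1080 Ti, I had been having trouble training a Mask RCNN model.

After replacing cuDNN 7.2.1 with 7.0.5, the resource exhausted (OOM) problem no longer occurred.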

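I added the following line to the simple test program, but it did not help: config.gpu_options.allow_growth = True. The log still shows totalMemory: 10.91GiB freeMemory: 111.06MiB.

Please run

nvidia-smi

on the command line to see which process is holding the memory.

Just reboot and try again.

Thanks for the suggestions. Yes, I rebooted, and the CUDA_ERROR no longer appears in two small tests.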