Tensorflow, Keras and GPU: logs show a resource exhausted error before even simply loading model weights

performance memory tensorflow keras

I'm new to Ubuntu and am setting up a new deep learning machine with Keras and Tensorflow. I'm fine-tuning VGG16 on a very complex set of medical images. My machine specs are:-

i7-6900K CPU @ 3.20GHz × 16  
GeForce GTX 1080 Ti x 4
62.8 GiB of RAM

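My previous machine was an iMac with no GPU, just an i7 quad-core processor and 32GB of RAM. The iMac ran the following model, although it took 32 hours to finish.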

The code is as follows:-

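  import numpy as np
  from keras import applications
  from keras.preprocessing.image import ImageDataGenerator

  img_width, img_height = 512, 512
  top_model_weights_path = '50435_train_uip_possible_inconsistent.h5'
  train_dir = '../../MasterHRCT/50435/Three-Classes/train'
  validation_dir = '../../MasterHRCT/50435/Three-Classes/validation'
  nb_train_samples = 50435
  nb_validation_samples = 12600
  epochs = 200
  batch_size = 16

  # Rescale pixel values to [0, 1]; no augmentation is needed for feature extraction
  datagen = ImageDataGenerator(rescale=1. / 255)
  # VGG16 convolutional base with ImageNet weights, without the dense classifier on top
  model = applications.VGG16(include_top=False, weights='imagenet')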

Then:-

  generator_train = datagen.flow_from_directory(
      train_dir,
      target_size=(img_width, img_height),
      shuffle=False,
      class_mode=None,
      batch_size=batch_size
  )

  bottleneck_features_train = model.predict_generator(
      generator=generator_train,
      steps=nb_train_samples // batch_size,
      verbose=1
  )

  np.save(file="50435_train_uip_possible_inconsistent.npy", arr=bottleneck_features_train)
  print("Completed train data")

  generator_validation = datagen.flow_from_directory(
      validation_dir,
      target_size=(img_width, img_height),
      shuffle=False,
      class_mode=None,
      batch_size=batch_size
  )

  bottleneck_features_validation = model.predict_generator(
      generator=generator_validation,
      steps=nb_validation_samples // batch_size,
      verbose=1
  )

  np.save(file="12600_validate_uip_possible_inconsistent.npy", arr=bottleneck_features_validation)
  print("Completed validation data")
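A side issue, separate from the OOM: `nb_train_samples // batch_size` is floor division, so with `batch_size = 16` the train run stops 3 images short and the validation run 8 images short (50435 = 3152 × 16 + 3, 12600 = 787 × 16 + 8). If labels are later built for all 50435 / 12600 images, they will not line up with the saved feature arrays. A minimal sketch comparing floor and ceiling step counts (the helper names are hypothetical, not part of Keras):

```python
# Hypothetical helpers; the original code computes steps with floor division.
def steps_floor(n_samples: int, batch_size: int) -> int:
    # Floor division: the final partial batch is never requested
    return n_samples // batch_size


def steps_ceil(n_samples: int, batch_size: int) -> int:
    # Ceiling division without floats: one extra step for the leftover images
    return (n_samples + batch_size - 1) // batch_size


for name, n_samples in (("train", 50435), ("validation", 12600)):
    for batch_size in (16, 4):
        steps = steps_floor(n_samples, batch_size)
        skipped = n_samples - steps * batch_size
        print(f"{name}, batch_size={batch_size}: floor steps={steps} "
              f"(skips {skipped} images), ceil steps={steps_ceil(n_samples, batch_size)}")
```

With ceiling steps and `shuffle=False`, the directory iterator should hand back a smaller final batch, so each image is predicted once; this assumes the Keras 2 iterator behaviour of yielding a partial last batch rather than padding it.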

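Yesterday I ran this code and it was very fast (nvidia-smi suggested only one GPU was being used, which I believe is expected for TF). CPU usage peaked at 56%. Then it crashed with a CUDA out-of-memory error. So I lowered the batch size to 4. Again it started very quickly, but then the CPU jumped to 100% and my system froze. I had to do a hard reboot.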
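Lowering the batch size mainly reduces GPU memory per step; it does not shrink the final array that `predict_generator` returns in host RAM. Assuming the standard VGG16 convolutional base (five 2×2, stride-2 max-pools, 512 filters in the last block) and float32 output, a rough estimate of that array's size (function names are illustrative):

```python
def vgg16_conv_output_shape(height: int, width: int) -> tuple:
    # Assumption: VGG16 with include_top=False. Its 3x3 convolutions use 'same'
    # padding, so only the five 2x2, stride-2 max-pools shrink the spatial size.
    for _ in range(5):
        height, width = height // 2, width // 2
    return height, width, 512  # the last block has 512 filters


def bottleneck_bytes(n_images: int, height: int, width: int, bytes_per_value: int = 4) -> int:
    # float32 predictions take 4 bytes per value
    h, w, c = vgg16_conv_output_shape(height, width)
    return n_images * h * w * c * bytes_per_value


GIB = 1024 ** 3
print(vgg16_conv_output_shape(512, 512))  # (16, 16, 512)

# Images actually predicted with batch_size=16 and floor-divided steps
train_bytes = bottleneck_bytes(3152 * 16, 512, 512)
val_bytes = bottleneck_bytes(787 * 16, 512, 512)
print(f"train: {train_bytes / GIB:.3f} GiB, validation: {val_bytes / GIB:.3f} GiB")
# train: 24.625 GiB, validation: 6.148 GiB
```

The train features alone take roughly 24.6 GiB of the machine's 62.8 GiB before `np.save` runs. If `predict_generator` collects per-batch outputs and concatenates them at the end (an assumption about the Keras version in use), peak usage could briefly approach double that, which may have contributed to the freeze.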
I tried again today and got this error on the very first attempt to load the ImageNet weights:
  ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[3,3,512,512]
 [[Node: block4_conv2_2/random_uniform/RandomUniform = RandomUniform[T=DT_INT32, dtype=DT_FLOAT, seed=87654321, seed2=5932420, _device="/job:localhost/replica:0/task:0/gpu:0"](block4_conv2_2/random_uniform/shape)]]

On the command line:-
2017-08-08 06:13:57.937723: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 71.99MiB
2017-08-08 06:13:57.937739: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats: 
Limit:                    80150528
InUse:                    75491072
MaxInUse:                 80069120
NumAllocs:                     177
MaxAllocSize:             11985920
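The allocator figures are easier to read in MiB. Converting them (byte values copied from the log above; the helper is illustrative) shows that `Limit` is only about 76 MiB, roughly 4.4 MiB of it is still free, and the `[3,3,512,512]` float32 kernel from the error needs exactly 9 MiB:

```python
import math

MIB = 1024 ** 2

# Byte counts copied from the bfc_allocator "Stats" block in the log above
stats = {
    "Limit": 80150528,
    "InUse": 75491072,
    "MaxInUse": 80069120,
    "MaxAllocSize": 11985920,
}


def tensor_bytes(shape, bytes_per_value=4):
    # Dense float32 tensor: number of elements times 4 bytes
    return math.prod(shape) * bytes_per_value


for key, value in stats.items():
    print(f"{key:>12}: {value / MIB:7.2f} MiB")

free = stats["Limit"] - stats["InUse"]
needed = tensor_bytes([3, 3, 512, 512])  # the block4_conv2 kernel from the error
print(f"free under Limit: {free / MIB:.2f} MiB, requested: {needed / MIB:.2f} MiB")
print("fits" if needed <= free else "does not fit")
```

A GTX 1080 Ti has 11 GB, so a limit of about 76 MiB suggests this process could claim only a sliver of GPU 0. One plausible reading, not confirmed by the logs: TensorFlow 1.x by default reserves nearly all memory on every visible GPU, so an earlier Jupyter kernel still holding the GPUs would leave almost nothing for a new session; `nvidia-smi` lists the processes currently using each GPU. The `_2` suffix in `block4_conv2_2` also hints that VGG16 had already been built more than once in the same graph, as can happen when notebook cells are re-run.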

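Now it's obvious that this is a memory problem, but why can't it even load the weights? My Mac can run the whole code, albeit slowly. I should mention that this morning I did run the code once, but this time it was ridiculously slow (slower than my Mac). My uninformed guess is that something is eating up memory, but I can't debug it, and being new to Ubuntu I don't know where to start. Since yesterday I've been wondering whether the system has "reset" or disabled something.
Help!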
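EDIT:
I cleared all the variables in my Jupyter notebook, dropped the batch size to 1 and reloaded, and managed to load the weights, but when running the first generator I get: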
    ResourceExhaustedError: OOM when allocating tensor with shape[1,512,512,64]
 [[Node: block1_conv1/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_input_1_0_0/_105, block1_conv1/kernel/read)]]
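The second error is consistent with the same tiny limit. One 512×512 image produces a 64-channel float32 activation of 64 MiB at `block1_conv1` alone, and `block1_conv2` produces another of the same size; if the limit is still around the ~76 MiB seen in the earlier log (an assumption, since this run's allocator stats are not shown), even `batch_size = 1` cannot fit. A quick size check (helper name is illustrative):

```python
import math

MIB = 1024 ** 2


def tensor_bytes(shape, bytes_per_value=4):
    # Dense float32 tensor: number of elements times 4 bytes
    return math.prod(shape) * bytes_per_value


# block1_conv1 output in NHWC layout: batch x 512 x 512 x 64 feature maps
for batch_size in (1, 4, 16):
    size = tensor_bytes([batch_size, 512, 512, 64])
    print(f"batch_size={batch_size:>2}: block1_conv1 output = {size / MIB:.0f} MiB")
```

On an otherwise idle 11 GB card these sizes fit comfortably even at `batch_size = 16` (1024 MiB for this layer), which points back at the allocator limit rather than at the model itself.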

I don't understand why I can run this successfully on my Mac but not on a machine with more RAM, a faster CPU and 4 GPUs.