TensorFlow memory leak at the start of a new epoch (Python)


python memory tensorflow


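I am writing a training script with TensorFlow to classify two different types of images. Below is the class that creates the dataset object, which is used to generate batches and keep track of epochs. It works fine until the first epoch completes. Then it fails in the next_batch method, at the line self._images = self._images[perm]. This makes no sense to me, because Python should not need to copy self._images, only rearrange the data.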

import numpy as np

class DataSet(object):
  def __init__(self, images, labels, norm=True):
    assert images.shape[0] == labels.shape[0], (
      "images.shape: %s labels.shape: %s" % (images.shape,
                                         labels.shape))
    self._num_examples = images.shape[0]
    self._images = images
    self._labels = labels
    self._epochs_completed = 0
    self._index_in_epoch = 0
    self._norm = norm
    # Shuffle the data right away
    perm = np.arange(self._num_examples)
    np.random.shuffle(perm)
    self._images = self._images[perm]
    self._labels = self._labels[perm]
  @property
  def images(self):
    return self._images
  @property
  def labels(self):
    return self._labels
  @property
  def num_examples(self):
    return self._num_examples
  @property
  def epochs_completed(self):
    return self._epochs_completed
  def next_batch(self, batch_size):
    """Return the next `batch_size` examples from this data set."""
    start = self._index_in_epoch
    self._index_in_epoch += batch_size
    if self._index_in_epoch > self._num_examples:
      # Finished epoch
      self._epochs_completed += 1
      print("Completed epoch %d.\n"%self._epochs_completed)
      # Shuffle the data
      perm = np.arange(self._num_examples)
      np.random.shuffle(perm)
      self._images = self._images[perm] # this is where OOM happens
      self._labels = self._labels[perm]
      # Start next epoch
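
A note on the failing line: indexing a NumPy array with an integer array, as in self._images[perm], is advanced indexing, which returns a new array rather than a view. While the assignment runs, the old array and the shuffled copy are both in memory, which may be enough to trigger an out-of-memory error on a large image set. The sketch below checks this on a small stand-in array and shows one possible alternative: shuffle an index array and gather only one batch at a time. The array shapes and the helper name iter_batches are made up for illustration and are not part of the original class.

```python
import numpy as np

# Small stand-in for the real image array (shape is hypothetical).
images = np.zeros((1000, 32, 32, 3), dtype=np.float32)
labels = np.arange(1000)

perm = np.random.permutation(len(images))
shuffled = images[perm]

# Advanced (integer-array) indexing allocates a full copy of the data.
copies_data = not np.shares_memory(images, shuffled)
extra_bytes = shuffled.nbytes  # same size as the original array


def iter_batches(images, labels, batch_size, rng=np.random):
    """Yield shuffled batches without reordering the full arrays.

    Hypothetical helper: only `batch_size` examples are copied at a time.
    """
    order = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]


# One pass over the data: count the batches and collect the labels seen.
num_batches = 0
seen = []
for batch_images, batch_labels in iter_batches(images, labels, batch_size=300):
    num_batches += 1
    seen.extend(batch_labels.tolist())

print("copy made:", copies_data, "- extra bytes:", extra_bytes)
print("batches per epoch:", num_batches)
```

With this approach the peak extra memory is one batch rather than one full dataset, at the cost of a small gather on every call.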

Memory usage does not increase during ordinary training steps. Below is part of the training code; data_train_norm is a DataSet object.

import datetime

batch_size = 300
csv_plot = open("csvs/train_plot.csv","a")
for i in range(3000):
    batch = data_train_norm.next_batch(batch_size)
    if i%50 == 0:
        # Log the training cross-entropy every 50 steps (dropout disabled)
        tce = cross_entropy.eval(feed_dict={x:batch[0],y_:batch[1],keep_prob:1.0},session=sess)
        print("\nstep %d, train ce %g"%(i,tce))
        print(datetime.datetime.now())
        csv_plot.write("%d, %g\n"%(i,tce))

    train_step.run(feed_dict={x:batch[0],y_:batch[1],keep_prob:0.8},session=sess)

version = 1
saver.save(sess,'nets/cnn0nu_batch_gpu_roi_v%02d'%version)
csv_plot.close()

This may be because the following code adds new next_batch operations to the graph:

for i in range(3000):
    batch = data_train_norm.next_batch(batch_size)

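The method data_train_norm.next_batch creates a new TensorFlow operation, so you should call it only once and reuse the created operation (held in batch). The iterator snippet at the end of this page shows this pattern.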

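Also, when debugging TensorFlow memory leaks, you can call sess.graph.finalize(), which makes the graph read-only so that any later attempt to add an operation raises an error.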
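
As a rough illustration of the finalize() check: once a graph is finalized, creating any further operation in it raises a RuntimeError, which pinpoints code that grows the graph inside a loop. The sketch below uses a standalone tf.Graph rather than the session from the question, and the constants are placeholders standing in for a real model.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Build every operation up front, before the training loop.
    x = tf.constant(1.0)
    y = x + 1.0

# Make the graph read-only.
graph.finalize()

caught = False
try:
    with graph.as_default():
        # Stand-in for an accidental op creation inside a loop.
        tf.constant(2.0)
except RuntimeError:
    caught = True

print("finalized:", graph.finalized, "- new op rejected:", caught)
```

If the training loop runs to completion on a finalized graph, graph growth can be ruled out as the source of the leak.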

You are using:

dataset = dataset.shuffle(buffer_size)

Try reducing buffer_size. That worked for me.
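
A smaller buffer helps because shuffle keeps up to buffer_size elements in memory at once, so its footprint grows roughly linearly with that number. Here is a back-of-the-envelope estimate, assuming hypothetical 224x224 RGB float32 images (the question does not state the image size); the function name shuffle_buffer_bytes is likewise made up for this sketch.

```python
import numpy as np


def shuffle_buffer_bytes(buffer_size, element_shape, dtype=np.float32):
    """Approximate bytes held by a shuffle buffer of `buffer_size` elements."""
    per_element = int(np.prod(element_shape)) * np.dtype(dtype).itemsize
    return buffer_size * per_element


# Hypothetical element: one 224x224 RGB image stored as float32.
shape = (224, 224, 3)
for size in (100, 1000, 10000):
    gib = shuffle_buffer_bytes(size, shape) / 2**30
    print("buffer_size=%5d -> about %.2f GiB" % (size, gib))
```

The estimate ignores labels and any per-element overhead, so treat it as a lower bound when choosing a buffer size.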

Example of creating the get_next operation once and then running it repeatedly:

dataset = tf.contrib.data.Dataset.range(100)
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

for i in range(100):
  value = sess.run(next_element)
  assert i == value