Why does tf.GradientTape() use less GPU memory when model variables are watched manually in Python?
When I use tf.GradientTape() to automatically watch the trainable variables of a ResNet model, the machine throws an out-of-memory error. The code is as follows:
x_mini = preprocess_input(x_train)
with tf.GradientTape() as tape:
    outputs = model(x_mini, training=True)
However, if I disable automatic watching and watch the trainable variables manually, I can feed in much larger data without any memory problems. The code is as follows:
x_mini = preprocess_input(x_train)
with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(model.trainable_variables)
    outputs = model(x_mini, training=True)
I wonder whether the tape misses some variables when I watch them manually.
Below is runnable code (if you comment out option 1, the out-of-memory error appears). I am using a Tesla T4 GPU with 15 GB of memory and TensorFlow 2.3:
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Model
import tensorflow.keras.layers as ly

x_train = tf.convert_to_tensor(np.random.randint(0, 255, (900, 224, 224, 3)), dtype=tf.dtypes.float32)
y_train = tf.convert_to_tensor([0, 1, 0], dtype=tf.dtypes.float32)
print(x_train.shape)

tf.keras.backend.clear_session()
resnet_model = tf.keras.applications.resnet.ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
resnet_model.trainable = False

inputs = tf.keras.Input(shape=(224, 224, 3))
x = resnet_model(inputs, training=False)
x = ly.GlobalAveragePooling2D()(x)
x = ly.Dropout(0.2)(x)
outputs = ly.Dense(3, activation='softmax')(x)
model = Model(inputs, outputs)

mcross = tf.keras.losses.categorical_crossentropy
macc = tf.keras.metrics.categorical_accuracy
base_learning_rate = 0.0001
optimizer = tf.keras.optimizers.Adam(base_learning_rate)

def cross_entropy(y_true, y_pred):
    y_pred = y_pred / tf.reduce_sum(y_pred, 1, True)
    y_pred = tf.clip_by_value(y_pred, 1e-3, 1 - 1e-3)
    return -tf.reduce_sum(y_true * tf.math.log(y_pred), 1)

# option 1: manually watching variables
with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(model.trainable_variables)
    y_pred = model(x_train, training=True)
    loss = cross_entropy(y_train, tf.reduce_mean(y_pred, 0, keepdims=True))
gradients = tape.gradient(loss, model.trainable_variables)

# option 2: automatically watching variables
with tf.GradientTape() as tape:
    y_pred = model(x_train, training=True)
    loss = cross_entropy(y_train, tf.reduce_mean(y_pred, 0, keepdims=True))
gradients = tape.gradient(loss, model.trainable_variables)
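To check whether the tape actually misses any variables in the manual mode, one can compare what each tape watched via `tape.watched_variables()`. A minimal sketch, using a tiny stand-in model instead of ResNet50 so it runs quickly (the small Dense model is purely illustrative, not the model from the question):

```python
import tensorflow as tf

# Tiny stand-in model; any Keras model with trainable variables would do.
model = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(4,))])
x = tf.random.normal((8, 4))

# Automatic watching: the tape records every trainable variable it touches.
with tf.GradientTape() as auto_tape:
    _ = model(x, training=True)
auto_watched = {v.ref() for v in auto_tape.watched_variables()}

# Manual watching: explicitly register the trainable variables.
with tf.GradientTape(watch_accessed_variables=False) as manual_tape:
    manual_tape.watch(model.trainable_variables)
    _ = model(x, training=True)
manual_watched = {v.ref() for v in manual_tape.watched_variables()}

# If the two sets match, the manual mode is not missing any variables.
print(auto_watched == manual_watched)
```

For a model whose variables are all trainable, the two sets should coincide, so a memory difference would have to come from somewhere other than missed variables.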
The error message is also shown below:
---------------------------------------------------------------------------
ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-4-42e45caeae41> in <module>
31 # automatically tapping variable
32 with tf.GradientTape() as tape:
---> 33 y_pred = model(x_train, training=True)
34 loss = cross_entropy(y_train, tf.reduce_mean(y_pred, 0, keepdims=True))
35 gradients = tape.gradient(loss, model.trainable_variables)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
983
984 with ops.enable_auto_cast_variables(self._compute_dtype_object):
--> 985 outputs = call_fn(inputs, *args, **kwargs)
986
987 if self._activity_regularizer:
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py in call(self, inputs, training, mask)
384 """
385 return self._run_internal_graph(
--> 386 inputs, training=training, mask=mask)
387
388 def compute_output_shape(self, input_shape):
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py in _run_internal_graph(self, inputs, training, mask)
506
507 args, kwargs = node.map_arguments(tensor_dict)
--> 508 outputs = node.layer(*args, **kwargs)
509
510 # Update tensor_dict.
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
983
984 with ops.enable_auto_cast_variables(self._compute_dtype_object):
--> 985 outputs = call_fn(inputs, *args, **kwargs)
986
987 if self._activity_regularizer:
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py in call(self, inputs, training, mask)
384 """
385 return self._run_internal_graph(
--> 386 inputs, training=training, mask=mask)
387
388 def compute_output_shape(self, input_shape):
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/functional.py in _run_internal_graph(self, inputs, training, mask)
506
507 args, kwargs = node.map_arguments(tensor_dict)
--> 508 outputs = node.layer(*args, **kwargs)
509
510 # Update tensor_dict.
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
983
984 with ops.enable_auto_cast_variables(self._compute_dtype_object):
--> 985 outputs = call_fn(inputs, *args, **kwargs)
986
987 if self._activity_regularizer:
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/layers/convolutional.py in call(self, inputs)
245 inputs = array_ops.pad(inputs, self._compute_causal_padding(inputs))
246
--> 247 outputs = self._convolution_op(inputs, self.kernel)
248
249 if self.use_bias:
/opt/conda/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
199 """Call target, and fall back on dispatchers if there is a TypeError."""
200 try:
--> 201 return target(*args, **kwargs)
202 except (TypeError, ValueError):
203 # Note: convert_to_eager_tensor currently raises a ValueError, not a
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py in convolution_v2(input, filters, strides, padding, data_format, dilations, name)
   1016       data_format=data_format,
   1017       dilations=dilations,
-> 1018       name=name)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py in convolution_internal(input, filters, strides, padding, data_format, dilations, name, call_from_convolution, num_spatial_dims)
   1146           data_format=data_format,
   1147           dilations=dilations,
-> 1148           name=name)
   1149     else:
   1150       if channel_index == 1:
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py in _conv2d_expanded_batch(input, filters, strides, padding, data_format, dilations, name)
   2590         data_format=data_format,
   2591         dilations=dilations,
-> 2592         name=name)
   2593   return squeeze_batch_dims(
   2594       input,
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/gen_nn_ops.py in conv2d(input, filter, strides, padding, use_cudnn_on_gpu, explicit_paddings, data_format, dilations, name)
936 return _result
937 except _core._NotOkStatusException as e:
--> 938 _ops.raise_from_not_ok_status(e, name)
939 except _core._FallbackException:
940 pass
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
   6841   message = e.message + (" name: " + name if name is not None else "")
   6842   # pylint: disable=protected-access
-> 6843   six.raise_from(core._status_to_exception(e.code, message), None)
   6844   # pylint: enable=protected-access
   6845
/opt/conda/lib/python3.7/site-packages/six.py in raise_from(value, from_value)
ResourceExhaustedError: OOM when allocating tensor with shape[900,56,56,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Conv2D]
It is hard to draw any conclusions from this.