Python TensorFlow: gradient does not exist for bias in custom layer

I have built an input convex neural network in TensorFlow; it is a scalar-output feed-forward model. The first hidden layer is dense, and the subsequent layers are custom, taking two inputs: the output of the previous layer (the kernel input) and the model input (the passthrough). Separate weights are applied to each. This allows a positive-weight regularizer to be applied to the kernel weights but not to the passthrough weights. I compute the regularizer and add it with `add_loss` in the custom layer's `call` method. I also use custom activation functions, namely a squared leaky ReLU and a leaky ReLU.

When I train this network, I can compute a gradient for the bias in the first dense layer, but I get a warning that no gradient exists for the bias in the custom layers. When I add `@tf.function` to the activation functions, the warning disappears, but the gradient is 0. In addition, `loss.numpy()` throws an error when I use `@tf.function` and run in a local Jupyter notebook (but not in Colab).

Any idea why the bias gradient exists for the dense layer but not for the custom layers, and how to compute bias gradients for all layers? A simple working example is provided below. Many thanks!

Here is my custom layer. It is very similar to a standard Dense layer:

# Imports needed by the layer below
import tensorflow as tf
from tensorflow.keras import activations, constraints, initializers, regularizers
from tensorflow.keras import backend as K
from tensorflow.keras.layers import InputSpec, Layer
from tensorflow.python.framework import dtypes, tensor_shape


class DensePartiallyConstrained(Layer):
    '''
    A custom layer inheriting from `tf.keras.layers.Layers` class.
    This class is a fully-connected layer with two inputs. This allows
    for different constraints on the weights of each input. This enables
    a passthrough of the inputs to each hidden layer to have no
    weight constraints while the input from the previous layer can have
    a positive constraint. It also allows for different initializations
    of the weight values for each input.

    Most of this code and documentation was borrowed from the
    `tf.keras.layers.Dense` documentation on Github (thanks!).
    '''
    def __init__(self,
                 units,
                 activation = None,
                 use_bias = True,
                 kernel_initializer = 'glorot_uniform',
                 passthrough_initializer = 'glorot_uniform',
                 bias_initializer = 'zeros',
                 kernel_constraint = None,
                 passthrough_constraint = None,
                 bias_constraint = None,
                 activity_regularizer = None,
                 regularizer_constant = 1.0,
                 **kwargs):

        if 'input_shape' not in kwargs and 'input_dim' in kwargs:
            kwargs['input_shape'] = (kwargs.pop('input_dim'),)

        super(DensePartiallyConstrained, self).__init__(
                activity_regularizer = regularizers.get(activity_regularizer), **kwargs)

        self.units = int(units)
        self.activation = activations.get(activation)
        self.use_bias = use_bias
        self.kernel_initializer = initializers.get(kernel_initializer)
        self.passthrough_initializer = initializers.get(passthrough_initializer)
        self.bias_initializer = initializers.get(bias_initializer)
        self.kernel_constraint = constraints.get(kernel_constraint)
        self.passthrough_constraint = constraints.get(passthrough_constraint)
        self.bias_constraint = constraints.get(bias_constraint)

        # This is for add_loss in call() method
        self.regularizer_constant = regularizer_constant

        # Let Keras propagate masks through this layer unchanged
        # (same flag as in tf.keras.layers.Dense)
        self.supports_masking = True

        self.kernel_input_spec = InputSpec(min_ndim=2)
        self.passthrough_input_spec = InputSpec(min_ndim=2)


    def build(self, input_shape):
        # Input shapes provided as list [kernel, passthrough]
        kernel_input_shape, passthrough_input_shape = input_shape

        # Check for proper datatype
        dtype = dtypes.as_dtype(self.dtype or K.floatx())
        if not (dtype.is_floating or dtype.is_complex):
            raise TypeError('Unable to build `DensePartiallyConstrained` layer '
                            'with non-floating point dtype %s' % (dtype,))

        # Check kernel input dimensions
        kernel_input_shape = tensor_shape.TensorShape(kernel_input_shape)
        if tensor_shape.dimension_value(kernel_input_shape[-1]) is None:
            raise ValueError('The last dimension of the inputs to `DensePartiallyConstrained` '
                             'should be defined. Found `None`.')
        kernel_last_dim = tensor_shape.dimension_value(kernel_input_shape[-1])
        self.kernel_input_spec = InputSpec(min_ndim=2,
                                           axes={-1: kernel_last_dim})

        # Check passthrough input dimensions
        passthrough_input_shape = tensor_shape.TensorShape(passthrough_input_shape)
        if tensor_shape.dimension_value(passthrough_input_shape[-1]) is None:
            raise ValueError('The last dimension of the inputs to `DensePartiallyConstrained` '
                             'should be defined. Found `None`.')
        passthrough_last_dim = tensor_shape.dimension_value(passthrough_input_shape[-1])
        self.passthrough_input_spec = InputSpec(min_ndim=2,
                                                axes={-1: passthrough_last_dim})

        # Add weights to kernel (between layer connections)
        self.kernel = self.add_weight(name = 'kernel',
                                      shape = [kernel_last_dim, self.units],
                                      initializer = self.kernel_initializer,
                                      constraint = self.kernel_constraint,
                                      dtype = self.dtype,
                                      trainable = True)
        # Add weight to input passthrough
        self.passthrough = self.add_weight(name = 'passthrough',
                                      shape = [passthrough_last_dim, self.units],
                                      initializer = self.passthrough_initializer,
                                      constraint = self.passthrough_constraint,
                                      dtype = self.dtype,
                                      trainable = True)
        # Add weights to bias
        if self.use_bias:
            self.bias = self.add_weight(name = 'bias',
                                        shape = [self.units,],
                                        initializer = self.bias_initializer,
                                        constraint = self.bias_constraint,
                                        dtype = self.dtype,
                                        trainable = True)
        else:
            self.bias = None

        self.built = True

        super(DensePartiallyConstrained, self).build(input_shape)


    def call(self, inputs):
        # Inputs provided as list [kernel, passthrough]
        kernel_input, passthrough_input = inputs

        # Regularizer encouraging positive kernel weights: penalize the
        # negative part of the kernel, added to the model losses
        self.add_loss(self.regularizer_constant
                      * tf.reduce_sum(tf.square(tf.math.maximum(tf.negative(self.kernel), 0.0))))

        # Calculate layer output
        outputs = tf.add(tf.matmul(kernel_input, self.kernel), tf.matmul(passthrough_input, self.passthrough))

        if self.use_bias:
            outputs = tf.add(outputs, self.bias)

        if self.activation is not None:
            return self.activation(outputs)
        return outputs
And here are my activation functions:

#@tf.function
def squared_leaky_ReLU(x, alpha = 0.2):
    return tf.square(tf.maximum(x, alpha * x))

#@tf.function
def leaky_ReLU(x, alpha = 0.2):
    return tf.maximum(x, alpha * x)
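
For completeness, here is roughly how the model is wired and how the per-variable gradients can be inspected. This is a minimal sketch: the layer widths, dummy data, and squared-error loss are placeholders, not my actual training setup.

import numpy as np

# Sketch of the wiring described above: a dense first hidden layer, then
# custom layers that each take [previous_output, model_input].
x_in = tf.keras.Input(shape=(2,))
h = tf.keras.layers.Dense(8, activation=squared_leaky_ReLU)(x_in)
h = DensePartiallyConstrained(8, activation=squared_leaky_ReLU)([h, x_in])
y_out = DensePartiallyConstrained(1)([h, x_in])
model = tf.keras.Model(inputs=x_in, outputs=y_out)

# Dummy batch, just to exercise one gradient computation.
x = np.random.rand(16, 2).astype('float32')
y = np.random.rand(16, 1).astype('float32')

with tf.GradientTape() as tape:
    pred = model(x)
    # model.losses collects the add_loss() regularizer from each custom layer.
    loss = tf.reduce_mean(tf.square(pred - y)) + tf.reduce_sum(model.losses)

grads = tape.gradient(loss, model.trainable_variables)
for var, grad in zip(model.trainable_variables, grads):
    # A None entry here is what produces the "no gradient" warning
    # when the (grad, var) pairs are passed to an optimizer.
    print(var.name, 'None' if grad is None else 'ok')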
Edit: After updating TensorFlow, I can now access `loss.numpy()` while using `@tf.function` on the activation functions. However, this returns gradients of 0 for the biases in all of the custom layers.
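
Reusing grads from the sketch above, a check like the following is what shows the custom-layer bias gradients coming back as exact zeros rather than None:

for var, grad in zip(model.trainable_variables, grads):
    if grad is not None and 'bias' in var.name:
        # Prints 0.0 for the custom-layer biases in the failing case.
        print(var.name, float(tf.reduce_sum(tf.abs(grad))))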

I am starting to think that the missing gradient for the bias terms in the custom layers might be related to my loss function, in which only the weights in the custom-layer kernels are regularized. The loss for g(x) is based on the gradient with respect to the inputs, so it contains no information about the biases (the biases in f(x) update normally). Still, if that were the case, I don't understand why the bias in the first hidden dense layer of g(y) does get updated? The two networks are identical apart from the positive constraint on the kernel weights of f(x).
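
As a self-contained toy illustration of that suspicion (not my actual loss): when a loss is built from the gradient of the output with respect to the input, the bias drops out, since d(wx + b)/dx = w.

w = tf.Variable(2.0)
b = tf.Variable(1.0)
x = tf.constant(3.0)

with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        inner.watch(x)
        y = w * x + b
    dy_dx = inner.gradient(y, x)   # equals w; b has dropped out
    loss = tf.square(dy_dx - 5.0)  # loss depends only on w

grads = outer.gradient(loss, [w, b])
print(grads[0])  # tf.Tensor(-6.0, ...): d/dw (w - 5)^2 = 2(w - 5)
print(grads[1])  # None: there is no path from the loss to b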

When you run `model.summary()`, can you see your custom layers in the output?

Yes, I can see the custom layers in `model.summary()`, and also when plotting the model with Graphviz. Also, the weights of the first hidden dense layer update during training, and that layer is connected only through the custom layers.