Cannot understand the behaviour of the method "build" in TensorFlow Keras layers (tf.keras.layers.Layer)


Layers in TensorFlow Keras have a method build, which is used to defer weight creation until you have seen what the inputs will be (a minimal sketch of this mechanism is shown right after the questions below).

I have a few questions about it that I have not found answers to:

  • It is said that if a Layer instance is assigned as an attribute of another Layer, the outer layer will start tracking the weights of the inner layer.

  • What does tracking the weights of a layer mean?

  • The same link also says: we recommend creating such sublayers in the __init__ method (since the sublayers will typically have a build method, they will be built when the outer layer gets built).

  • Does this mean that, when the build method of the subclass (self) is run, all attributes of self are iterated over, and any instances of tf.keras.layers.Layer found among them will automatically have their build method run as well?

  • I can run the following code:
  • But not this one:

    class Net(tf.keras.Model):
      """A simple linear model."""
    
      def __init__(self):
        super(Net, self).__init__()
        self.l1 = tf.keras.layers.Dense(5)
      def build(self,input_shape):
        super().build()
      def call(self, x):
        return self.l1(x)
    
    net = Net()
    print(net.variables)
    

    Why?
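
    For reference, here is a minimal sketch (my own example, not taken from the guide being quoted) of a layer whose build method creates its weights once the input shape is known. Note that the base class's build takes input_shape as an argument, which is presumably why calling super().build() with no argument fails in the snippet above.

    import tensorflow as tf

    class MyDense(tf.keras.layers.Layer):
      """Hypothetical layer illustrating deferred weight creation."""

      def __init__(self, units):
        super(MyDense, self).__init__()
        self.units = units

      def build(self, input_shape):
        # Weights are created here, once the last input dimension is known.
        self.kernel = self.add_weight(
            name='kernel',
            shape=(int(input_shape[-1]), self.units),
            initializer='glorot_uniform',
            trainable=True)
        super(MyDense, self).build(input_shape)  # pass input_shape through

      def call(self, inputs):
        return tf.matmul(inputs, self.kernel)

    layer = MyDense(5)
    print(layer.weights)           # [] -- build has not run yet
    _ = layer(tf.zeros((2, 3)))    # __call__ runs build(input_shape) first
    print([w.shape for w in layer.weights])  # [TensorShape([3, 5])]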

    I would say that the build mentioned above means that, for example, when you construct a custom tf.keras.Model with

    net = Net()
    
    you then get all the tf.keras.layers.Layer objects that were created in __init__ and stored in the callable net. At that point it becomes a complete object that TF can train later, and that is what is meant by tracking. The next time you call net(inputs), you will get the output.
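
    As a small sketch of what tracking looks like (my own example, under the assumption that "tracking" simply means the outer object collects the variables of the layers assigned to its attributes):

    import tensorflow as tf

    class Net(tf.keras.Model):
      def __init__(self):
        super(Net, self).__init__()
        self.l1 = tf.keras.layers.Dense(5)   # assigned as an attribute -> tracked

      def call(self, x):
        return self.l1(x)

    net = Net()
    print(net.variables)         # [] -- the Dense layer has not been built yet
    _ = net(tf.zeros((1, 3)))    # the first call builds l1 for input shape (1, 3)
    print(len(net.variables))    # 2 -- l1's kernel and bias now show up on net

    Because the inner layer's kernel and bias are listed on the outer model, an optimizer that trains net will update them as well.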

    Below is an example of a custom decoder from TensorFlow:

    class BahdanauAttention(tf.keras.layers.Layer):
      def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)
    
      def call(self, query, values):
        # query hidden state shape == (batch_size, hidden size)
        # query_with_time_axis shape == (batch_size, 1, hidden size)
        # values shape == (batch_size, max_len, hidden size)
        # we are doing this to broadcast addition along the time axis to calculate the score
        query_with_time_axis = tf.expand_dims(query, 1)
    
        # score shape == (batch_size, max_length, 1)
        # we get 1 at the last axis because we are applying score to self.V
        # the shape of the tensor before applying self.V is (batch_size, max_length, units)
        score = self.V(tf.nn.tanh(
            self.W1(query_with_time_axis) + self.W2(values)))
    
        # attention_weights shape == (batch_size, max_length, 1)
        attention_weights = tf.nn.softmax(score, axis=1)
    
        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)
    
        return context_vector, attention_weights
    
    class Decoder(tf.keras.Model):
      def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
        super(Decoder, self).__init__()
        self.batch_sz = batch_sz
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.dec_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc = tf.keras.layers.Dense(vocab_size)
    
        # used for attention
        self.attention = BahdanauAttention(self.dec_units)
    
      def call(self, x, hidden, enc_output):
        # enc_output shape == (batch_size, max_length, hidden_size)
        context_vector, attention_weights = self.attention(hidden, enc_output)
    
        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)
    
        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
    
        # passing the concatenated vector to the GRU
        output, state = self.gru(x)
    
        # output shape == (batch_size * 1, hidden_size)
        output = tf.reshape(output, (-1, output.shape[2]))
    
        # output shape == (batch_size, vocab)
        x = self.fc(output)
    
        return x, state, attention_weights
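
    Continuing from the code above, one can check (my own addition, with made-up hyperparameters) that once the decoder has been called, its trainable variables include those of the BahdanauAttention sublayer created in __init__:

    decoder = Decoder(vocab_size=100, embedding_dim=8, dec_units=16, batch_sz=4)
    x = tf.zeros((4, 1), dtype=tf.int32)
    hidden = tf.zeros((4, 16))
    enc_output = tf.zeros((4, 10, 16))
    _ = decoder(x, hidden, enc_output)
    # The list below includes W1, W2 and V from the attention layer, because
    # self.attention is an attribute of the Decoder and is therefore tracked.
    print(len(decoder.trainable_variables))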
    
    I once tried putting the tf.keras.layers.Layer objects inside call instead, but the results were very bad; I guess that is because, when they are placed inside call, they get created and called again every time a forward and backward pass happens.
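
    A minimal hypothetical sketch of that anti-pattern (not the exact code, just an illustration of why it goes wrong):

    import tensorflow as tf

    class BadNet(tf.keras.Model):
      def call(self, x):
        # Anti-pattern: a new Dense layer (with freshly initialized weights)
        # is created on every single call, so nothing is ever learned.
        dense = tf.keras.layers.Dense(5)
        return dense(x)

    net = BadNet()
    _ = net(tf.zeros((1, 3)))
    # The throwaway layer was never assigned to an attribute, so its weights
    # are not tracked by the model either.
    print(net.variables)  # []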
