Cannot understand the behaviour of the method "build" in TensorFlow Keras layers (tf.keras.layers.Layer)


Layers in TensorFlow Keras have a method build, which is used to defer weight creation until you have seen what the inputs will be (a minimal sketch of this mechanism is shown right after the questions below).

I have a few questions about it that I have not found answers to:

  • It is said that if a Layer instance is assigned as an attribute of another Layer, the outer layer will start tracking the weights of the inner layer.

  • What does tracking the weights of a layer mean?

  • The same link also says: we recommend creating such sublayers in the __init__ method (since the sublayers will typically have a build method, they will be built when the outer layer gets built).

  • Does this mean that, when the build method of the subclass (self) is run, all attributes of self are iterated over, and any instances of tf.keras.layers.Layer found among them will automatically have their build method run as well?

  • I can run the following code:
  • But not this one:

    class Net(tf.keras.Model):
      """A simple linear model."""
    
      def __init__(self):
        super(Net, self).__init__()
        self.l1 = tf.keras.layers.Dense(5)
      def build(self,input_shape):
        super().build()
      def call(self, x):
        return self.l1(x)
    
    net = Net()
    print(net.variables)
    

    Why?
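
    For reference, here is a minimal sketch (my own example, not taken from the guide being quoted) of a layer whose build method creates its weights once the input shape is known. Note that the base class's build takes input_shape as an argument, which is presumably why calling super().build() with no argument fails in the snippet above.

    import tensorflow as tf

    class MyDense(tf.keras.layers.Layer):
      """Hypothetical layer illustrating deferred weight creation."""

      def __init__(self, units):
        super(MyDense, self).__init__()
        self.units = units

      def build(self, input_shape):
        # Weights are created here, once the last input dimension is known.
        self.kernel = self.add_weight(
            name='kernel',
            shape=(int(input_shape[-1]), self.units),
            initializer='glorot_uniform',
            trainable=True)
        super(MyDense, self).build(input_shape)  # pass input_shape through

      def call(self, inputs):
        return tf.matmul(inputs, self.kernel)

    layer = MyDense(5)
    print(layer.weights)           # [] -- build has not run yet
    _ = layer(tf.zeros((2, 3)))    # __call__ runs build(input_shape) first
    print([w.shape for w in layer.weights])  # [TensorShape([3, 5])]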

    I would say that the build mentioned above means that, for example, when you construct a custom tf.keras.Model with

    net = Net()
    
    you then get all the tf.keras.layers.Layer objects that were created in __init__ and stored in the callable net. At that point it becomes a complete object that TF can train later, and that is what is meant by tracking. The next time you call net(inputs), you will get the output.
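
    As a small sketch of what tracking looks like (my own example, under the assumption that "tracking" simply means the outer object collects the variables of the layers assigned to its attributes):

    import tensorflow as tf

    class Net(tf.keras.Model):
      def __init__(self):
        super(Net, self).__init__()
        self.l1 = tf.keras.layers.Dense(5)   # assigned as an attribute -> tracked

      def call(self, x):
        return self.l1(x)

    net = Net()
    print(net.variables)         # [] -- the Dense layer has not been built yet
    _ = net(tf.zeros((1, 3)))    # the first call builds l1 for input shape (1, 3)
    print(len(net.variables))    # 2 -- l1's kernel and bias now show up on net

    Because the inner layer's kernel and bias are listed on the outer model, an optimizer that trains net will update them as well.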

    Below is an example of a custom decoder from TensorFlow:

    class BahdanauAttention(tf.keras.layers.Layer):
      def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)
    
      def call(self, query, values):
        # query hidden state shape == (batch_size, hidden size)
        # query_with_time_axis shape == (batch_size, 1, hidden size)
        # values shape == (batch_size, max_len, hidden size)
        # we are doing this to broadcast addition along the time axis to calculate the score
        query_with_time_axis = tf.expand_dims(query, 1)
    
        # score shape == (batch_size, max_length, 1)
        # we get 1 at the last axis because we are applying score to self.V
        # the shape of the tensor before applying self.V is (batch_size, max_length, units)
        score = self.V(tf.nn.tanh(
            self.W1(query_with_time_axis) + self.W2(values)))
    
        # attention_weights shape == (batch_size, max_length, 1)
        attention_weights = tf.nn.softmax(score, axis=1)
    
        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)
    
        return context_vector, attention_weights
    
    class Decoder(tf.keras.Model):
      def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
        super(Decoder, self).__init__()
        self.batch_sz = batch_sz
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.dec_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc = tf.keras.layers.Dense(vocab_size)
    
        # used for attention
        self.attention = BahdanauAttention(self.dec_units)
    
      def call(self, x, hidden, enc_output):
        # enc_output shape == (batch_size, max_length, hidden_size)
        context_vector, attention_weights = self.attention(hidden, enc_output)
    
        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)
    
        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
    
        # passing the concatenated vector to the GRU
        output, state = self.gru(x)
    
        # output shape == (batch_size * 1, hidden_size)
        output = tf.reshape(output, (-1, output.shape[2]))
    
        # output shape == (batch_size, vocab)
        x = self.fc(output)
    
        return x, state, attention_weights
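
    Continuing from the code above, one can check (my own addition, with made-up hyperparameters) that once the decoder has been called, its trainable variables include those of the BahdanauAttention sublayer created in __init__:

    decoder = Decoder(vocab_size=100, embedding_dim=8, dec_units=16, batch_sz=4)
    x = tf.zeros((4, 1), dtype=tf.int32)
    hidden = tf.zeros((4, 16))
    enc_output = tf.zeros((4, 10, 16))
    _ = decoder(x, hidden, enc_output)
    # The list below includes W1, W2 and V from the attention layer, because
    # self.attention is an attribute of the Decoder and is therefore tracked.
    print(len(decoder.trainable_variables))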
    
    I once tried putting the tf.keras.layers.Layer objects inside call instead, but the results were very bad; I guess that is because, when they are placed inside call, they get created and called again every time a forward and backward pass happens.
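
    A minimal hypothetical sketch of that anti-pattern (not the exact code, just an illustration of why it goes wrong):

    import tensorflow as tf

    class BadNet(tf.keras.Model):
      def call(self, x):
        # Anti-pattern: a new Dense layer (with freshly initialized weights)
        # is created on every single call, so nothing is ever learned.
        dense = tf.keras.layers.Dense(5)
        return dense(x)

    net = BadNet()
    _ = net(tf.zeros((1, 3)))
    # The throwaway layer was never assigned to an attribute, so its weights
    # are not tracked by the model either.
    print(net.variables)  # []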
