Keras argmax has no gradient. How can I define a gradient for argmax?

python tensorflow keras deep-learning nlp

I am using keras.backend.argmax inside a Lambda layer. The model compiles fine, but it throws an error during fitting:

ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

My model:

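# Imports assumed; the question does not show them.
from keras import backend as K
from keras.layers import Input, Dense, Embedding, LSTM, Dropout, Lambda
from keras.models import Model

# train_data, train_label and vocabulary are defined elsewhere (not shown in the question).
latent_dim = 512
encoder_inputs = Input(shape=(train_data.shape[1],))
encoder_dense = Dense(vocabulary, activation='softmax')
encoder_outputs = Embedding(vocabulary, latent_dim)(encoder_inputs)
encoder_outputs = LSTM(latent_dim, return_sequences=True)(encoder_outputs)
encoder_outputs = Dropout(0.5)(encoder_outputs)
encoder_outputs = encoder_dense(encoder_outputs)
# K.argmax has no gradient; this is the op named in the error raised during fit.
encoder_outputs = Lambda(K.argmax, arguments={'axis':-1})(encoder_outputs)
encoder_outputs = Lambda(K.cast, arguments={'dtype':'float32'})(encoder_outputs)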

encoder_dense1 = Dense(train_label.shape[1], activation='softmax')
decoder_embedding = Embedding(vocabulary, latent_dim)
decoder_lstm1 = LSTM(latent_dim, return_sequences=True)
decoder_lstm2 = LSTM(latent_dim, return_sequences=True)
decoder_dense2 = Dense(vocabulary, activation='softmax')

decoder_outputs = encoder_dense1(encoder_outputs)
decoder_outputs = decoder_embedding(decoder_outputs)
decoder_outputs = decoder_lstm1(decoder_outputs)
decoder_outputs = decoder_lstm2(decoder_outputs)
decoder_outputs = Dropout(0.5)(decoder_outputs)
decoder_outputs = decoder_dense2(decoder_outputs)
model = Model(encoder_inputs, decoder_outputs)
model.summary()

Model summary, for easier visualization:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_7 (InputLayer)         (None, 32)                0         
_________________________________________________________________
embedding_13 (Embedding)     (None, 32, 512)           2018816   
_________________________________________________________________
lstm_19 (LSTM)               (None, 32, 512)           2099200   
_________________________________________________________________
dropout_10 (Dropout)         (None, 32, 512)           0         
_________________________________________________________________
dense_19 (Dense)             (None, 32, 3943)          2022759   
_________________________________________________________________
lambda_5 (Lambda)            (None, 32)                0         
_________________________________________________________________
lambda_6 (Lambda)            (None, 32)                0         
_________________________________________________________________
dense_20 (Dense)             (None, 501)               16533     
_________________________________________________________________
embedding_14 (Embedding)     (None, 501, 512)          2018816   
_________________________________________________________________
lstm_20 (LSTM)               (None, 501, 512)          2099200   
_________________________________________________________________
lstm_21 (LSTM)               (None, 501, 512)          2099200   
_________________________________________________________________
dropout_11 (Dropout)         (None, 501, 512)          0         
_________________________________________________________________
dense_21 (Dense)             (None, 501, 3943)         2022759   
=================================================================
Total params: 14,397,283
Trainable params: 14,397,283
Non-trainable params: 0
_________________________________________________________________

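I searched Google for a solution, but almost every result says the model itself is flawed. Some recommend not using the function that causes the problem. However, as you can see, I cannot build this model without K.argmax. If you know of any other approach, please let me know. How can I solve this problem so that I can train my model?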

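Obviously the argmax function has no gradient; how would one even be defined? For the model to work, you need to make the layer non-trainable. As suggested elsewhere, you need to pass trainable=False to your layer. As for the layer weights (if applicable), you probably want to set them to an identity matrix.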
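To make the "no gradient" point concrete, here is a minimal sketch in plain Python (no Keras). The helper names `argmax` and `finite_difference` and the example scores are hypothetical and only for illustration. It estimates the derivative of argmax numerically: because the output is an integer index that does not move under a small change of the inputs, the estimate is zero wherever it is defined.

```python
def argmax(values):
    """Index of the largest entry (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def finite_difference(f, x, i, eps=1e-4):
    """Central-difference estimate of d f(x) / d x[i]."""
    up = list(x)
    up[i] += eps
    down = list(x)
    down[i] -= eps
    return (f(up) - f(down)) / (2 * eps)


# Hypothetical softmax output for a single time step.
scores = [0.1, 0.7, 0.2]

# Nudging any score slightly does not change which index is largest,
# so every partial derivative estimate comes out as zero.
grads = [finite_difference(argmax, scores, i) for i in range(len(scores))]
print(argmax(scores), grads)
```

The output index only changes when two scores cross, and at that point it jumps, so there is no useful slope for backpropagation to follow.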

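Setting trainable=False does not help. I still get the same error.

I think setting trainable=False on a layer without parameters has no effect.

You have a huge conceptual problem: argmax has no gradient, it is not differentiable, so you cannot use it in your model.

Yes, I know argmax has no gradient. I was hoping to fix the error by defining the gradient as something like 0. I need a function like argmax for my model to work. Do you know of any other function I could use?

Again, another conceptual problem: you cannot define a gradient for argmax. If you do, it will always be wrong, and then the model cannot train, because the information in the gradient is completely wrong.

So you are saying there is no way to solve this error? And that I have to use a different model, since this one obviously does not work without argmax?

Yes, this will not work. Do not use operations that have no gradient.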
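The point about a made-up gradient can be illustrated with a toy example in plain Python. Everything here is a hypothetical sketch: a single weight `w`, made-up `features`, and a squared-error `loss`. By the chain rule, a zero gradient at the argmax step multiplies everything upstream by zero, so `w` never receives an update. For contrast, the sketch also uses a softmax-weighted expected index (`soft_index`). That relaxation is not proposed anywhere in this thread, and it yields a real number rather than an integer, so it could not be fed into an Embedding layer as-is.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_index(logits):
    """Plain argmax: piecewise constant in the logits."""
    return max(range(len(logits)), key=lambda i: logits[i])


def soft_index(logits):
    """Expected index under softmax: a smooth stand-in (an assumption, not from the thread)."""
    return sum(i * p for i, p in enumerate(softmax(logits)))


def loss(w, pick, features=(1.0, 2.0, 0.5), target=2):
    """Squared error between the picked index and a target index."""
    logits = [w * f for f in features]
    return (pick(logits) - target) ** 2


def grad_wrt_w(pick, w, eps=1e-4):
    """Central-difference estimate of d loss / d w."""
    return (loss(w + eps, pick) - loss(w - eps, pick)) / (2 * eps)


w = 0.3
hard_grad = grad_wrt_w(hard_index, w)  # zero: no learning signal reaches w
soft_grad = grad_wrt_w(soft_index, w)  # nonzero: w can be updated
print(hard_grad, soft_grad)
```

With the hard index, gradient descent would leave `w` unchanged forever, which matches the comment above that the model cannot train. Whether a smooth relaxation suits this particular architecture is a separate design question that the thread does not address.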