Python: computing loss with Keras on the MNIST digits dataset

I am following an example from a data-science textbook and have run into a problem: when running a simple Keras neural network to find the optimal learning rate, I get NaN values for the loss.

# Imports (not shown in the original post)
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Get data and split into test/train/valid and normalize
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()
X_valid, X_train = X_train_full[:5000] / 255., X_train_full[5000:] / 255.
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.


# Callback to grow the learning rate at each iteration.
# Also record learning rate and loss at each iteration.
K = keras.backend
class ExponentialLearningRate(keras.callbacks.Callback):
    def __init__(self, factor):
        self.factor = factor
        self.rates = []
        self.losses = []
    def on_batch_end(self, batch, logs):
        self.rates.append(K.get_value(self.model.optimizer.lr))
        self.losses.append(logs["loss"])
        K.set_value(self.model.optimizer.lr, self.model.optimizer.lr * self.factor)

# Define the model and compile/fit.
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(lr=1e-3),
              metrics=["accuracy"])
expon_lr = ExponentialLearningRate(factor=1.005)

history = model.fit(X_train, y_train, epochs=1,
                    validation_data=(X_valid, y_valid),
                    callbacks=[expon_lr])
Running this produces the following output:

1719/1719 [==============================] - 6s 4ms/step - loss: nan - accuracy: 0.6030 - val_loss: nan - val_accuracy: 0.0958
Plotting the loss against the learning rate (top is my result, bottom is the expected result from the example I am following):
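A minimal sketch of how such a plot can be produced from the values recorded by the callback, assuming matplotlib is available (this snippet is not part of the original post):

import matplotlib.pyplot as plt

# Loss recorded at each batch, plotted against the (log-scaled) learning rate
plt.plot(expon_lr.rates, expon_lr.losses)
plt.gca().set_xscale("log")
plt.xlabel("Learning rate")
plt.ylabel("Loss")
plt.show()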

Notably, the example's loss is much noisier than mine and ranges from roughly 2.5 down to roughly 0.25. My loss only falls from roughly 2.5 to exactly 1, at which point it becomes NaN.


Perhaps something in Keras/TF has been updated since this example was written, but since I am new to Keras I would like to know what the problem could be here.

Your problem is the exponentially growing learning rate: it climbs from 0.0010150751 up to 5.237502, and that is why your loss explodes. Change the optimizer like this:

optimizer=tf.keras.optimizers.Adam(0.001)
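For context, the blow-up follows directly from the callback: the rate is multiplied by 1.005 after every batch, and one epoch is 1719 batches, so a quick back-of-the-envelope check (not from the original answer) reproduces roughly the final rate quoted above:

# Starting learning rate and per-batch growth factor from the question
initial_lr = 1e-3
factor = 1.005
steps = 1719          # batches in one epoch, per the training output

final_lr = initial_lr * factor ** steps
print(final_lr)       # ~5.3, far too large for SGD on this model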

Remove the callback and your loss will be fine.

You could also just stick with the default learning rate, i.e. don't specify it at all. Trying Adam as the optimizer is another option.
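A minimal sketch of that suggestion, reusing the variables from the question: compile with Adam at its default learning rate and fit without the ExponentialLearningRate callback (illustrative, not from the original answer):

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.Adam(),   # default learning rate of 0.001
              metrics=["accuracy"])

history = model.fit(X_train, y_train, epochs=1,
                    validation_data=(X_valid, y_valid))   # no LR-growing callback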