Python - EarlyStopping doesn't stop the model even though it restores the best weights


I'm running an image classification program using tf.keras and trying to plot error curves for accuracy and val_accuracy. However, when I add an EarlyStopping callback, the model does not stop training even after the patience threshold is exceeded.

I have tried changing the EarlyStopping monitor value, but I found it is already monitoring the correct metric, because I reach this point in tensorflow.python.keras.callbacks.py:

else:
    self.wait += 1
    if self.wait >= self.patience:
        self.stopped_epoch = epoch
        self.model.stop_training = True
        if self.restore_best_weights:
            if self.verbose > 0:
                print('Restoring model weights from the end of the best epoch.')
            self.model.set_weights(self.best_weights)
My output shows that print line, so I am clearly hitting the
self.model.stop_training = True
line, but my model continues anyway. Below is an example of the model still running after reaching the early stopping point. You can see that at the end of epoch 9 it prints "Restoring model weights from the end of the best epoch." Yet it keeps running through the end of epoch 10.

Epoch 1/10
 9/10 [==========================>...] - ETA: 1s - loss: 1.1147 - categorical_accuracy: 0.6058
Epoch 00001: val_categorical_accuracy improved from -inf to 0.25000, saving model to /home/chale/ml_classify/data/best.weights.hdf5
10/10 [==============================] - 29s 3s/step - loss: 1.0876 - categorical_accuracy: 0.6013 - val_loss: 60.9186 - val_categorical_accuracy: 0.2500
Epoch 2/10
 9/10 [==========================>...] - ETA: 0s - loss: 1.2638 - categorical_accuracy: 0.5694
Epoch 00002: val_categorical_accuracy did not improve from 0.25000
10/10 [==============================] - 7s 747ms/step - loss: 1.2278 - categorical_accuracy: 0.5750 - val_loss: 147.1493 - val_categorical_accuracy: 0.2396
Epoch 3/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.5760 - categorical_accuracy: 0.8321
Epoch 00003: val_categorical_accuracy improved from 0.25000 to 0.26042, saving model to /home/chale/ml_classify/data/best.weights.hdf5
10/10 [==============================] - 10s 972ms/step - loss: 0.5569 - categorical_accuracy: 0.8288 - val_loss: 21.9862 - val_categorical_accuracy: 0.2604
Epoch 4/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.4401 - categorical_accuracy: 0.8681
Epoch 00004: val_categorical_accuracy improved from 0.26042 to 0.30208, saving model to /home/chale/ml_classify/data/best.weights.hdf5
10/10 [==============================] - 9s 897ms/step - loss: 0.4383 - categorical_accuracy: 0.8687 - val_loss: 146.7307 - val_categorical_accuracy: 0.3021
Epoch 5/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.4499 - categorical_accuracy: 0.8394
Epoch 00005: val_categorical_accuracy did not improve from 0.30208
10/10 [==============================] - 7s 714ms/step - loss: 0.4218 - categorical_accuracy: 0.8493 - val_loss: 71.2797 - val_categorical_accuracy: 0.1354
Epoch 6/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.5760 - categorical_accuracy: 0.8194
Epoch 00006: val_categorical_accuracy improved from 0.30208 to 0.38542, saving model to /home/chale/ml_classify/data/best.weights.hdf5
10/10 [==============================] - 10s 974ms/step - loss: 0.5342 - categorical_accuracy: 0.8313 - val_loss: 13.7430 - val_categorical_accuracy: 0.3854
Epoch 7/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.3852 - categorical_accuracy: 0.9000
Epoch 00007: val_categorical_accuracy did not improve from 0.38542
10/10 [==============================] - 6s 619ms/step - loss: 0.4190 - categorical_accuracy: 0.8973 - val_loss: 164.1882 - val_categorical_accuracy: 0.2708
Epoch 8/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.3401 - categorical_accuracy: 0.8905
Epoch 00008: val_categorical_accuracy did not improve from 0.38542
10/10 [==============================] - 7s 723ms/step - loss: 0.3745 - categorical_accuracy: 0.8889 - val_loss: 315.0913 - val_categorical_accuracy: 0.2708
Epoch 9/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.2713 - categorical_accuracy: 0.8958
Epoch 00009: val_categorical_accuracy did not improve from 0.38542
Restoring model weights from the end of the best epoch.
10/10 [==============================] - 9s 853ms/step - loss: 0.2550 - categorical_accuracy: 0.9062 - val_loss: 340.6383 - val_categorical_accuracy: 0.2708
Epoch 10/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.4282 - categorical_accuracy: 0.8759
Epoch 00010: val_categorical_accuracy did not improve from 0.38542
Restoring model weights from the end of the best epoch.
10/10 [==============================] - 8s 795ms/step - loss: 0.4260 - categorical_accuracy: 0.8758 - val_loss: 4.5791 - val_categorical_accuracy: 0.2500
Epoch 00010: early stopping
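To sanity-check the patience bookkeeping, here is a minimal pure-Python re-implementation of the wait/patience logic from the snippet above, applied to the val_categorical_accuracy values in the log (patience=3, mode='max'). This is an illustrative sketch, not the actual Keras source:

```python
# Minimal re-implementation of EarlyStopping's patience bookkeeping
# (illustrative sketch only, not the actual Keras source).
def first_stop_epoch(values, patience, min_delta=0.0):
    """Return the 1-based epoch at which stop_training would be set."""
    best = float('-inf')
    wait = 0
    for epoch, current in enumerate(values, start=1):
        if current > best + min_delta:   # mode='max': higher is better
            best = current
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # never triggered

# val_categorical_accuracy values from the log above, epochs 1-10
val_acc = [0.2500, 0.2396, 0.2604, 0.3021, 0.1354,
           0.3854, 0.2708, 0.2708, 0.2708, 0.2500]

print(first_stop_epoch(val_acc, patience=3))  # -> 9
```

So the callback correctly decides to stop after epoch 9 (the best value was at epoch 6), which matches the "Restoring model weights" message in the log; the puzzle is why epoch 10 still runs.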
Here is the main code involved in the problem:

if loss == 'categorical_crossentropy':
    monitor = 'val_categorical_accuracy'
else:
    monitor = 'val_binary_accuracy'
early_stop = EarlyStopping(monitor=monitor, patience=3, verbose=1, restore_best_weights=True)
checkpoint_path = '{}/best.weights.hdf5'.format(output_dir)
best_model = ModelCheckpoint(checkpoint_path, monitor=monitor, verbose=1, save_best_only=True, mode='max')
# reduce_lr = tensorflow.python.keras.callbacks.ReduceLROnPlateau()
m = Metrics(labels=labels, val_data=validation_generator, batch_size=batch_size)
history = model.fit_generator(train_generator,
                              steps_per_epoch=steps_per_epoch,
                              epochs=epochs,
                              use_multiprocessing=True,
                              validation_data=validation_generator,
                              validation_steps=validation_steps,
                              callbacks=[tensorboard, best_model, early_stop,
                                         WandbCallback(data_type="image",
                                                       validation_data=validation_generator,
                                                       labels=labels)])  # , schedule])
return history
All of the code is in

I expected that once the early stopping patience was exceeded, the remaining epochs would not run, and that if early stopping triggered, the returned model would have the best weights. However, the returned model is not the best model; it is the last one. I want the best model returned and the remaining epochs skipped after early stopping.
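For reference, the behavior I expected can be sketched as a toy training loop that honors the flag between epochs. This is a framework-free sketch with hypothetical names, not the actual Keras fit_generator internals:

```python
# Toy sketch of the epoch-loop behavior I expected (hypothetical
# names; not the actual Keras fit_generator implementation).
class ToyModel:
    def __init__(self):
        self.stop_training = False

def toy_fit(model, epochs, on_epoch_end):
    """Run epochs, honoring model.stop_training after each one."""
    ran = []
    for epoch in range(1, epochs + 1):
        ran.append(epoch)            # "train" for one epoch
        on_epoch_end(model, epoch)   # callbacks may set stop_training
        if model.stop_training:      # expected: no further epochs run
            break
    return ran

# Callback that asks to stop at the end of epoch 9.
stop_at_9 = lambda m, e: setattr(m, 'stop_training', e >= 9)
print(toy_fit(ToyModel(), epochs=10, on_epoch_end=stop_at_9))
# -> [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

In my actual run, the equivalent of epoch 10 still executes, which is exactly the check above failing to take effect.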

Edit: after copying the EarlyStopping class and adding some print statements to it, I found this:

Epoch 6/10
8/9 [=========================>....] - ETA: 0s - loss: 0.5594 - categorical_accuracy: 0.9062
Epoch 00006: val_categorical_accuracy did not improve from 0.27083
3 epochs since improvement to val_categorical_accuracy
Model stop_training state previously: False
Model stop_training state now: True
Restoring model weights from the end of the best epoch.
9/9 [==============================] - 8s 855ms/step - loss: 0.5511 - categorical_accuracy: 0.8889 - val_loss: 466.1678 - val_categorical_accuracy: 0.2292
Epoch 7/10
8/9 [=========================>....] - ETA: 0s - loss: 0.3544 - categorical_accuracy: 0.8992
Epoch 00007: val_categorical_accuracy did not improve from 0.27083
4 epochs since improvement to val_categorical_accuracy
Model stop_training state previously: False
Model stop_training state now: True
Restoring model weights from the end of the best epoch.

It seems that when self.model.stop_training is set to True, the setting does not persist through to the end of the next epoch. It looks as if what happens inside the callback is not applied to the model? I'm not sure. Any insight is welcome.
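One way to narrow this down is a tiny diagnostic callback that logs model.stop_training at the start of each epoch: if the flag was True at the end of one epoch but reads False at the start of the next, something reset it in between. Below is a framework-free sketch of that idea using dummy objects (in a real run you would subclass tf.keras.callbacks.Callback and place it after EarlyStopping in the callbacks list):

```python
# Framework-free sketch of a diagnostic callback that logs the
# stop_training flag at each epoch boundary (dummy objects only;
# a real version would subclass tf.keras.callbacks.Callback).
class DummyModel:
    def __init__(self):
        self.stop_training = False

class StopFlagLogger:
    def __init__(self, model):
        self.model = model
        self.log = []

    def on_epoch_begin(self, epoch):
        # If the flag was set last epoch but reads False here,
        # something reset it between epochs.
        self.log.append((epoch, self.model.stop_training))

model = DummyModel()
logger = StopFlagLogger(model)

logger.on_epoch_begin(1)
model.stop_training = True     # what EarlyStopping does at epoch end
logger.on_epoch_begin(2)       # expected: flag still True here

print(logger.log)  # -> [(1, False), (2, True)]
```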

I have the same problem. I set
epochs=100
patience=5
, but I get all 100 training epochs every time.

I found that early stopping behaved correctly for these folks:


Main tip: use the
min_delta
param. In that case, training is stopped if there is no improvement of at least min_delta over the previous best result within the number of epochs set by the
patience
param.
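To illustrate how min_delta interacts with patience, here is a toy version of the stopping bookkeeping (a sketch assuming mode='max', not the Keras source): with min_delta, improvements smaller than the threshold no longer reset the wait counter.

```python
# Toy illustration of min_delta (sketch, mode='max'): an "improvement"
# only counts if it beats the best value by more than min_delta.
def stop_epoch(values, patience, min_delta=0.0):
    best, wait = float('-inf'), 0
    for epoch, current in enumerate(values, start=1):
        if current > best + min_delta:
            best, wait = current, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # never triggered

vals = [0.50, 0.505, 0.51, 0.515, 0.52]   # tiny improvement each epoch
print(stop_epoch(vals, patience=2, min_delta=0.0))   # -> None (never stops)
print(stop_epoch(vals, patience=2, min_delta=0.01))  # -> 3
```

With min_delta=0 every marginal gain resets the counter and training never stops early; with min_delta=0.01 the sub-threshold gains are ignored and the stop triggers at epoch 3.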

I wonder whether the same problem persists after setting use_multiprocessing to False. I just tested it, and the problem does persist with
use_multiprocessing=False
. Did you find a way to make it work? I'm seeing a similar problem, although it is the same in my case: the model.stop_training flag doesn't seem to be respected.