Deep learning: loss decreases, then suddenly jumps


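I am training an agent with DQN. The reward is increasing and the loss is decreasing, which is a good sign, and I am getting good results. However, I am a bit suspicious, because the loss decreases and then suddenly jumps to a very high value.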

Here are the first 20 epochs:

===============================
Reward: 0.0 Steps: 0.0 Update: 1 Time: 1.2 Episodes: 1
Loss: 19796.0547
===============================
Reward: 13243.5 Steps: 100.0 Update: 3 Time: 5.33 Episodes: 2
Loss: 19431.1680
===============================
Reward: 13507.0 Steps: 100.0 Update: 6 Time: 5.56 Episodes: 3
Loss: 19586.0059
===============================
Reward: 13469.5 Steps: 100.0 Update: 9 Time: 5.96 Episodes: 4
Loss: 19398.0176
===============================
Reward: 13923.5 Steps: 100.0 Update: 12 Time: 6.34 Episodes: 5
Loss: 19539.2090
===============================
Reward: 13382.0 Steps: 100.0 Update: 15 Time: 6.57 Episodes: 6
Loss: 19461.4648
===============================
Reward: 14326.0 Steps: 100.0 Update: 18 Time: 6.89 Episodes: 7
Loss: 19103.9668
===============================
Reward: 15041.0 Steps: 100.0 Update: 21 Time: 7.16 Episodes: 8
Loss: 19470.4160
===============================
Reward: 15718.5 Steps: 100.0 Update: 24 Time: 7.52 Episodes: 9
Loss: 19668.2324
===============================
Reward: 14925.5 Steps: 100.0 Update: 27 Time: 8.0 Episodes: 10
Loss: 19771.4648
===============================
Reward: 15555.0 Steps: 100.0 Update: 30 Time: 8.12 Episodes: 11
Loss: 19788.6621
===============================
Reward: 14711.0 Steps: 100.0 Update: 33 Time: 8.52 Episodes: 12
Loss: 19724.0176
===============================
Reward: 15329.5 Steps: 100.0 Update: 36 Time: 9.03 Episodes: 13
Loss: 19551.4707
===============================
Reward: 15748.0 Steps: 100.0 Update: 39 Time: 9.17 Episodes: 14
Loss: 19516.3770
===============================
Reward: 15666.5 Steps: 100.0 Update: 42 Time: 9.39 Episodes: 15
Loss: 19426.6973
===============================
Reward: 15593.5 Steps: 100.0 Update: 45 Time: 9.85 Episodes: 16
Loss: 19327.2832
===============================
Reward: 15614.0 Steps: 100.0 Update: 48 Time: 10.13 Episodes: 17
Loss: 19158.5488
===============================
Reward: 15874.5 Steps: 100.0 Update: 51 Time: 10.47 Episodes: 18
Loss: 19061.7402
===============================
Reward: 15575.5 Steps: 100.0 Update: 54 Time: 10.68 Episodes: 19
Loss: 18895.0918
===============================
Reward: 15949.5 Steps: 100.0 Update: 57 Time: 11.01 Episodes: 20
Loss: 18741.6094
After 37 epochs, the reward reaches ~17000 and the loss has decreased to 15694.

Here you can see the sharp increase in the loss. It happened 3 times in 100 episodes.

Reward: 16366.0 Steps: 100.0 Update: 117 Time: 17.44 Episodes: 40
Loss: 15099.0156
===============================
Reward: 15909.5 Steps: 100.0 Update: 120 Time: 17.9 Episodes: 41
Loss: 14892.0322
===============================
Reward: 16744.5 Steps: 100.0 Update: 123 Time: 17.87 Episodes: 42
Loss: 14705.1650
===============================
Reward: 16613.5 Steps: 100.0 Update: 126 Time: 18.39 Episodes: 43
Loss: 14518.6943
===============================
Reward: 16422.0 Steps: 100.0 Update: 129 Time: 18.8 Episodes: 44
Loss: 19189.0879
===============================
Reward: 16820.5 Steps: 100.0 Update: 132 Time: 19.27 Episodes: 45
Loss: 28676.2344
===============================
Reward: 16513.5 Steps: 100.0 Update: 135 Time: 19.66 Episodes: 46
Loss: 28341.6875
===============================
Reward: 16878.5 Steps: 100.0 Update: 138 Time: 20.08 Episodes: 47
Loss: 27986.1465

I expected the loss to keep decreasing or to stabilize. How can I explain the increase in the loss, and how can I avoid it?

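It could be an exploding gradient problem, where the loss suddenly becomes very large during training. You can try L2 regularization and gradient clipping. You can also tune the learning rate: either lower it, or switch to a different optimizer (for example, plain SGD instead of Adam or whatever optimizer you are using). If you are using recurrent cells, you can try LSTM instead of GRU.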
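To make the gradient clipping and L2 regularization suggestions concrete, here is a minimal, framework-free sketch of a single plain SGD step that applies both. The question does not say which framework or hyperparameters are in use, so the function names (`clip_by_global_norm`, `sgd_step`) and the default values for `lr`, `weight_decay`, and `max_norm` are assumptions made for illustration only. In practice you would more likely use your framework's built-in utilities (for example `torch.nn.utils.clip_grad_norm_` in PyTorch).

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradient vectors so their combined L2 norm is at most max_norm.

    grads is a list of lists of floats (one inner list per parameter tensor).
    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total_norm <= max_norm or total_norm == 0.0:
        # Gradients are already small enough: return an unchanged copy.
        return [list(vec) for vec in grads], total_norm
    scale = max_norm / total_norm
    return [[g * scale for g in vec] for vec in grads], total_norm


def sgd_step(params, grads, lr=1e-3, weight_decay=1e-4, max_norm=10.0):
    """One plain SGD update with L2 regularization and gradient clipping.

    The default hyperparameters are illustrative assumptions, not tuned values.
    """
    # L2 regularization contributes weight_decay * w to each gradient component.
    reg_grads = [
        [g + weight_decay * w for w, g in zip(p_vec, g_vec)]
        for p_vec, g_vec in zip(params, grads)
    ]
    # Clip after adding the regularization term, then take the SGD step.
    clipped, norm_before = clip_by_global_norm(reg_grads, max_norm)
    new_params = [
        [w - lr * g for w, g in zip(p_vec, g_vec)]
        for p_vec, g_vec in zip(params, clipped)
    ]
    return new_params, norm_before


if __name__ == "__main__":
    # A deliberately huge gradient, standing in for an "exploded" update.
    params = [[0.5, -0.5], [1.0]]
    grads = [[300.0, -400.0], [0.0]]
    new_params, norm_before = sgd_step(params, grads, weight_decay=0.0)
    print("gradient norm before clipping:", norm_before)
    print("updated parameters:", new_params)
```

Logging the pre-clipping norm (`norm_before` here) on every update is also a cheap way to check the diagnosis: if the loss jumps coincide with spikes in the gradient norm, exploding gradients are a plausible cause; if they do not, the cause is likely elsewhere.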
