Training a model in PyTorch, but only the first epoch works

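I'm new to PyTorch and I'm trying to train a seq2seq model (a Chinese poetry generator, to be specific). I've run into a problem: every time I run train.py and save the model, then run train.py again and load the model, the loss jumps back up as if no training had taken place.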

After some experiments, I found that if I train the model for only 1 epoch per run, the loss decreases as expected, as shown below:

PS F:\Programs\Programs-Python\RNN-Advanced-seq2seq> & F:/Anaconda/python.exe f:/Programs/Programs-Python/RNN-Advanced-seq2seq/train.py        
epoch: 1, loss = 3.670
PS F:\Programs\Programs-Python\RNN-Advanced-seq2seq> & F:/Anaconda/python.exe f:/Programs/Programs-Python/RNN-Advanced-seq2seq/train.py        
epoch: 1, loss = 3.641
PS F:\Programs\Programs-Python\RNN-Advanced-seq2seq> & F:/Anaconda/python.exe f:/Programs/Programs-Python/RNN-Advanced-seq2seq/train.py        
epoch: 1, loss = 3.640
PS F:\Programs\Programs-Python\RNN-Advanced-seq2seq> & F:/Anaconda/python.exe f:/Programs/Programs-Python/RNN-Advanced-seq2seq/train.py        
epoch: 1, loss = 3.690
epoch: 2, loss = 3.444
PS F:\Programs\Programs-Python\RNN-Advanced-seq2seq> & F:/Anaconda/python.exe f:/Programs/Programs-Python/RNN-Advanced-seq2seq/train.py        
epoch: 1, loss = 3.685
epoch: 2, loss = 3.418
PS F:\Programs\Programs-Python\RNN-Advanced-seq2seq> & F:/Anaconda/python.exe f:/Programs/Programs-Python/RNN-Advanced-seq2seq/train.py        
epoch: 1, loss = 3.670
But if I train it for more than 1 epoch per run, the epochs after the first are wasted, as shown below:

PS F:\Programs\Programs-Python\RNN-Advanced-seq2seq> & F:/Anaconda/python.exe f:/Programs/Programs-Python/RNN-Advanced-seq2seq/train.py        
epoch: 1, loss = 3.670
PS F:\Programs\Programs-Python\RNN-Advanced-seq2seq> & F:/Anaconda/python.exe f:/Programs/Programs-Python/RNN-Advanced-seq2seq/train.py        
epoch: 1, loss = 3.641
PS F:\Programs\Programs-Python\RNN-Advanced-seq2seq> & F:/Anaconda/python.exe f:/Programs/Programs-Python/RNN-Advanced-seq2seq/train.py        
epoch: 1, loss = 3.640
PS F:\Programs\Programs-Python\RNN-Advanced-seq2seq> & F:/Anaconda/python.exe f:/Programs/Programs-Python/RNN-Advanced-seq2seq/train.py        
epoch: 1, loss = 3.690
epoch: 2, loss = 3.444
PS F:\Programs\Programs-Python\RNN-Advanced-seq2seq> & F:/Anaconda/python.exe f:/Programs/Programs-Python/RNN-Advanced-seq2seq/train.py        
epoch: 1, loss = 3.685
epoch: 2, loss = 3.418
PS F:\Programs\Programs-Python\RNN-Advanced-seq2seq> & F:/Anaconda/python.exe f:/Programs/Programs-Python/RNN-Advanced-seq2seq/train.py        
epoch: 1, loss = 3.670
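You can see that the loss goes down in the first epoch of every run, but the other epochs have no lasting effect. Since the loss does decrease in every first epoch, it cannot be true that the whole training is useless.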

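I have checked that the data loader works properly. I also tried saving the model after every epoch, and I am sure that the model is different after each epoch, that what I save is the model from the last epoch, and that what I load is exactly what I saved.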
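
For illustration, a save/load round trip of the kind described above could be checked roughly like this. This is a minimal sketch, not the actual train.py: the `nn.Linear` stand-in model, the Adam optimizer, the file name `checkpoint.pt`, and the helpers `save_checkpoint`, `load_checkpoint`, `states_equal` and `round_trip_ok` are all assumptions made for the example. It also stores the optimizer state, since optimizers such as Adam keep per-parameter statistics between steps; whether that matters for this particular problem is not established here.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(path, model, optimizer, epoch):
    # Hypothetical helper: bundle model weights, optimizer state and epoch.
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    # Hypothetical helper: restore everything that save_checkpoint stored.
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])
    return checkpoint["epoch"]


def states_equal(model_a, model_b):
    # True only if both models have the same parameter names and values.
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    return list(sd_a) == list(sd_b) and all(
        torch.equal(sd_a[key], sd_b[key]) for key in sd_a
    )


def round_trip_ok():
    torch.manual_seed(0)
    model = nn.Linear(4, 2)  # stand-in for the real seq2seq model
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Take one training step so weights and optimizer state are non-trivial.
    loss = model(torch.randn(8, 4)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "checkpoint.pt")
        save_checkpoint(path, model, optimizer, epoch=1)

        # Fresh objects, as at the start of a new run of the script.
        restored = nn.Linear(4, 2)
        restored_optimizer = torch.optim.Adam(restored.parameters(), lr=1e-3)
        epoch = load_checkpoint(path, restored, restored_optimizer)

    return epoch == 1 and states_equal(model, restored)
```

Calling `round_trip_ok()` after a training step gives a quick way to confirm that the restored weights match the saved ones tensor by tensor.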

What is going wrong? I have been struggling with this for a week. I would appreciate any help with this weird bug.

Here is my training code:

Here is my model definition: