PyTorch version of a simple Keras LSTM model


I am trying to convert a simple LSTM model from Keras to PyTorch. The Keras model converges after only 200 epochs, while the PyTorch model:

  • takes far more epochs to reach the same loss level (200 vs. 8000)
  • seems to overfit the input, since the predicted value is not close to 100
Here is the Keras code:

from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

X = array([10,20,30,20,30,40,30,40,50,40,50,60,50,60,70,60,70,80]).reshape((6,3,1))
y = array([40,50,60,70,80,90])
model = Sequential()
model.add(LSTM(50, activation='relu', recurrent_activation='sigmoid',  input_shape=(3, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=1)
x_input = array([70, 80, 90]).reshape((1, 3, 1))
yhat = model.predict(x_input, verbose=0)
print(yhat)
Here is the equivalent PyTorch code:

from numpy import array
import torch
import torch.nn as nn
import torch.nn.functional as F

X = torch.tensor([10,20,30,20,30,40,30,40,50,40,50,60,50,60,70,60,70,80]).float().reshape(6,3,1)
y = torch.tensor([40,50,60,70,80,90]).float().reshape(6,1)

class Model(nn.Module):
  def __init__(self):
    super(Model, self).__init__()
    self.lstm = nn.LSTM(input_size=1, hidden_size=50, num_layers=1, batch_first=True)
    self.fc = nn.Linear(50, 1)

  def forward(self, x):
    batches = x.size(0)
    h0 = torch.zeros([1, batches, 50])
    c0 = torch.zeros([1, batches, 50])
    (x, _) = self.lstm(x, (h0, c0))
    x = x[:,-1,:]  # Keep only the output of the last iteration. Before shape (6,3,50), after shape (6,50)
    x = F.relu(x)
    x = self.fc(x)
    return x

model = Model()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

n_epochs = 8000
for epoch in range(n_epochs):
  model.train()
  optimizer.zero_grad()
  y_ = model(X)
  loss = criterion(y_, y)
  loss.backward()
  optimizer.step()
  print(f"Epoch {epoch+1}/{n_epochs}, loss = {loss.item()}")

model.eval()
x_input = torch.tensor([70, 80, 90]).float().reshape((1, 3, 1))
yhat = model(x_input)
print(yhat)
The only possible difference is in the initial weight and bias values, but I don't think slightly different weights and biases can account for such a big difference in behavior.
What is missing in the PyTorch code?

The difference in behavior is because of the activation function in the LSTM API. By changing the activation to tanh, I can reproduce the problem in Keras as well:

model.add(LSTM(50, activation='tanh', recurrent_activation='sigmoid', input_shape=(3, 1)))

There is no option in the PyTorch LSTM API to change the activation function to 'relu'.

Taking the LSTM implementation from here and changing hardsigmoid/tanh to sigmoid/relu, the model converges in PyTorch as well.
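A minimal sketch of such a custom cell (the ReLULSTMCell name and the fused-gate layout are my own for illustration, not taken from the linked implementation), replacing the two tanh activations with ReLU while keeping the sigmoid gates:

import torch
import torch.nn as nn

class ReLULSTMCell(nn.Module):
  """LSTM cell with ReLU substituted for the two tanh activations."""
  def __init__(self, input_size, hidden_size):
    super(ReLULSTMCell, self).__init__()
    self.hidden_size = hidden_size
    # One fused linear map per source produces all four gates at once
    self.ih = nn.Linear(input_size, 4 * hidden_size)
    self.hh = nn.Linear(hidden_size, 4 * hidden_size)

  def forward(self, x, state):
    h, c = state
    gates = self.ih(x) + self.hh(h)
    i, f, g, o = gates.chunk(4, dim=1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.relu(g)          # a standard LSTM uses tanh here
    c = f * c + i * g
    h = o * torch.relu(c)      # a standard LSTM uses tanh here
    return h, c

def run_sequence(cell, x):
  # Unroll the cell over a (batch, seq_len, input_size) tensor
  h = x.new_zeros(x.size(0), cell.hidden_size)
  c = x.new_zeros(x.size(0), cell.hidden_size)
  for t in range(x.size(1)):
    h, c = cell(x[:, t, :], (h, c))
  return h  # last hidden state, shape (batch, hidden_size)

Calling run_sequence in place of self.lstm in the question's Model (followed by the same fc layer) leaves the rest of the training loop unchanged.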

I think you are initializing h0 and c0 on every forward pass, which is only needed at the very start. So it would be better to use my modified code below. You can go through this link on RNNs in PyTorch:
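A minimal sketch of such a forward pass, assuming the suggested change is simply to let nn.LSTM fall back to its default zero-initialized state (PyTorch uses zeros when no (h0, c0) is passed) instead of rebuilding the tensors on every call:

  def forward(self, x):
    # nn.LSTM zero-initializes (h0, c0) itself when none are given
    x, _ = self.lstm(x)
    x = x[:, -1, :]   # keep only the last time step's output
    x = F.relu(x)
    x = self.fc(x)
    return x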

This gives good prediction results within 2500 epochs. I would like to know why you wrote the following line and what its purpose is, so that I can try to make it better:

x = x[:,-1,:]  # Keep only the output of the last iteration. Before shape (6,3,50), after shape (6,50)  