PyTorch version of a simple Keras LSTM model


I am trying to convert a simple LSTM model from Keras to PyTorch. The Keras model converges after only 200 epochs, while the PyTorch model:

  • takes far more epochs to reach the same loss level (200 vs. 8000)
  • seems to overfit the input, since the predicted value is not close to 100
Here is the Keras code:

from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

X = array([10,20,30,20,30,40,30,40,50,40,50,60,50,60,70,60,70,80]).reshape((6,3,1))
y = array([40,50,60,70,80,90])
model = Sequential()
model.add(LSTM(50, activation='relu', recurrent_activation='sigmoid',  input_shape=(3, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=1)
x_input = array([70, 80, 90]).reshape((1, 3, 1))
yhat = model.predict(x_input, verbose=0)
print(yhat)
Here is the equivalent PyTorch code:

from numpy import array
import torch
import torch.nn as nn
import torch.nn.functional as F

X = torch.tensor([10,20,30,20,30,40,30,40,50,40,50,60,50,60,70,60,70,80]).float().reshape(6,3,1)
y = torch.tensor([40,50,60,70,80,90]).float().reshape(6,1)

class Model(nn.Module):
  def __init__(self):
    super(Model, self).__init__()
    self.lstm = nn.LSTM(input_size=1, hidden_size=50, num_layers=1, batch_first=True)
    self.fc = nn.Linear(50, 1)

  def forward(self, x):
    batches = x.size(0)
    h0 = torch.zeros([1, batches, 50])
    c0 = torch.zeros([1, batches, 50])
    (x, _) = self.lstm(x, (h0, c0))
    x = x[:,-1,:]  # Keep only the output of the last iteration. Before shape (6,3,50), after shape (6,50)
    x = F.relu(x)
    x = self.fc(x)
    return x

model = Model()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

n_epochs = 8000
for epoch in range(n_epochs):
  model.train()
  optimizer.zero_grad()
  y_ = model(X)
  loss = criterion(y_, y)
  loss.backward()
  optimizer.step()
  print(f"Epoch {epoch+1}/{n_epochs}, loss = {loss.item()}")

model.eval()
x_input = torch.tensor([70, 80, 90]).float().reshape((1, 3, 1))
yhat = model(x_input)
print(yhat)
The only possible difference is in the initial weight and bias values, but I don't think slightly different weights and biases can account for such a big difference in behavior.
What is missing in the PyTorch code?

The difference in behavior is because of the activation function in the LSTM API. By changing the activation to tanh, I can reproduce the problem in Keras as well:

model.add(LSTM(50, activation='tanh', recurrent_activation='sigmoid', input_shape=(3, 1)))

There is no option in the PyTorch LSTM API to change the activation function to 'relu'.

Taking the LSTM implementation from here and changing hardsigmoid/tanh to sigmoid/relu, the model converges in PyTorch as well.
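A minimal sketch of such a custom cell (the ReLULSTMCell name and the fused-gate layout are my own for illustration, not taken from the linked implementation), replacing the two tanh activations with ReLU while keeping the sigmoid gates:

import torch
import torch.nn as nn

class ReLULSTMCell(nn.Module):
  """LSTM cell with ReLU substituted for the two tanh activations."""
  def __init__(self, input_size, hidden_size):
    super(ReLULSTMCell, self).__init__()
    self.hidden_size = hidden_size
    # One fused linear map per source produces all four gates at once
    self.ih = nn.Linear(input_size, 4 * hidden_size)
    self.hh = nn.Linear(hidden_size, 4 * hidden_size)

  def forward(self, x, state):
    h, c = state
    gates = self.ih(x) + self.hh(h)
    i, f, g, o = gates.chunk(4, dim=1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.relu(g)          # a standard LSTM uses tanh here
    c = f * c + i * g
    h = o * torch.relu(c)      # a standard LSTM uses tanh here
    return h, c

def run_sequence(cell, x):
  # Unroll the cell over a (batch, seq_len, input_size) tensor
  h = x.new_zeros(x.size(0), cell.hidden_size)
  c = x.new_zeros(x.size(0), cell.hidden_size)
  for t in range(x.size(1)):
    h, c = cell(x[:, t, :], (h, c))
  return h  # last hidden state, shape (batch, hidden_size)

Calling run_sequence in place of self.lstm in the question's Model (followed by the same fc layer) leaves the rest of the training loop unchanged.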

I think you are initializing h0 and c0 on every forward pass, which is only needed at the very start. So it would be better to use my modified code below. You can go through this link on RNNs in PyTorch:
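A minimal sketch of such a forward pass, assuming the suggested change is simply to let nn.LSTM fall back to its default zero-initialized state (PyTorch uses zeros when no (h0, c0) is passed) instead of rebuilding the tensors on every call:

  def forward(self, x):
    # nn.LSTM zero-initializes (h0, c0) itself when none are given
    x, _ = self.lstm(x)
    x = x[:, -1, :]   # keep only the last time step's output
    x = F.relu(x)
    x = self.fc(x)
    return x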

This gives good prediction results within 2500 epochs. I would like to know why you wrote the following line and what its purpose is, so that I can try to make it better:

x = x[:,-1,:]  # Keep only the output of the last iteration. Before shape (6,3,50), after shape (6,50)  