PyTorch version of a simple Keras LSTM model
I am trying to convert a simple LSTM model from Keras to PyTorch. The Keras model converges after only 200 epochs, whereas the PyTorch model:
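- needs far more epochs to reach the same loss (200 vs. 8000)
- seems to overfit the input, because the predicted value is not close to 100

Here is the Keras code: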
```python
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# 6 samples, 3 time steps, 1 feature per step
X = array([10,20,30,20,30,40,30,40,50,40,50,60,50,60,70,60,70,80]).reshape((6,3,1))
y = array([40,50,60,70,80,90])

model = Sequential()
# ReLU as the cell activation, sigmoid for the gates
model.add(LSTM(50, activation='relu', recurrent_activation='sigmoid', input_shape=(3, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=1)

x_input = array([70, 80, 90]).reshape((1, 3, 1))
yhat = model.predict(x_input, verbose=0)
print(yhat)
```
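For reference, in a Keras LSTM layer the `activation` and `recurrent_activation` arguments are applied inside the recurrent cell, not to the layer's output. For one time step, with $\sigma$ = `recurrent_activation` and $\phi$ = `activation`, the cell computes:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \phi(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \phi(c_t)
\end{aligned}
```

With the arguments above, $\phi$ is ReLU and $\sigma$ is the sigmoid. PyTorch's `nn.LSTM` fixes $\phi = \tanh$ and $\sigma$ = sigmoid, so applying `F.relu` after the LSTM output is not the same model.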
Here is the equivalent PyTorch code:
```python
from numpy import array
import torch
import torch.nn as nn
import torch.nn.functional as F

X = torch.tensor([10,20,30,20,30,40,30,40,50,40,50,60,50,60,70,60,70,80]).float().reshape(6,3,1)
y = torch.tensor([40,50,60,70,80,90]).float().reshape(6,1)

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=50, num_layers=1, batch_first=True)
        self.fc = nn.Linear(50, 1)

    def forward(self, x):
        batches = x.size(0)
        h0 = torch.zeros([1, batches, 50])
        c0 = torch.zeros([1, batches, 50])
        (x, _) = self.lstm(x, (h0, c0))
        x = x[:,-1,:] # Keep only the output of the last iteration. Before shape (6,3,50), after shape (6,50)
        x = F.relu(x)
        x = self.fc(x)
        return x

model = Model()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

n_epochs = 8000
for epoch in range(n_epochs):
    model.train()
    optimizer.zero_grad()
    y_ = model(X)
    loss = criterion(y_, y)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}/{n_epochs}, loss = {loss.item()}")

model.eval()
x_input = torch.tensor([70, 80, 90]).float().reshape((1, 3, 1))
yhat = model(x_input)
print(yhat)
```
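A related detail in `forward()`: when no `(h0, c0)` is passed, `nn.LSTM` starts from zero states, so creating zero tensors on every call gives the same result as omitting them. The sketch below checks this with random input shaped like `X` (illustrative data, not from the question):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=1, hidden_size=50, num_layers=1, batch_first=True)
x = torch.randn(6, 3, 1)  # random stand-in with the same shape as X

# Explicit zero initial state, as in the question's forward()
h0 = torch.zeros(1, x.size(0), 50)
c0 = torch.zeros(1, x.size(0), 50)
out_explicit, _ = lstm(x, (h0, c0))

# No state passed: nn.LSTM defaults h0 and c0 to zeros
out_default, _ = lstm(x)

print(torch.allclose(out_explicit, out_default))  # True
```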
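The only possible difference is the initial weight and bias values, but I don't think slightly different weights and biases can explain such a large difference in behavior.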
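What is missing from the PyTorch code?

The difference in behavior comes from the activation function in the LSTM API. By changing the activation to tanh, I can reproduce the problem in Keras as well: `model.add(LSTM(50, activation='tanh', recurrent_activation='sigmoid', input_shape=(3, 1)))`. The PyTorch LSTM API has no option to change the activation function to 'relu'. Taking the LSTM implementation from here and changing hardsigmoid/tanh to sigmoid/relu, the model converges in PyTorch as well.

I think you are initializing h0 and c0 every time, which is only needed at the start, so it is better to use the code I modified below. You can also go through this link on RNNs in PyTorch. This gives good prediction results within 2500 epochs. I would like to know why you wrote the following line of code and what its purpose is, so that I can try to improve it: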
```python
x = x[:,-1,:] # Keep only the output of the last iteration. Before shape (6,3,50), after shape (6,50)
```
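As for what that line does: with `batch_first=True`, `nn.LSTM` returns the hidden state for every time step with shape `(batch, time, hidden)`, and `x[:,-1,:]` keeps only the last step. This matches what a Keras LSTM layer returns by default (`return_sequences=False`), so both `Dense(1)` and `nn.Linear(50, 1)` receive one 50-dimensional vector per sample. A quick check, using random input of the same shape as `X` (purely illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=1, hidden_size=50, num_layers=1, batch_first=True)
x = torch.randn(6, 3, 1)  # (batch, time steps, features), same shape as X

out, (h_n, c_n) = lstm(x)
print(out.shape)  # torch.Size([6, 3, 50]): hidden state at every time step
print(h_n.shape)  # torch.Size([1, 6, 50]): final hidden state per layer

last_step = out[:, -1, :]  # (6, 50): output at the last time step only
# For a single-layer, unidirectional LSTM this equals the final hidden state
print(torch.allclose(last_step, h_n[-1]))  # True
```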
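Returning to the first answer, its approach (ReLU inside the cell, sigmoid gates) can be sketched with a hand-written LSTM loop, written from scratch rather than taken from the implementation it links to. `ActLSTM` is a hypothetical name; gates are packed in the order i, f, g, o, which is the same layout `nn.LSTM` uses, and the extra `F.relu` after the LSTM is dropped because ReLU now sits inside the cell. Weight initialization still differs from Keras (for example, Keras LSTM layers start the forget-gate bias at 1 via `unit_forget_bias=True`), so the loss curve is not expected to match exactly.

```python
import torch
import torch.nn as nn


class ActLSTM(nn.Module):
    """Single-layer LSTM with configurable activations (hypothetical helper).

    `activation` plays the role of Keras' `activation` (candidate and cell output);
    `recurrent_activation` plays the role of Keras' `recurrent_activation` (gates).
    """

    def __init__(self, input_size, hidden_size,
                 activation=torch.relu, recurrent_activation=torch.sigmoid):
        super().__init__()
        self.hidden_size = hidden_size
        self.act = activation
        self.rec_act = recurrent_activation
        # Packed gate weights in the order i, f, g, o (same as nn.LSTM)
        self.ih = nn.Linear(input_size, 4 * hidden_size)
        self.hh = nn.Linear(hidden_size, 4 * hidden_size)

    def forward(self, x):  # x: (batch, time, features)
        batch = x.size(0)
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)
        for t in range(x.size(1)):
            gates = self.ih(x[:, t, :]) + self.hh(h)
            i, f, g, o = gates.chunk(4, dim=1)
            i, f, o = self.rec_act(i), self.rec_act(f), self.rec_act(o)
            c = f * c + i * self.act(g)  # candidate uses `activation`
            h = o * self.act(c)          # cell output uses `activation`
        return h  # last hidden state, like Keras return_sequences=False


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = ActLSTM(1, 50)  # ReLU inside the cell, sigmoid gates
        self.fc = nn.Linear(50, 1)

    def forward(self, x):
        return self.fc(self.lstm(x))  # no extra ReLU after the LSTM


X = torch.tensor([10, 20, 30, 20, 30, 40, 30, 40, 50,
                  40, 50, 60, 50, 60, 70, 60, 70, 80]).float().reshape(6, 3, 1)
y = torch.tensor([40, 50, 60, 70, 80, 90]).float().reshape(6, 1)

torch.manual_seed(0)
model = Model()
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.MSELoss()
for epoch in range(200):  # full-batch steps, as in the question
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    print(model(torch.tensor([70., 80., 90.]).reshape(1, 3, 1)))
```

Passing `activation=torch.tanh` instead gives an ordinary LSTM cell, which is a convenient way to sanity-check the loop against `nn.LSTM` by copying its `weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0` and `bias_hh_l0` into `ih` and `hh`.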