Converting an LSTM model from Keras to PyTorch


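I am having a hard time translating a very simple LSTM model from Keras to PyTorch.

X corresponds to 1152 samples of 90 timesteps, with only one dimension per timestep. y is a single prediction at t = 91 for each of the 1152 samples.

In Keras: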

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
import numpy as np
import pandas as pd

X = pd.read_csv('X.csv', header = None).values
X.shape

y = pd.read_csv('y.csv', header = None).values
y.shape

# From Keras documentation [https://keras.io/layers/recurrent/]: 
# Input shape 3D tensor with shape (batch_size, timesteps, input_dim).
X = np.reshape(X, (1152, 90, 1))

regressor = Sequential()
regressor.add(LSTM(units = 100, return_sequences = True, input_shape = (90, 1)))
regressor.add(Dropout(0.3))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.3))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.3))
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.3))
regressor.add(Dense(units = 1, activation = 'linear'))
regressor.compile(optimizer = 'rmsprop', loss = 'mean_squared_error', metrics = ['mean_absolute_error'])
regressor.fit(X, y, epochs = 10, batch_size = 32)
... which gives me:

# Epoch 10/10
# 1152/1152 [==============================] - 33s 29ms/sample - loss: 0.0068 - mean_absolute_error: 0.0628
Then, in PyTorch:

import pandas as pd
import torch
from torch import nn, optim
from sklearn.metrics import mean_absolute_error

X = pd.read_csv('X.csv', header = None).values
y = pd.read_csv('y.csv', header = None).values

X = torch.tensor(X, dtype = torch.float32)
y = torch.tensor(y, dtype = torch.float32)

dataset = torch.utils.data.TensorDataset(X, y)
loader = torch.utils.data.DataLoader(dataset, batch_size = 32, shuffle = True)

class regressor_LSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm1 = nn.LSTM(input_size = 1, hidden_size = 100)
        self.lstm2 = nn.LSTM(100, 50)
        self.lstm3 = nn.LSTM(50, 50, dropout = 0.3, num_layers = 2)
        self.dropout = nn.Dropout(p = 0.3)
        self.linear = nn.Linear(in_features = 50, out_features = 1)

    def forward(self, X):
        # From the Pytorch documentation [https://pytorch.org/docs/stable/_modules/torch/nn/modules/rnn.html]:
        # **input** of shape `(seq_len, batch, input_size)`
        X = X.view(90, 32, 1)
        # I am discarding hidden/cell states since in Keras I am using a stateless approach
        # [https://keras.io/examples/lstm_stateful/]
        X, _ = self.lstm1(X)
        X = self.dropout(X)
        X, _ = self.lstm2(X)
        X = self.dropout(X)
        X, _ = self.lstm3(X)
        X = self.dropout(X)
        X = self.linear(X)

        return X

regressor = regressor_LSTM()
criterion = nn.MSELoss()
optimizer = optim.RMSprop(regressor.parameters())

for epoch in range(10):
    running_loss = 0.
    running_mae = 0.

    for i, data in enumerate(loader):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = regressor(inputs)
        outputs = outputs[-1].view(*labels.shape)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        mae = mean_absolute_error(labels.detach().cpu().numpy().flatten(), outputs.detach().cpu().numpy().flatten())
        running_mae += mae

    print('EPOCH %3d: loss %.5f - MAE %.5f' % (epoch+1, running_loss/len(loader), running_mae/len(loader)))
... which gives me:

# EPOCH  10: loss 0.04220 - MAE 0.16762
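As you can see, the loss and MAE are quite different (PyTorch's are much higher). If I use the PyTorch model to predict values, they all come back as the same constant.

What am I doing wrong?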
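
One detail in the `forward` method above may be worth checking. Assuming each batch arrives from the loader with shape `(32, 90)`, `X.view(90, 32, 1)` reinterprets the values in memory order rather than swapping the two axes, so each sequence passed to the LSTM can mix values from different samples. A minimal sketch with a hypothetical toy batch (2 samples of 3 timesteps standing in for 32 and 90) illustrates the difference:

```python
import torch

# Hypothetical toy batch: 2 samples, 3 timesteps (stand-ins for 32 and 90).
batch = torch.tensor([[0., 1., 2.],
                      [10., 11., 12.]])  # shape (batch=2, seq_len=3)

# view() only reinterprets the existing memory layout; it does not swap axes.
viewed = batch.view(3, 2, 1)

# transpose() swaps batch and time, then unsqueeze() adds the feature axis.
transposed = batch.transpose(0, 1).unsqueeze(-1)  # shape (seq_len=3, batch=2, 1)

# The sequence an LSTM would see for sample 0 under each layout.
print(viewed[:, 0, 0])      # tensor([ 0.,  2., 11.]): mixes both samples
print(transposed[:, 0, 0])  # tensor([0., 1., 2.]): the original first sample
```

If this applies, `X.transpose(0, 1).unsqueeze(-1)` keeps each sample's timesteps together. Another option is to construct the `nn.LSTM` layers with `batch_first=True` and pass `X.unsqueeze(-1)`, in which case the last timestep would be selected with `outputs[:, -1]` instead of `outputs[-1]`.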

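Oh, I believe I have made significant progress. It seems that the way y is represented differs between Keras and PyTorch. In Keras, we pass it as a single value representing one timestep into the future (or at least that works for the problem I am trying to solve). In PyTorch, however, y has to be X shifted one timestep into the future. Like this: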

time_series = [0, 1, 2, 3, 4, 5]

X = [0, 1, 2, 3, 4]
# Keras:
y = [5]
# Pytorch:
y = [1, 2, 3, 4, 5]
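This way, PyTorch compares all values in the time slice when computing the loss. I believe Keras rearranges the data behind the scenes to fit this approach, since the code works when the variables are fed in that way. In PyTorch, however, I was computing the loss from a single value (the one I was trying to predict) rather than from the whole series, so I believe it could not capture the time dependence correctly.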
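
The two target layouts described above can be built from the same series with a small windowing helper. This is only a sketch: `make_windows` and its parameter names are hypothetical, not part of either library.

```python
import numpy as np

def make_windows(series, window):
    """Slice a 1-D series into inputs plus both target layouts (hypothetical helper)."""
    series = np.asarray(series)
    n = len(series) - window  # number of complete windows
    # Inputs: `window` consecutive values.
    X = np.stack([series[i:i + window] for i in range(n)])
    # Single-value target: the value right after each window.
    y_last = np.stack([series[i + window:i + window + 1] for i in range(n)])
    # Sequence target: the input window shifted one step into the future.
    y_shifted = np.stack([series[i + 1:i + window + 1] for i in range(n)])
    return X, y_last, y_shifted

X, y_last, y_shifted = make_windows([0, 1, 2, 3, 4, 5], window=5)
print(X)          # [[0 1 2 3 4]]
print(y_last)     # [[5]]
print(y_shifted)  # [[1 2 3 4 5]]
```

The last column of `y_shifted` equals `y_last`, so the single-value target is the final step of the shifted one.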

With this in mind, I got:

EPOCH 100: loss 0.00551 - MAE 0.058435
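
A loss over every timestep, as opposed to the last one only, could be computed roughly as follows. This is a minimal sketch under stated assumptions: `TinyRegressor`, its hidden size, and the use of `batch_first=True` are illustrative choices, not the model that produced the numbers above.

```python
import torch
from torch import nn

torch.manual_seed(0)

class TinyRegressor(nn.Module):
    """Hypothetical one-layer model that predicts a value at every timestep."""
    def __init__(self, hidden_size=8):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)    # (batch, seq_len, hidden_size)
        return self.linear(out)  # (batch, seq_len, 1)

series = torch.arange(6, dtype=torch.float32)
x = series[:-1].view(1, 5, 1)         # [0, 1, 2, 3, 4]
y_shifted = series[1:].view(1, 5, 1)  # [1, 2, 3, 4, 5]

model = TinyRegressor()
pred = model(x)

# Loss over the whole shifted sequence versus the final step only.
loss_all = nn.functional.mse_loss(pred, y_shifted)
loss_last = nn.functional.mse_loss(pred[:, -1], y_shifted[:, -1])
print(pred.shape, loss_all.item(), loss_last.item())
```

Calling `loss_all.backward()` sends a gradient signal from all five timesteps, while `loss_last.backward()` uses the final prediction only.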
And, most importantly, comparing the true and predicted values on a separate dataset shows that the model clearly captured the patterns.

Hooray!