Time series prediction with a recurrent neural network in Python: how do I prepare the input dataset sequences?


python neural-network pytorch


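I have 7 continuous input variables and I want to estimate 1 continuous variable (y = f(x_1, …, x_7)).
The dataset has about 26,000 measurements, but it will grow to millions of measurements within a short time.
I have the timestamp of each measurement.
I have successfully built a neural network with linear layers and ReLU in PyTorch to do this, but I would like to take the previous 100 measurements into account.
I am considering building an RNN, specifically a GRU or an LSTM, because I found that they have fewer problems than an Elman RNN.
I built the simplest possible class as a starting point: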

import torch
import torch.nn as nn
import torch.nn.functional as F


class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers):
        super().__init__()  # compulsory for pytorch
        # Parameters
        self.nHiddenFeatures = hidden_size
        self.nLayers = num_layers
        self.nHiddenNeurons = hidden_size
        # Layers
        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, X, hidden):
        # X: (seq_len, batch, input_size); hidden: (num_layers, batch, hidden_size)
        Y, hidden_ = self.gru(X, hidden)
        # Flatten (seq_len, batch, hidden_size) to (seq_len * batch, hidden_size)
        Y = Y.contiguous().view(-1, self.nHiddenNeurons)
        Y = self.fc(Y)
        return F.relu(Y), hidden_  # output variable can't be negative


# Parameters:
input_size = 7
output_size = 1
hidden_size = 4
num_layers = 2
seq_length = 100
learningRate = 1e-3  # placeholder: the value actually used is not stated in the question
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# model
rnn = RNN(input_size=input_size, hidden_size=hidden_size, output_size=output_size, num_layers=num_layers).to(device)
optimizer = torch.optim.Adam(rnn.parameters(), lr=learningRate)
loss_func = nn.MSELoss()

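But now I am struggling to work out how to train it, and in particular how to give the X_train dataset the right shape. What I currently have is

X_train.shape = torch.Size([26000, 7])

where 26000 is the number of timestamps and 7 is the number of variables measured at each timestamp.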

I have read the PyTorch RNN documentation, and I know that the input tensor should have the shape

(seq_len, batch, input_size)
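As a quick check of that layout, here is a minimal sketch that passes a random tensor through `nn.GRU` and prints the resulting shapes. The batch size of 32 is an arbitrary value chosen only for illustration; the other sizes are the ones used in the question.

```python
import torch
import torch.nn as nn

# Sizes from the question; batch = 32 is an arbitrary illustrative value.
seq_len, batch, input_size = 100, 32, 7
hidden_size, num_layers = 4, 2

gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)

x = torch.randn(seq_len, batch, input_size)        # (seq_len, batch, input_size)
h0 = torch.zeros(num_layers, batch, hidden_size)   # initial hidden state

out, hn = gru(x, h0)
print(out.shape)  # torch.Size([100, 32, 4]): one hidden vector per time step
print(hn.shape)   # torch.Size([2, 32, 4]): final hidden state of each layer
```

The initial hidden state has one slice per layer and per sequence in the batch, so its shape depends on the batch size rather than on the sequence length.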
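My question is how to create, from the current tensor, an input tensor whose sequences run over all timestamps while taking the previous 100 timestamps into account. Am I right to build this tensor with overlapping sequences, so that its shape is approximately

X_train.shape = torch.Size([100, 26000, 7])

or should I create separate sequences, as I have seen in many examples on the internet, ending up with

X_train.shape = torch.Size([100, 260, 7])?

It seems to me that the second solution takes the previous timestamps into account for only 260 of the timestamps, and not for the remaining ones.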
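To make the two layouts concrete, the sketch below builds both from a random stand-in for `X_train`. It is only an illustration under assumptions: the data is synthetic, a stride of 1 is assumed for the overlapping case, and `Tensor.unfold` is one of several ways to build the windows.

```python
import torch

n_timestamps, n_features, seq_length = 26000, 7, 100
X_train = torch.randn(n_timestamps, n_features)  # synthetic stand-in for the real data

# Overlapping windows with stride 1.
# unfold returns a view of shape (n_windows, n_features, seq_length).
windows = X_train.unfold(0, seq_length, 1)
overlapping = windows.permute(2, 0, 1)  # -> (seq_len, n_windows, n_features)
print(overlapping.shape)  # torch.Size([100, 25901, 7])

# Separate (non-overlapping) windows: 26000 / 100 = 260 sequences.
n_seq = n_timestamps // seq_length
separate = (
    X_train[: n_seq * seq_length]
    .reshape(n_seq, seq_length, n_features)
    .permute(1, 0, 2)  # -> (seq_len, n_seq, n_features)
)
print(separate.shape)  # torch.Size([100, 260, 7])

# Window k of the overlapping layout covers timestamps k .. k + 99.
print(torch.equal(overlapping[:, 5, :], X_train[5:105]))  # True
```

With stride 1 there are 26000 - 100 + 1 = 25901 windows rather than 26000, because the first 99 timestamps do not have a full 100-step history. Since `unfold` returns a view, the overlapping layout does not copy the data until something forces a contiguous copy.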
The final goal is to run an ordinary training loop, such as:
hidden = torch.zeros(..., ..., ...)  # initialization
for epoch in range(nEpochs):
    optimizer.zero_grad()  # set gradient to zero in each step
    Y_estTrain, _ = rnn(X_train, hidden)  # forward() returns (prediction, hidden state)
    loss = loss_func(Y_estTrain, Y_trueTrain)  # difference between predicted and expected
    loss.backward()  # backpropagate from the scalar loss
    optimizer.step()  # update weights of the NN
    if epoch % printLossOnceEvery == 0:  # check loss decrease during training
        print(f"Epoch = {epoch}, MSE = {loss.item():0.1f}")
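For reference, a mini-batch variant of such a loop could look like the sketch below. Everything here is an assumption made for illustration: `WindowDataset` is a hypothetical helper, the data is random, the batch size, learning rate and epoch count are arbitrary, and the model predicts y only at the last time step of each window, which differs from the class above (that one returns an output for every time step).

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset


class WindowDataset(Dataset):
    """Hypothetical helper: yields (window of seq_length rows, target at the last row)."""

    def __init__(self, X, Y, seq_length):
        self.X, self.Y, self.seq_length = X, Y, seq_length

    def __len__(self):
        return self.X.shape[0] - self.seq_length + 1

    def __getitem__(self, i):
        end = i + self.seq_length
        return self.X[i:end], self.Y[end - 1]


torch.manual_seed(0)
X = torch.randn(500, 7)  # synthetic stand-in for X_train
Y = torch.rand(500, 1)   # synthetic non-negative targets

dataset = WindowDataset(X, Y, seq_length=100)
loader = DataLoader(dataset, batch_size=64, shuffle=True)

gru = nn.GRU(input_size=7, hidden_size=4, num_layers=2)
fc = nn.Linear(4, 1)
params = list(gru.parameters()) + list(fc.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)  # arbitrary learning rate
loss_func = nn.MSELoss()

for epoch in range(2):
    for xb, yb in loader:
        # DataLoader yields (batch, seq_len, features); nn.GRU expects (seq_len, batch, features).
        xb = xb.transpose(0, 1).contiguous()
        out, _ = gru(xb)                 # the hidden state defaults to zeros when omitted
        pred = torch.relu(fc(out[-1]))   # use only the last time step of each window
        loss = loss_func(pred, yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"Epoch = {epoch}, MSE of last batch = {loss.item():0.3f}")
```

Building windows lazily inside `__getitem__` keeps memory use proportional to the raw data, which may matter once the dataset grows to millions of measurements.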

Thank you for reading the whole question. It is a bit long, but I preferred to include as much information as possible.