Problem with an LSTM model in PyTorch


I am trying to implement an LSTM model in PyTorch and ran into a problem: the loss does not decrease. My task is this: I have sessions with different features; the session length is fixed and equal to 20. My goal is to predict whether the last session was skipped or not. I tried scaling the input features, and I even tried passing the target into the features (maybe the provided features are completely uninformative; I assumed this should lead to overfitting and a loss close to 0), but my loss curve always looks like the plot at the top of the question.

I also defined a get_batches function. Yes, I know about the problem with the last batch in this generator:

def get_batches(X, y, batch_size):
    '''Yield batches of batch_size sessions from X and y.'''
    assert X.shape[0] == y.shape[0]
    assert X.shape[1] == y.shape[1]
    assert len(X.shape) == 3
    assert len(y.shape) == 2

    # was X.shape[0] // seq_len, which miscounts batches whenever
    # batch_size != seq_len
    n_batches = X.shape[0] // batch_size

    for batch_number in range(n_batches):
        batch_x = X[batch_number*batch_size:(batch_number+1)*batch_size, :, :]
        batch_y = y[batch_number*batch_size:(batch_number+1)*batch_size, :]
        if batch_x.shape[0] == batch_size:
            yield batch_x, batch_y
        else:
            print('batch_x shape: {}'.format(batch_x.shape))
            break
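
As an aside, here is a sketch of a generator that does not drop the trailing partial batch (my own variant, not from the original code; the caller would then have to build the hidden state from the actual batch it receives):

def get_batches_all(X, y, batch_size):
    '''Yield every batch; the final one may hold fewer
    than batch_size sessions.'''
    for start in range(0, X.shape[0], batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]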
Here is my RNN:

import torch
import torch.nn as nn

class BaseRNN(nn.Module):

    def __init__(self, n_features, hidden_size, n_layers, drop_p=0.3, lr=0.001, last_items=10):
        super(BaseRNN, self).__init__()
        # constants
        self.n_features = n_features
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.drop_p = drop_p
        self.lr = lr
        self.last_items = last_items

        # layers
        self.lstm = nn.LSTM(
            n_features, hidden_size, n_layers,  # was n_hidden, an undefined global here
            dropout=drop_p, batch_first=True
        )
        self.dropout = nn.Dropout(self.drop_p)
        self.linear_layer = nn.Linear(self.hidden_size, 1)
        self.sigm = nn.Sigmoid()

    def forward(self, x, hidden):
        out, hidden = self.lstm(x, hidden)
        batch_size = x.shape[0]
        out = self.dropout(out)
        out = out.contiguous().view(-1, self.hidden_size)
        out = self.linear_layer(out)
        out = self.sigm(out)
        # keep only the last time step of every sequence
        out = out.view(batch_size, -1)
        out = out[:, -1]
        return out, hidden

    def init_hidden(self, batch_size):
        # initialize hidden and cell states with zeros
        weight = next(self.parameters()).data
        hidden = (weight.new(self.n_layers, batch_size, self.hidden_size).zero_(),
                  weight.new(self.n_layers, batch_size, self.hidden_size).zero_())
        return hidden
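
A quick shape check (my addition; the sizes are arbitrary and assume inputs shaped (batch, seq_len, n_features)): one forward pass on random data should yield a single probability per sequence.

net_check = BaseRNN(n_features=6, hidden_size=100, n_layers=1, drop_p=0.0)
x_check = torch.randn(4, 20, 6)                # (batch, seq_len, n_features)
h_check = net_check.init_hidden(batch_size=4)
out_check, h_check = net_check(x_check, h_check)
print(out_check.shape)  # torch.Size([4]) -- one value in (0, 1) per sequence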
Here is my train function:

def train(net, X, y, n_epochs=10, batch_size=10, clip=5):
    '''Train the network and return the per-batch losses.'''
    net.train()
    opt = torch.optim.Adam(net.parameters(), lr=net.lr)
    criterion = nn.BCELoss()
    losses = []
    for e in range(n_epochs):
        # the loop variables must not shadow X and y, otherwise the second
        # epoch would call get_batches on the last batch of the first one
        for batch_x, batch_y in get_batches(X=X, y=y, batch_size=batch_size):
            h = net.init_hidden(batch_size)
            inputs = torch.from_numpy(batch_x).float()
            targets = torch.from_numpy(batch_y.astype(int))
            targets = targets[:, -net.last_items:].float().view(net.last_items * batch_size)
            h = tuple([each.data for each in h])
            net.zero_grad()
            output, h = net(inputs, h)
            loss = criterion(output.view(net.last_items * batch_size), targets)
            losses.append(loss.item())
            loss.backward()
            nn.utils.clip_grad_norm_(net.parameters(), clip)
            opt.step()
    return losses
Running the training:

import matplotlib.pyplot as plt

n_hidden = 100
n_layers = 1
n_features = X.shape[2]
net = BaseRNN(n_features, n_hidden, n_layers,
              lr=0.01, drop_p=0.1, last_items=1)

# train() takes no lr argument (the rate comes from net.lr), so the
# original lr=0.001 keyword is dropped here
losses = train(net, X, y, n_epochs=5, batch_size=1000, clip=5)
plt.plot(losses)
After all these steps I get the plot at the top of the question. I think I have made a huge mistake somewhere, because even with the target variable included in the features the loss still does not decrease. Where am I wrong?

P.S. How do I generate sample data? I take the real y data and add some noise:

import numpy as np

Y = np.array([[0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1],
       [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1],
       [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1]])
print(Y.shape)
#(10, 20)

# add 5 features with random noise
random_noise = np.random.randn(10*20*5).reshape(10, 20, 5)
X = np.concatenate((Y.reshape(10, 20, 1), random_noise), axis=2)
print(X.shape)
#(10, 20, 6)

My fault: I forgot to scale the input features. It works fine now.
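
For reference, a minimal sketch of the kind of scaling meant here (my own illustration, assuming X is a NumPy array of shape (n_sessions, seq_len, n_features); scale_features is a hypothetical helper): flatten the time dimension, standardize each feature, and reshape back.

import numpy as np
from sklearn.preprocessing import StandardScaler

def scale_features(X):
    '''Standardize every feature to zero mean and unit variance
    across all sessions and time steps.'''
    n_sessions, seq_len, n_features = X.shape
    flat = X.reshape(-1, n_features)             # (n_sessions * seq_len, n_features)
    flat = StandardScaler().fit_transform(flat)
    return flat.reshape(n_sessions, seq_len, n_features)

X = scale_features(X)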

One trick to check that your neural-network code is correct is to first run iterations on a small fraction of the whole dataset. If everything is written correctly, the model should overfit within a few epochs. You can then gradually increase the dataset size and see whether the network can actually converge. This is a very handy way to troubleshoot neural-network code when something goes wrong.

Of course I tried training on a small dataset, but my network still does not work properly (the loss does not decrease even when the target variable is added to the training features). That is my main question: where am I wrong in the code above?

If this is a binary classification problem, why is the shape of y (82770, 20)? It should be (82770,). Can you share sample X and y?

@ErnestSKirubakaran 20 is the sequence length, and I also tried a many-to-many architecture. I control it via the BaseRNN.last_items parameter; setting it to 1 gives a many-to-one architecture. Nevertheless, I have added above how to generate sample data.

One approach is to lower the learning rate and check whether performance improves, since the optimizer may be overshooting; personally I think a learning rate of 0.01 is too high. Try 0.001 or 0.0003 and see whether the model improves.
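
A minimal sketch of that "overfit a tiny subset" check (my own illustration, reusing the train function above; the slice size and hyperparameters are arbitrary): if the code is sound, the loss on a handful of sessions should approach zero after enough epochs.

# Take a tiny slice of the data; a correct model should memorize it quickly.
X_small, y_small = X[:100], y[:100]

debug_net = BaseRNN(n_features=X_small.shape[2], hidden_size=100,
                    n_layers=1, lr=0.001, drop_p=0.0, last_items=1)
debug_losses = train(debug_net, X_small, y_small,
                     n_epochs=200, batch_size=100, clip=5)
print(debug_losses[0], debug_losses[-1])  # the last value should be near 0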