Deep learning: output size of a custom LSTM model in PyTorch


I have a custom LSTM model in PyTorch, shown below:

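import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cpu")  # `device` is not defined in the original snippet; CPU assumed

hidden_size = 32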
num_layers = 1
num_classes = 2

class customModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(customModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.bilstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, bidirectional=True)
        self.fcl = nn.Linear(hidden_size*2, num_classes)

    def forward(self, x):
        # Set initial hidden and cell states 
        h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
        c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)

        # Forward propagate LSTM
        out, hidden = self.bilstm(x, (h0, c0)) 
        fw_bilstm = out[-1, :, :self.hidden_size]
        bk_bilstm = out[0, :, :self.hidden_size]
        concat_fw_bw = torch.cat((fw_bilstm, bk_bilstm), dim = 1)
        fc = self.fcl(concat_fw_bw)
        x = F.softmax(F.relu(fc))
        return x
I can pass an input of type torch.Tensor to this model. The input has length 67349, and each element is a 300-dimensional vector.

After initializing the model and running a prediction, I get an output vector of length 1:

model = customModel(300, hidden_size, num_layers, num_classes)
output = model(input_torch)
When I print it, the output is:
tensor([[0.5020, 0.4980]], grad_fn=<...>)

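Why does the output have length 1? It seems I should not use batch_first=True in my model, but changing it requires changing the other input dimensions, and I don't know how to do that.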

Please suggest how to get a vector output of length 67349 (the input length) instead of 1.

Clarification

I see that @gorjan suggested some modifications to the network's forward method, so I want to clarify further what I am trying to build:

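  • Feed the embeddings to a BiLSTM (done)
  • Take the hidden state at the last step in each direction and concatenate them
  • Feed the concatenated output (from step 2) to a fully connected layer with ReLU
  • Feed the output of step 3 to a softmax layer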
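The four steps above might be sketched as follows. This is a minimal illustration, not code from the question or the answer: the class name LastStepBiLSTM, the width of the ReLU layer, and reading "softmax layer" as a second linear layer followed by softmax are all assumptions. With batch_first=True, the forward direction's last step is out[:, -1, :hidden_size], while the backward direction finishes at time step 0, i.e. out[:, 0, hidden_size:]; nn.LSTM also returns these final states as the last two rows of h_n.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LastStepBiLSTM(nn.Module):
    """Hypothetical sketch of the four steps: BiLSTM -> last hidden state of each
    direction -> fully connected layer + ReLU -> linear layer + softmax."""

    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super().__init__()
        self.hidden_size = hidden_size
        self.bilstm = nn.LSTM(input_size, hidden_size, num_layers,
                              batch_first=True, bidirectional=True)
        # Step 3: fully connected layer followed by ReLU (its width is an assumption)
        self.fc1 = nn.Linear(hidden_size * 2, hidden_size)
        # Step 4: output layer whose logits are passed through softmax
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: [batch, seq_len, input_size]; omitting (h0, c0) makes nn.LSTM start from zeros
        out, (h_n, _) = self.bilstm(x)
        # Step 2: h_n is [num_layers * 2, batch, hidden_size]; its last two rows are
        # the top layer's final forward and backward states
        fw_last = h_n[-2]  # same values as out[:, -1, :self.hidden_size]
        bw_last = h_n[-1]  # same values as out[:, 0, self.hidden_size:]
        concat_fw_bw = torch.cat((fw_last, bw_last), dim=1)  # [batch, 2 * hidden_size]
        hidden = F.relu(self.fc1(concat_fw_bw))               # step 3
        return F.softmax(self.fc2(hidden), dim=1)             # step 4: [batch, num_classes]


model = LastStepBiLSTM(input_size=300, hidden_size=32, num_layers=1, num_classes=2)
inputs = torch.rand(8, 5, 300)  # 8 sequences of length 5 (illustrative sizes)
probs = model(inputs)
print(probs.shape)  # torch.Size([8, 2])
```

This returns one probability pair per sequence in the batch. If each of the 67349 vectors is meant to be its own length-1 sequence (input shape [67349, 1, 300], an assumption), the result would have shape [67349, 2].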

    I have annotated the def forward(…) method of your module; please take a look:

    def forward(self, x):
        # Set initial hidden and cell states 
        h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
        c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
    
        # Forward propagate LSTM
        out, hidden = self.bilstm(x, (h0, c0)) # out is of size [batch_size, sequence_length, hidden_size * num_directions]
        fw_bilstm = out[-1, :, :self.hidden_size] # This is wrong: you are taking only the last batch element
        bk_bilstm = out[0, :, :self.hidden_size] # This is wrong: you are taking only the first batch element (and :self.hidden_size selects the forward half, not the backward half)
        concat_fw_bw = torch.cat((fw_bilstm, bk_bilstm), dim = 1) # This is not needed: If you want to obtain the hidden states for all elements in the sequence
        fc = self.fcl(concat_fw_bw) # Because of the above mentioned issues, this is wrong as well.
        x = F.softmax(F.relu(fc)) # This is wrong: Never stack activation on top of activation.
        return x
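A shape trace of the annotated indexing may make the length-1 output concrete. It assumes the input was laid out as [67349, 1, 300] (67349 vectors, each treated as a sequence of length 1), which is one layout consistent with the printed [1, 2] output; the question does not state the exact shape, and N = 5 stands in for 67349 here:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

N, hidden_size = 5, 32     # N stands in for 67349 (illustrative)
x = torch.rand(N, 1, 300)  # assumed layout: [batch=N, seq_len=1, features=300]
bilstm = nn.LSTM(300, hidden_size, batch_first=True, bidirectional=True)
out, _ = bilstm(x)         # [N, 1, 2 * hidden_size]

# Indexing from the original forward: the first index selects a batch element
fw = out[-1, :, :hidden_size]  # only the LAST sample
bk = out[0, :, :hidden_size]   # only the FIRST sample, and still the forward half
print(out.shape)                         # torch.Size([5, 1, 64])
print(fw.shape, bk.shape)                # torch.Size([1, 32]) torch.Size([1, 32])
print(torch.cat((fw, bk), dim=1).shape)  # torch.Size([1, 64]); the Linear layer then gives [1, 2]

# Indexing the time dimension instead keeps one row per sample
fw_ok = out[:, -1, :hidden_size]  # forward direction, last time step: [N, 32]
bk_ok = out[:, 0, hidden_size:]   # backward direction, time step 0:  [N, 32]
print(fw_ok.shape, bk_ok.shape)   # torch.Size([5, 32]) torch.Size([5, 32])
```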
    
    Now, regarding your question:

    Please suggest how to get a vector output of length 67349 (the input length) instead of 1

    I assume you want to obtain the hidden states for each element in the batch. Here is how you should organize your forward pass:

    def forward(self, x):
        # Set initial hidden and cell states 
        h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
        c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
    
        # Forward propagate LSTM
        out, hidden = self.bilstm(x, (h0, c0)) # out is of size [batch_size, sequence_length, hidden_size * num_directions]
        fc = self.fcl(out) # fc is of size [batch_size, sequence_length, num_classes]
        x = F.softmax(fc, dim=-1) # Just softmax over the class dimension, so that you get the probabilities for each of your classes (without dim, a 3-D input would be normalized over dim 0, the batch)
        return x
    
    If we test the updated model, we get the following:

    # Assuming 32 elements in the batch, each element has 177 elements in the sequence, and each sequence element has size 300
    inputs = torch.rand(32, 177, 300)
    model = customModel(300, hidden_size, num_layers, num_classes)  # the class with the updated forward
    # Obtaining the outputs from the model
    outputs = model(inputs)
    # The size is as expected: torch.Size([32, 177, 2])
    print(outputs.shape)
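On the softmax in the updated forward: for a 3-D tensor, F.softmax called without dim falls back to dim 0 (the batch) and emits a deprecation warning, so the class dimension is best named explicitly. A small check with random stand-in values in the same [32, 177, 2] shape (illustrative only):

```python
import torch
import torch.nn.functional as F

fc = torch.rand(32, 177, 2)  # stand-in logits: [batch, seq_len, num_classes]

probs = F.softmax(fc, dim=-1)  # normalize over the classes at every position
print(probs.shape)                                              # torch.Size([32, 177, 2])
print(torch.allclose(probs.sum(dim=-1), torch.ones(32, 177)))  # True

wrong = F.softmax(fc, dim=0)  # what the implicit choice does for a 3-D input
print(torch.allclose(wrong.sum(dim=0), torch.ones(177, 2)))    # True: sums to 1 across the batch instead
```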
    
    One more thing to keep in mind. You said:

    The input has length 67349, and each element is a 300-dimensional vector


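    That is a very long sequence. Your model will perform poorly, and I suspect training will take forever. However, that is a completely different issue and should be discussed in a separate thread.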

    See my explanation. Could you post the expected input size and the expected output size, along with an explanation of those sizes?