Output size of a custom LSTM model in PyTorch


I have a custom LSTM model in PyTorch, shown below:

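import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed definition: `device` is used below but not defined in the original post
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

hidden_size = 32
num_layers = 1
num_classes = 2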

class customModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(customModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.bilstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, bidirectional=True)
        self.fcl = nn.Linear(hidden_size*2, num_classes)

    def forward(self, x):
        # Set initial hidden and cell states 
        h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
        c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)

        # Forward propagate LSTM
        out, hidden = self.bilstm(x, (h0, c0)) 
        fw_bilstm = out[-1, :, :self.hidden_size]
        bk_bilstm = out[0, :, :self.hidden_size]
        concat_fw_bw = torch.cat((fw_bilstm, bk_bilstm), dim = 1)
        fc = self.fcl(concat_fw_bw)
        x = F.softmax(F.relu(fc))
        return x

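I can pass an input of type torch.Tensor to this model. The input has length 67349, and each element is a 300-dimensional vector. After initializing the model and running a prediction, I get an output vector of length 1: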

model = customModel(300, hidden_size, num_layers, num_classes)
output = model(input_torch)

When I print it, the output shows:

tensor([[0.5020, 0.4980]], grad_fn=<...>)

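Why does the output have length 1? It seems I should not be using batch_first=True in my model, but changing it would require changing the other input dimensions, and I do not know how to do that.

Please suggest how to get an output vector of length 67349 (the input length) instead of 1.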
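The post does not state the shape of input_torch. One shape that would reproduce the single-row output is [67349, 1, 300]: with batch_first=True the first axis is read as the batch, so out[-1] selects one batch item that has a single time step. The sketch below traces the shapes with a small stand-in tensor (5 rows instead of 67349). The tensor size and the variable names are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

hidden_size = 32
# Same layer configuration as in the question.
bilstm = nn.LSTM(300, hidden_size, 1, batch_first=True, bidirectional=True)

# Hypothetical stand-in input: 5 vectors of size 300 stored as [5, 1, 300].
# With batch_first=True this is read as batch_size=5, sequence_length=1.
x = torch.rand(5, 1, 300)
out, _ = bilstm(x)
print(out.shape)  # [batch_size, sequence_length, 2 * hidden_size]

# Indexing the first axis picks a batch item, not a time step.
fw = out[-1, :, :hidden_size]
bk = out[0, :, :hidden_size]
concat = torch.cat((fw, bk), dim=1)
print(fw.shape, concat.shape)  # a single row survives

# Moving the vectors to the second axis makes the LSTM see one sequence of 5 steps.
x_seq = x.transpose(0, 1)  # [1, 5, 300]
out_seq, _ = bilstm(x_seq)
print(out_seq.shape)
```

If the data really is laid out this way, the single output row probably comes from the indexing on the first axis rather than from batch_first=True itself.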

Clarification

I see that @gorjan suggested some modifications to the network's forward method, so I would like to clarify what I am trying to build:

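1. Feed the embeddings to the BiLSTM (done).

2. Take the hidden state of the last step in each direction and concatenate them.

3. Feed the concatenated output (from step 2) to a fully connected layer with ReLU.

4. Feed the output of step 3 to a softmax layer.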
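The four steps above can be sketched as follows. This is a minimal illustration, not the asker's final model: the class name LastStepBiLSTM and the random input are hypothetical. It reads the final hidden state of each direction from h_n, which nn.LSTM returns alongside the per-step outputs, and it lets the LSTM use its default zero initial states.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LastStepBiLSTM(nn.Module):
    """Hypothetical sketch of steps 1-4; the class name is not from the post."""

    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super().__init__()
        self.bilstm = nn.LSTM(input_size, hidden_size, num_layers,
                              batch_first=True, bidirectional=True)
        self.fcl = nn.Linear(hidden_size * 2, num_classes)

    def forward(self, x):
        # x: [batch_size, sequence_length, input_size]
        out, (h_n, c_n) = self.bilstm(x)
        # h_n: [num_layers * 2, batch_size, hidden_size]
        # h_n[-2]: top layer, forward direction, after the last time step
        # h_n[-1]: top layer, backward direction, after reaching time step 0
        concat = torch.cat((h_n[-2], h_n[-1]), dim=1)  # [batch_size, 2 * hidden_size]
        fc = F.relu(self.fcl(concat))                  # fully connected layer with ReLU
        return F.softmax(fc, dim=1)                    # [batch_size, num_classes]


model = LastStepBiLSTM(300, 32, 1, 2)
probs = model(torch.rand(4, 10, 300))
print(probs.shape)  # one row of class probabilities per sequence in the batch
```

Note that this design produces one prediction per sequence, not one per time step. The ReLU in front of the softmax follows step 3 as written, and it can be dropped by returning F.softmax(self.fcl(concat), dim=1) instead.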

I have annotated the def forward(...) method of your module. Please take a look:

def forward(self, x):
    # Set initial hidden and cell states 
    h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
    c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)

    # Forward propagate LSTM
    out, hidden = self.bilstm(x, (h0, c0)) # out is of size [batch_size, sequence_length, hidden_size * num_directions]
    fw_bilstm = out[-1, :, :self.hidden_size] # This is wrong: You are taking only last batch element
    bk_bilstm = out[0, :, :self.hidden_size] # This is wrong: You are taking only the first batch element
    concat_fw_bw = torch.cat((fw_bilstm, bk_bilstm), dim = 1) # This is not needed: If you want to obtain the hidden states for all elements in the sequence
    fc = self.fcl(concat_fw_bw) # Because of the above mentioned issues, this is wrong as well.
    x = F.softmax(F.relu(fc)) # This is wrong: Never stack activation on top of activation.
    return x

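Now, regarding your question:

"Please suggest how to get an output vector of length 67349 (the input length) instead of 1."

I assume you want the hidden state for every element of each sequence in the batch. Here is how you should structure your forward pass: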

def forward(self, x):
    # Set initial hidden and cell states on the same device as the input
    h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(x.device)
    c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(x.device)

    # Forward propagate LSTM
    out, hidden = self.bilstm(x, (h0, c0)) # out is of size [batch_size, sequence_length, hidden_size * num_directions]
    fc = self.fcl(out) # fc is of size [batch_size, sequence_length, num_classes]
    x = F.softmax(fc, dim=-1) # Softmax over the class dimension gives the probabilities for each of your classes
    return x

If we test the updated model, the result is as follows:

# Assuming 32 elements in the batch, each a sequence of 177 elements, and each sequence element has size 300
inputs = torch.rand(32, 177, 300)
# Obtaining the outputs from the model
outputs = model(inputs)
# The size is as expected: torch.Size([32, 177, 2])
print(outputs.shape)
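One detail worth checking in a per-step classifier is the axis that the softmax normalizes over. The sketch below uses random stand-in logits with the same shape as the output above (the tensor is made up for illustration). Normalizing over the last axis gives one probability distribution per time step, while normalizing over the first axis mixes batch items.

```python
import torch
import torch.nn.functional as F

# Stand-in logits shaped [batch_size, sequence_length, num_classes].
logits = torch.randn(32, 177, 2)

# Normalize across the classes: each time step gets its own distribution.
probs = F.softmax(logits, dim=-1)
print(probs.shape)
print(probs.sum(dim=-1)[0, :3])  # each entry is 1 up to rounding

# Normalizing over dim=0 mixes batch items, so the class scores
# at a given time step no longer sum to 1.
wrong = F.softmax(logits, dim=0)
print(wrong.sum(dim=-1)[0, :3])
```

Passing dim explicitly also avoids relying on the implicit axis choice that F.softmax makes when dim is omitted.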

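One more thing to keep in mind. You said:

"The input has length 67349, and each element is a 300-dimensional vector."

That is a very long sequence. Your model will likely perform poorly, and I expect training to take a very long time. However, that is an entirely different problem and should be discussed in a separate thread.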

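Please see my explanation. Could you post the expected input size and the expected output size, along with an explanation of what each dimension means?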