PyTorch: how to convert a binary classifier output/loss tensor in an LSTM to multi-class


I am trying to train an LSTM model in PyTorch to predict the year a song was written from its lyrics, using word-level associations. There are 51 potential classes/labels (1965-2015). However, I am working from a template that uses a binary classifier for a different problem, and I have been trying to figure out how to change the model so that it predicts multiple classes (1965, 1966, and so on).

I know that you are supposed to provide a tensor of size C = num_classes as the output. I did that by setting output_size = 51, but I am getting an error, which makes me think something is wrong with how I define or use the criterion.

The model is as follows:

class LyricLSTM(nn.Module):
    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, drop_prob=0.5):
        super().__init__()

        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim

        # embedding and LSTM layers
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers,
                            dropout=drop_prob, batch_first=True)

        # dropout layer
        self.dropout = nn.Dropout(0.3)

        # linear and sigmoid layers
        self.fc = nn.Linear(hidden_dim, output_size)
        self.sig = nn.Sigmoid()
        #self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x, hidden):
        batch_size = x.size(0)

        # embeddings and lstm_out
        embeds = self.embedding(x)
        lstm_out, hidden = self.lstm(embeds, hidden)

        # stack up lstm outputs
        lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)

        # dropout and fully-connected layer
        out = self.dropout(lstm_out)
        out = self.fc(out)
        # sigmoid function
        sig_out = self.sig(out)
        #sig_out = self.softmax(out)

        # reshape to be batch_size first
        sig_out = sig_out.view(batch_size, -1)
        sig_out = sig_out[:, -1]  # get last batch of labels

        # return last sigmoid output and hidden state
        return sig_out, hidden

    def init_hidden(self, batch_size):
        ''' Initializes hidden state '''
        # Create two new tensors with sizes n_layers x batch_size x hidden_dim,
        # initialized to zero, for hidden state and cell state of LSTM
        weight = next(self.parameters()).data

        hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                  weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())


        return hidden
And the training loop:

n_epochs = 10
batch_size = 16 #100  # 11 batches of size 337 so iters = 11 (11 * 337 = 3707)

# Split into training, validation, testing - train= 80% | valid = 10% | test = 10%
split_frac = 0.8
train_x = encoded_lyrics[0:int(split_frac * len(encoded_lyrics))] # 3707 training samples
train_y = encoded_years[0:int(split_frac * len(encoded_lyrics))]  # 3707 training samples

# Dataloaders and batching
# create Tensor datasets
train_data = TensorDataset(torch.from_numpy(train_x), torch.from_numpy(train_y))

# make sure to SHUFFLE your data
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size, drop_last=True)

output_size = 51
embedding_dim = 400
hidden_dim = 128 #256
n_layers = 2
lstmc = lstm.LyricLSTM(vocab_len, output_size, embedding_dim, hidden_dim, n_layers)

# Loss function + accuracy reporting
current_loss = 0
losses = np.zeros(n_epochs)  # For plotting
accuracy = np.zeros(n_epochs)

lr = 0.001
criterion = nn.CrossEntropyLoss() #nn.BCELoss()
optimizer = torch.optim.Adam(lstmc.parameters(), lr=lr)
counter = 0
print_every = 1
clip = 5  # gradient clipping

# Main training loop
start = time.time()
lstmc.train()
for epoch in range(0, n_epochs):
    # initialize hidden state
    h = lstmc.init_hidden(batch_size)

    # batch loop
    for inputs, labels in train_loader:
        counter += 1

        # Creating new variables for the hidden state, otherwise
        # we'd backprop through the entire training history
        h = tuple([each.data for each in h])

        # zero accumulated gradients
        lstmc.zero_grad()

        # get the output from the model
        inputs = inputs.type(torch.LongTensor)
        output, h = lstmc(inputs, h)

        # calculate the loss and perform backprop
        loss = criterion(output.squeeze(), labels.float())
        loss.backward()

        nn.utils.clip_grad_norm_(lstmc.parameters(), clip)
        optimizer.step()
I get this error when I run the code:

File "main.py", line 182, in main
    loss = criterion(output.squeeze(), labels.float())
/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
/venv/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 904, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
/venv/lib/python3.7/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
/venv/lib/python3.7/site-packages/torch/nn/functional.py", line 1295, in log_softmax
    ret = input.log_softmax(dim)
RuntimeError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
Here are the output and labels I am getting (for a batch size of 16):

Output: tensor([0.4962, 0.5025, 0.4963, 0.4936, 0.5058, 0.4872, 0.4995, 0.4852, 0.4840,
        0.4791, 0.4984, 0.5034, 0.4796, 0.4826, 0.4811, 0.4859],
       grad_fn=<...>)
Labels: tensor([1994., 1965., 1981., 1986., 1973., 1981., 1975., 1968., 1981., 1968.,
        1989., 1981., 1988., 1991., 1983., 1982.])
I was expecting the output to be a tensor of length 51, where each element holds the likelihood that that year is the correct answer (e.g. output[0] = the first year, 1965; output[1] = 1966; and so on).

You have to provide the input as (N, C) and the target as (N) to nn.CrossEntropyLoss. I suspect the following code segment in your model's forward() method is wrong:

sig_out = self.sig(out) # shape: batch_size*seq_len x output_size

# reshape to be batch_size first 
sig_out = sig_out.view(batch_size, -1) # shape: batch_size x seq_len*output_size
sig_out = sig_out[:, -1] # shape: batch_size
What do you want to do with your last statement? Also, how do you want to handle the seq_len dimension of the LSTM output?

Try to think about what you are doing here.

Even though I believe the shape of your output tensor is wrong, make sure that output is a 2d tensor of shape (N, C) and labels is a 1d tensor of shape (N).
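
For reference, here is a minimal, self-contained sketch of the shapes nn.CrossEntropyLoss expects; the names N, C, logits and targets are illustrative and not taken from the code above:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

N, C = 16, 51                                   # batch size and number of classes (years)
logits = torch.randn(N, C, requires_grad=True)  # raw scores, shape (N, C); no sigmoid/softmax needed
targets = torch.randint(0, C, (N,))             # class indices 0..50, dtype long, shape (N)

loss = criterion(logits, targets)               # scalar loss
loss.backward()

# A squeezed 1d output of shape (N,), as in the question, fails here because
# CrossEntropyLoss applies log_softmax over dim=1, which a 1d tensor does not have.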

Also, I see a few problems in your code:

  • In general, it is good practice to apply zero_grad to the optimizer (not to the model). Do not do the following:

    # zero accumulated gradients
    lstmc.zero_grad()

    Instead, do:

    optimizer.zero_grad()

  • You should not use a Sigmoid with 51 classes. Use a fully connected layer followed by a softmax layer instead, and you do not need the view() operation before the fc and softmax layers (see the sketch below):

    self.fc = nn.Linear(hidden_dim, output_size)
    self.softmax = nn.LogSoftmax(dim=-1) # use -1 to apply in the last axis

    ...

    out = self.dropout(lstm_out)
    out = self.softmax(self.fc(out))

So, do not use the following code segment:

# stack up lstm outputs
lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim) # DON'T DO THIS
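
To tie this together, here is a minimal sketch of what forward() could look like with these suggestions applied. Collapsing the seq_len dimension by keeping only the last time step (lstm_out[:, -1, :]) is an assumption made for this sketch; it is one common choice for sequence classification, not something prescribed above:

def forward(self, x, hidden):
    # x: (batch_size, seq_len) of word indices
    embeds = self.embedding(x)                    # (batch_size, seq_len, embedding_dim)
    lstm_out, hidden = self.lstm(embeds, hidden)  # (batch_size, seq_len, hidden_dim)

    last_out = lstm_out[:, -1, :]                 # (batch_size, hidden_dim) -- assumed: last time step only
    out = self.dropout(last_out)
    out = self.fc(out)                            # (batch_size, output_size) = (batch_size, 51)
    log_probs = self.softmax(out)                 # self.softmax = nn.LogSoftmax(dim=-1), as suggested above

    return log_probs, hidden

Note that nn.LogSoftmax pairs with nn.NLLLoss; if you keep criterion = nn.CrossEntropyLoss(), return the raw output of self.fc instead, since CrossEntropyLoss applies log_softmax internally. Either way, the targets passed to the loss must be class indices 0-50 of dtype long (for example, year - 1965), not the raw years cast to float as in labels.float().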

Thanks for your reply - I'm a bit new to PyTorch. Looking at the forward() code you said might be wrong, I checked the shapes I am working with and this is what I got:

sig_out = self.sig(out)                 # shape = torch.Size([16, 1000, 51])
sig_out = sig_out.view(batch_size, -1)  # shape = torch.Size([16, 51000])
sig_out = sig_out[:, -1]                # shape = torch.Size([16])

That doesn't look right. I'm not sure where seq_len comes into play, but it is the standard size that all of the text sample tensors are supposed to have. If I understand correctly, my output should be of size (16, 51), since my batch size is 16 and there are 51 classes. Should I change it to that size? Thanks. That seems to have fixed the problem, thanks!!