How to correctly use CTC loss with a GRU in PyTorch?

I am trying to create an ASR system. I am still learning, so I am starting with a simple GRU:

MySpeechRecognition(
  (gru): GRU(128, 128, num_layers=5, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.3, inplace=False)
  (fc1): Linear(in_features=128, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=28, bias=True)
)
Each output is classified as one of the possible letters + space + blank.
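
For reference, a module that prints like the one above could be defined roughly as follows. This is only a sketch: the layer shapes come from the printout, but the forward pass and activation choices are assumptions, not the asker's actual code.

import torch.nn as nn
import torch.nn.functional as F

# Sketch of a model matching the printed structure above.
# Only the layer shapes are from the question; the forward logic is assumed.
class MySpeechRecognition(nn.Module):
    def __init__(self, n_feats=128, hidden=128, n_classes=28):
        super().__init__()
        self.gru = nn.GRU(n_feats, hidden, num_layers=5,
                          batch_first=True, dropout=0.5)
        self.dropout = nn.Dropout(p=0.3)
        self.fc1 = nn.Linear(hidden, 512)
        self.fc2 = nn.Linear(512, n_classes)  # letters + space + blank

    def forward(self, x, h):
        out, h = self.gru(x, h)                  # (batch, seq_len, hidden)
        out = self.dropout(F.relu(self.fc1(out)))
        out = self.fc2(out)                      # (batch, seq_len, n_classes)
        return out, h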

Then I use the CTC loss function and the Adam optimizer:

lr = 5e-4
criterion = nn.CTCLoss(blank=28, zero_infinity=False)
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
In my training loop (I am only showing the problematic part):

output, h = mynet(specs, h)
print(output.size())
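# nn.CTCLoss expects log-probabilities of shape (seq_len, batch, n_classes),
# hence the log_softmax and transpose below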
output = F.log_softmax(output, dim=2)
output = output.transpose(0,1)
# calculate the loss and perform backprop
loss = criterion(output, labels, input_lengths, label_lengths)
loss.backward()
I get this error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-133-5e47e7b03a46> in <module>
     42         output = output.transpose(0,1)
     43         # calculate the loss and perform backprop
---> 44         loss = criterion(output, labels, input_lengths, label_lengths)
     45         loss.backward()
     46         # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/loss.py in forward(self, log_probs, targets, input_lengths, target_lengths)
   1309     def forward(self, log_probs, targets, input_lengths, target_lengths):
   1310         return F.ctc_loss(log_probs, targets, input_lengths, target_lengths, self.blank, self.reduction,
-> 1311                           self.zero_infinity)
   1312 
   1313 # TODO: L1HingeEmbeddingCriterion

/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py in ctc_loss(log_probs, targets, input_lengths, target_lengths, blank, reduction, zero_infinity)
   2050     """
   2051     return torch.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank, _Reduction.get_enum(reduction),
-> 2052                           zero_infinity)
   2053 
   2054 

RuntimeError: blank must be in label range

Thanks.

Your model predicts 28 classes, so the output of your model has size [batch_size, seq_len, 28] (or [seq_len, batch_size, 28] when used as the log-probabilities for the CTC loss). In nn.CTCLoss you set blank=28, which means the blank label is the class with index 28. To get the log-probability of the blank label it would be indexed as output[:, :, 28], but that does not work, because the index is out of range: the valid indices are 0 to 27.

The last class in your output is at index 27, so it should be blank=27:

criterion = nn.CTCLoss(blank=27, zero_infinity=False)
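
For a quick sanity check, here is a minimal self-contained sketch (the shapes and random data are made up for illustration) showing that the corrected blank index runs without the range error:

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

batch, seq_len, n_classes = 4, 50, 28  # 26 letters + space + blank

# Fake network output: (batch, seq_len, n_classes) -> log-probs
# -> (seq_len, batch, n_classes) as CTCLoss expects
output = torch.randn(batch, seq_len, n_classes)
log_probs = F.log_softmax(output, dim=2).transpose(0, 1)

# Targets use only indices 0..26, so they never collide with blank index 27
labels = torch.randint(low=0, high=27, size=(batch, 10))
input_lengths = torch.full((batch,), seq_len, dtype=torch.long)
label_lengths = torch.full((batch,), 10, dtype=torch.long)

criterion = nn.CTCLoss(blank=27, zero_infinity=False)
loss = criterion(log_probs, labels, input_lengths, label_lengths)
print(loss)  # a finite scalar; blank=28 would raise "blank must be in label range"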
Thank you so much!! I was so frustrated with this. It worked, and now I understand it better :)