Sparse cross-entropy loss for computing the loss in an NLP problem (PyTorch)

nlp pytorch

My input tensor looks like this:

torch.Size([8, 23])

// where,
// 8 -> batch size
// 23 -> words in each of them

My output tensor looks like this:

torch.Size([8, 23, 103])

// where,
// 8 -> batch size
// 23 -> word predictions (one per position)
// 103 -> vocab size.

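I want to compute the sparse cross-entropy loss for this task, but I can't, because as far as I can tell PyTorch only computes the loss for a single element. How can I write code that makes this work? Thanks for your help.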
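
One common way to handle the shapes in the question is to merge the batch and sequence dimensions before calling the loss. The sketch below is only an illustration under assumptions: the names `logits` and `targets` are made up, the random tensors stand in for real data, the [8, 23, 103] tensor is assumed to hold raw (pre-softmax) scores, and the [8, 23] tensor is assumed to hold integer word ids.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

batch_size, seq_len, vocab_size = 8, 23, 103

# Stand-ins for real data (assumption): raw model scores and integer word ids.
logits = torch.randn(batch_size, seq_len, vocab_size, requires_grad=True)
targets = torch.randint(0, vocab_size, (batch_size, seq_len))

# cross_entropy accepts (N, C) scores with (N,) class indices, so fold
# the batch and sequence dimensions into a single dimension of 8 * 23 = 184.
flat_logits = logits.reshape(-1, vocab_size)   # shape (184, 103)
flat_targets = targets.reshape(-1)             # shape (184,)

# Default reduction is "mean": the average loss over all 184 word positions.
loss = F.cross_entropy(flat_logits, flat_targets)
loss.backward()
```

Because the targets are class indices rather than one-hot vectors, this already behaves like the "sparse" variant of cross-entropy known from other frameworks.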

Could you explain what result you expect? Are you perhaps looking for [...]?

I'm training an encoder-decoder network, so every position in the output has 103 (the vocabulary size) candidates to choose from. But since in PyTorch I can only compute the loss for a single word, how do I compute the total loss? I'm using a Transformer network.
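
On the "total loss" point in the comment above, a possible sketch: `nn.CrossEntropyLoss` also accepts inputs with extra dimensions as long as the class dimension comes second, and `reduction="none"` returns one loss value per word, which can then be summed or averaged. The tensor names and the random data are assumptions for illustration, not the asker's actual code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins for real data (assumption): raw scores and integer word ids.
logits = torch.randn(8, 23, 103)           # (batch, seq, vocab)
targets = torch.randint(0, 103, (8, 23))   # (batch, seq)

# With extra dimensions the loss expects (batch, vocab, seq),
# so move the vocab dimension to position 1.
criterion = nn.CrossEntropyLoss(reduction="none")
per_word = criterion(logits.permute(0, 2, 1), targets)   # shape (8, 23)

total_loss = per_word.sum()    # sum over every word in the batch
mean_loss = per_word.mean()    # same value as the default reduction="mean"
```

Whether to train on the sum or the mean is a design choice; the mean keeps the loss scale independent of batch size and sequence length.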