Parameters of TensorFlow tf.contrib.seq2seq.sequence_loss


I am trying to use the tf.contrib.seq2seq.sequence_loss function in an RNN model to compute the loss. According to the API documentation, this function requires at least three parameters: logits, targets, and weights.

sequence_loss(
    logits,
    targets,
    weights,
    average_across_timesteps=True,
    average_across_batch=True,
    softmax_loss_function=None,
    name=None
)

logits: A Tensor of shape [batch_size, sequence_length, num_decoder_symbols] and dtype float. The logits correspond to the prediction across all classes at each timestep.
targets: A Tensor of shape [batch_size, sequence_length] and dtype int. The target represents the true class at each timestep. 
weights: A Tensor of shape [batch_size, sequence_length] and dtype float. weights constitutes the weighting of each prediction in the sequence. When using weights as masking, set all valid timesteps to 1 and all padded timesteps to 0, e.g. a mask returned by tf.sequence_mask.
average_across_timesteps: If set, sum the cost across the sequence dimension and divide the cost by the total label weight across timesteps.
average_across_batch: If set, sum the cost across the batch dimension and divide the returned cost by the batch size.
softmax_loss_function: Function (labels, logits) -> loss-batch to be used instead of the standard softmax (the default if this is None). Note that to avoid confusion, it is required for the function to accept named arguments.
name: Optional name for this operation, defaults to "sequence_loss".
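To illustrate these shapes and the default averaging behavior, here is a small NumPy sketch (this is not the TensorFlow implementation itself; it is an assumed re-creation of what sequence_loss computes when both averaging flags are True, namely the weight-masked cross-entropy summed over all timesteps and divided by the total weight):

```python
import numpy as np

def sequence_loss_np(logits, targets, weights):
    """Sketch of sequence_loss with both averaging flags left at True.

    logits:  [batch_size, sequence_length, num_decoder_symbols], float
    targets: [batch_size, sequence_length], int class indices (NOT one-hot)
    weights: [batch_size, sequence_length], float (e.g. a 0/1 padding mask)
    """
    # Softmax over the class dimension (shifted for numerical stability).
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    # Cross-entropy of the true class at each timestep.
    batch_idx, time_idx = np.indices(targets.shape)
    xent = -np.log(probs[batch_idx, time_idx, targets])
    # Weighted sum divided by the total weight, averaging across
    # both the time and batch dimensions.
    return (xent * weights).sum() / weights.sum()

logits = np.random.randn(2, 4, 5)          # batch=2, seq_len=4, vocab=5
targets = np.random.randint(0, 5, (2, 4))  # integer class indices
weights = np.array([[1., 1., 1., 0.],      # last timestep of example 0 is padding
                    [1., 1., 1., 1.]])
loss = sequence_loss_np(logits, targets, weights)
```

Note that a masked (weight 0) timestep contributes nothing to the loss, which is exactly the point of passing a tf.sequence_mask as weights.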
My understanding is that the logits are my predictions after computing Xw + b, so their shape should be [batch_size, sequence_length, output_size]. Then the targets should be my labels, but the shape required here is [batch_size, sequence_length]. I thought my labels should have the same shape as the logits.


So how do I convert my 3-D labels into 2-D labels? Thanks in advance.

Your labels should be a 2-D matrix of shape [batch_size, sequence_length], and your logits should be a 3-D tensor of shape [batch_size, sequence_length, output_size]. Therefore, if your label variable is already of shape [batch_size, sequence_length], you do not need to expand its dimensions.

If you do want to expand the dimensions, you can do it like this:

    expanded_variable = tf.expand_dims(the_variable_you_want_to_expand, axis=-1)
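As a quick shape illustration, NumPy's equivalent np.expand_dims behaves the same way (the variable names here are made up for the example):

```python
import numpy as np

labels = np.zeros((32, 10), dtype=np.int64)   # [batch_size, sequence_length]
expanded = np.expand_dims(labels, axis=-1)    # [batch_size, sequence_length, 1]
print(expanded.shape)  # (32, 10, 1)
```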
Your targets (labels) do not need to have the same shape as the logits.

If we ignore the batch dimension for the moment (it is not relevant to your question), this API computes the loss between two sequences as a weighted sum of per-word losses. Suppose vocab_size is 5, the target word is 3, and the model predicts this target with the vector [0.2, 0.1, 0.15, 0.4, 0.15]. To compute the loss between the target and the prediction, the target does not need to be expanded to the same shape as the prediction, i.e. the one-hot vector [0, 0, 0, 1, 0]; TensorFlow does this internally.
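The point above can be checked numerically: the cross-entropy computed from the sparse integer target equals the cross-entropy computed from the expanded one-hot target. (Here the vector from the example is treated as the predicted probability distribution.)

```python
import numpy as np

probs = np.array([0.2, 0.1, 0.15, 0.4, 0.15])  # predicted distribution, vocab_size=5
sparse_target = 3                               # integer class index
one_hot_target = np.array([0., 0., 0., 1., 0.])

# Sparse form: just pick out the probability of the true class.
loss_sparse = -np.log(probs[sparse_target])
# Dense form: dot the one-hot target with the log-probabilities.
loss_one_hot = -np.sum(one_hot_target * np.log(probs))
# Both equal -log(0.4); the one-hot expansion happens internally.
```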
You can refer to the difference between these two APIs:

softmax_cross_entropy_with_logits
sparse_softmax_cross_entropy_with_logits