TensorFlow decoder does not accept the output of a bidirectional encoder


I am trying to implement an encoder-decoder model in TensorFlow. The encoder is a bidirectional LSTM:

import tensorflow as tf


def encoder(hidden_units, encoder_embedding, sequence_length):
    # One forward and one backward LSTM cell for the bidirectional encoder
    forward_cell = tf.contrib.rnn.LSTMCell(hidden_units)
    backward_cell = tf.contrib.rnn.LSTMCell(hidden_units)

    bi_outputs, final_states = tf.nn.bidirectional_dynamic_rnn(
        forward_cell, backward_cell, encoder_embedding,
        sequence_length=sequence_length, dtype=tf.float32)

    # Concatenate forward and backward outputs along the feature axis,
    # so the encoder outputs have size 2 * hidden_units
    encoder_outputs = tf.concat(bi_outputs, 2)

    # Concatenate the final cell and hidden states of both directions
    forward_cell_state, backward_cell_state = final_states
    cell_state_final = tf.concat([forward_cell_state.c, backward_cell_state.c], 1)
    hidden_state_final = tf.concat([forward_cell_state.h, backward_cell_state.h], 1)
    encoder_final_state = tf.nn.rnn_cell.LSTMStateTuple(c=cell_state_final, h=hidden_state_final)

    return encoder_outputs, encoder_final_state
Something goes wrong between the encoder and the decoder. I get a ValueError along the lines of: Shapes (?, 42) and (12, 21) are incompatible.

The decoder uses an attention mechanism and looks like this:

from tensorflow.python.layers.core import Dense


def decoder(decoder_embedding, vocab_size, hidden_units, sequence_length, encoder_output, encoder_state, batchsize):
    # Projection of the decoder outputs onto the vocabulary
    projection_layer = Dense(vocab_size)
    helper = tf.contrib.seq2seq.TrainingHelper(decoder_embedding, sequence_length=sequence_length)

    # Decoder cell
    decoder_cell = tf.contrib.rnn.LSTMCell(hidden_units)

    # Attention mechanism (Luong-style, dot-product based)
    attention_mechanism = tf.contrib.seq2seq.LuongAttention(hidden_units, encoder_output)
    attn_cell = tf.contrib.seq2seq.AttentionWrapper(decoder_cell, attention_mechanism, attention_layer_size=hidden_units)

    # Initial attention state, seeded with the encoder's final state
    attn_zero = attn_cell.zero_state(batch_size=batchsize, dtype=tf.float32)
    ini_state = attn_zero.clone(cell_state=encoder_state)

    decoder = tf.contrib.seq2seq.BasicDecoder(cell=attn_cell, initial_state=ini_state, helper=helper, output_layer=projection_layer)
    decoder_outputs, _final_state, _final_sequence_lengths = tf.contrib.seq2seq.dynamic_decode(decoder)

    return decoder_outputs

How can I fix this?

The problem is that, after concatenating the two directions, your encoder has twice as many hidden units as your decoder, and you are using Luong-style attention over it. Luong attention computes the attention energies (state similarities) as a dot product between the decoder state and the encoder states, and a dot product requires both vectors to have the same dimension. That is exactly the mismatch in the error: the concatenated encoder states have 2 × 21 = 42 dimensions, while the decoder states have 21.
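
To see the mismatch concretely, here is a minimal NumPy sketch of that dot-product score; the sizes 12 and 21 are taken from the error message, while the source length of 30 is made up:

import numpy as np

batch, enc_len = 12, 30        # batch size from the error message, made-up source length
dec_units = 21                 # decoder hidden size, as in the error message
enc_units = 2 * dec_units      # concatenated bidirectional encoder size = 42

decoder_state = np.ones((batch, dec_units))             # query:  (12, 21)
encoder_states = np.ones((batch, enc_len, enc_units))   # memory: (12, 30, 42)

try:
    # Luong "dot" score: one dot product per encoder time step
    scores = np.einsum('bu,btu->bt', decoder_state, encoder_states)
except ValueError as err:
    print(err)  # fails because the last dimensions (21 vs 42) do not agree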

You have several options:

  • Use Bahdanau-style attention, which adds an extra non-linear layer that projects into a shared encoder-decoder space;

  • Change the dimensions of the encoder or the decoder so that the hidden states match, i.e. the decoder's hidden_units = 2 × the encoder's hidden_units;

  • Add a linear dense layer after the encoder that projects the encoder outputs to the decoder's hidden dimension (see the sketch after this list).
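
For the last option, a minimal sketch (TF 1.x, reusing the encoder and decoder functions from the question) could look like this; decoder_embedding, target_lengths, vocab_size and batchsize are assumed to be defined elsewhere:

encoder_outputs, encoder_final_state = encoder(hidden_units, encoder_embedding, sequence_length)

# Project the concatenated (2 * hidden_units) encoder outputs and final state
# down to the decoder's hidden size
projected_outputs = tf.layers.dense(encoder_outputs, hidden_units, use_bias=False, name="enc_out_proj")
projected_state = tf.nn.rnn_cell.LSTMStateTuple(
    c=tf.layers.dense(encoder_final_state.c, hidden_units, use_bias=False, name="enc_c_proj"),
    h=tf.layers.dense(encoder_final_state.h, hidden_units, use_bias=False, name="enc_h_proj"))

decoder_outputs = decoder(decoder_embedding, vocab_size, hidden_units, target_lengths,
                          projected_outputs, projected_state, batchsize)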