Tensorflow Keras bidirectional LSTM for text summarization

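I am trying to implement a bidirectional LSTM for text summarization. I am having trouble with the inference part: the dimensions do not match. Here is my model: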

latent_dim = 300
embedding_dim=100

# Encoder
encoder_inputs = Input(shape=(max_news_len,))

#embedding layer
enc_emb =  Embedding(x_voc, embedding_dim,trainable=True)(encoder_inputs)

#encoder lstm 1
encoder_bi_lstm1 = Bidirectional(LSTM(latent_dim,
                                   return_sequences=True,
                                   return_state=True,
                                   dropout=0.4,
                                   recurrent_dropout=0.4), 
                                 merge_mode="concat")
encoder_output1, forward_state_h1, forward_state_c1, backward_state_h1, backward_state_c1 = encoder_bi_lstm1(enc_emb)
encoder_states1 = [forward_state_h1, forward_state_c1, backward_state_h1, backward_state_c1]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))

#embedding layer
dec_emb_layer = Embedding(y_voc, embedding_dim,trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)

#decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True,dropout=0.4,recurrent_dropout=0.2)
#decoder_outputs,decoder_fwd_state, decoder_back_state = decoder_lstm(dec_emb,initial_state=[state_h, state_c])

decoder_bi_lstm = Bidirectional(LSTM(latent_dim, 
                                  return_sequences=True, 
                                  return_state=True,
                                  dropout=0.4,
                                  recurrent_dropout=0.2),
                             merge_mode="concat")
decoder_outputs, decoder_fwd_state_h1, decoder_fwd_state_c1, decoder_back_state_h1, decoder_back_state_c1 = decoder_bi_lstm(dec_emb,initial_state=encoder_states1)
decoder_states = [decoder_fwd_state_h1, decoder_fwd_state_c1, decoder_back_state_h1, decoder_back_state_c1]

# Attention layer
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_output1, decoder_outputs])

# Concat attention input and decoder LSTM output
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, attn_out])

#dense layer
decoder_dense =  TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_concat_input)

# Define the model 
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

model.summary() 
Here is my inference setup:

# Encode the input sequence to get the feature vector
encoder_model = Model(inputs=encoder_inputs,outputs=encoder_states1)

# Decoder setup
# Below tensors will hold the states of the previous time step
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_hidden_state_input = Input(shape=(max_news_len,latent_dim))

# Get the embeddings of the decoder sequence
dec_emb2= dec_emb_layer(decoder_inputs) 
# To predict the next word in the sequence, set the initial states to the states from the previous time step
decoder_outputs2, decoder_fwd_state_h2, decoder_fwd_state_c2, decoder_back_state_h2, decoder_back_state_c2 = decoder_bi_lstm(dec_emb2, initial_state=decoder_states)
decoder_states2 = [decoder_fwd_state_h2, decoder_fwd_state_c2, decoder_back_state_h2, decoder_back_state_c2]

#attention inference
attn_out_inf, attn_states_inf = attn_layer([decoder_hidden_state_input, decoder_outputs2])
decoder_inf_concat = Concatenate(axis=-1, name='concat')([decoder_outputs2, attn_out_inf])

# A dense softmax layer to generate prob dist. over the target vocabulary
decoder_outputs2 = decoder_dense(decoder_inf_concat) 

# Final decoder model
decoder_model = Model(
    [decoder_inputs] + [decoder_hidden_state_input,decoder_state_input_h, decoder_state_input_c],
    [decoder_outputs2] + [decoder_fwd_state_h2, decoder_fwd_state_c2, decoder_back_state_h2, decoder_back_state_c2])
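The error is:

Dimensions must be equal, but are 300 and 600 for 'attention_layer_6/MatMul' (op: 'MatMul') with input shapes: [?,300], [600,600].

I know I'm late, but I just found the answer to this problem. I used the same architecture, with bidirectional LSTMs for both the encoder and the decoder.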

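When you build the decoder for inference, you need to create four separate input tensors for the four initial states that are passed from the encoder to the decoder: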

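# Forward (f) and reverse (r) hidden states; note the trailing comma,
# which makes each shape a one-element tuple
dec_h_state_f = tf.keras.layers.Input(shape=(latent_dim,))
dec_h_state_r = tf.keras.layers.Input(shape=(latent_dim,))

# Forward (f) and reverse (r) cell states
dec_c_state_f = tf.keras.layers.Input(shape=(latent_dim,))
dec_c_state_r = tf.keras.layers.Input(shape=(latent_dim,))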


# Create the encoder-output input with twice the latent dimension:
# with merge_mode="concat", the bidirectional encoder concatenates its
# forward and backward outputs at every timestep

dec_hidden_inp = tf.keras.layers.Input(shape=(max_news_len, latent_dim * 2))
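
To see why the placeholder needs `latent_dim * 2`, here is a small NumPy sketch of the shape check behind the error. It assumes the attention layer multiplies each encoder timestep by a square weight matrix whose size was fixed during training from the concatenated encoder output (the `[600,600]` operand in the error message). The names `w_a`, `enc_step_wrong`, and `enc_step_right` are illustrative and not taken from `AttentionLayer`.

```python
import numpy as np

latent_dim = 300
batch = 4  # arbitrary batch size for the demonstration

# Assumed attention weight, built in training from encoder outputs of
# width 2 * latent_dim (forward and backward halves concatenated).
w_a = np.zeros((2 * latent_dim, 2 * latent_dim))

# One encoder timestep as the original inference placeholder declares it.
enc_step_wrong = np.zeros((batch, latent_dim))
# The same timestep with the width the bidirectional encoder emits.
enc_step_right = np.zeros((batch, 2 * latent_dim))

try:
    np.matmul(enc_step_wrong, w_a)
    mismatch = False
except ValueError:
    mismatch = True  # 300 vs 600: the same kind of clash TensorFlow reports

projected = np.matmul(enc_step_right, w_a)
print(mismatch, projected.shape)
```

With the wider input the product goes through, which is consistent with the fix of declaring the placeholder as `(max_news_len, latent_dim * 2)`.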
Finally, you can build your model as follows; this worked well for me:

dec_model = tf.keras.models.Model([dec_input] + [dec_hidden_inp, dec_h_state_f, dec_h_state_r, dec_c_state_f, dec_c_state_r],
                              [dec_out_infer] + [dec_states])
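And yes, don't forget to split these four values out of the encoder's output at inference time before feeding them into the decoder as its initial states.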
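
As a rough illustration of that last point, the sketch below threads the four encoder states through a greedy decoding loop. It is not the answerer's code: `decode_sequence`, `toy_step`, the token ids, and the state ordering (forward h, forward c, backward h, backward c) are all assumptions, and `toy_step` is a stand-in for something like `dec_model.predict` so that the example runs without a trained model.

```python
import numpy as np

def decode_sequence(enc_outputs, enc_states, step_fn, start_id, end_id, max_len):
    """Greedy decoding that keeps the four LSTM states separate."""
    # Unpack the encoder's four final states; this order is an assumption.
    fwd_h, fwd_c, bwd_h, bwd_c = enc_states
    token = np.array([[start_id]])
    decoded = []
    for _ in range(max_len):
        # step_fn stands in for the inference decoder's predict call.
        probs, fwd_h, fwd_c, bwd_h, bwd_c = step_fn(
            [token, enc_outputs, fwd_h, fwd_c, bwd_h, bwd_c])
        next_id = int(np.argmax(probs[0, -1, :]))
        if next_id == end_id:
            break
        decoded.append(next_id)
        token = np.array([[next_id]])
    return decoded

def toy_step(inputs):
    """Hypothetical decoder: predicts previous token id + 1, states unchanged."""
    token, _enc_outputs, fwd_h, fwd_c, bwd_h, bwd_c = inputs
    vocab_size = 6
    probs = np.zeros((1, 1, vocab_size))
    probs[0, 0, (int(token[0, 0]) + 1) % vocab_size] = 1.0
    return probs, fwd_h, fwd_c, bwd_h, bwd_c

latent_dim, src_len = 3, 5
enc_outputs = np.zeros((1, src_len, 2 * latent_dim))
enc_states = [np.zeros((1, latent_dim)) for _ in range(4)]
summary_ids = decode_sequence(enc_outputs, enc_states, toy_step,
                              start_id=0, end_id=4, max_len=10)
print(summary_ids)
```

The list passed to `step_fn` has to follow whatever input order the inference model was built with, so if the decoder declares its state inputs in a different order, the unpacking and the call should be rearranged to match.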