Problem merging two layers in a Keras LSTM Seq2Seq model for a Q&A use case

I'm trying to build a Q&A model, but I'm having trouble merging the two input layers into a single layer. Here is my current model architecture:
from keras.layers import Input, Embedding, Reshape, LSTM, RepeatVector, Dense, dot
from keras.models import Model

story_input = Input(shape=(story_maxlen,vocab_size), name='story_input')
story_input_proc = Embedding(vocab_size, latent_dim, name='story_input_embed', input_length=story_maxlen)(story_input)
story_input_proc = Reshape((latent_dim,story_maxlen), name='story_input_reshape')(story_input_proc)
query_input = Input(shape=(query_maxlen,vocab_size), name='query_input')
query_input_proc = Embedding(vocab_size, latent_dim, name='query_input_embed', input_length=query_maxlen)(query_input)
query_input_proc = Reshape((latent_dim,query_maxlen), name='query_input_reshape')(query_input_proc)
story_query = dot([story_input_proc, query_input_proc], axes=(1, 1), name='story_query_merge')
encoder = LSTM(latent_dim, return_state=True, name='encoder')
encoder_output, state_h, state_c = encoder(story_query)
encoder_output = RepeatVector(3, name='encoder_3dim')(encoder_output)
encoder_states = [state_h, state_c]
decoder = LSTM(latent_dim, return_sequences=True, name='decoder')(encoder_output, initial_state=encoder_states)
answer_output = Dense(vocab_size, activation='softmax', name='answer_output')(decoder)
model = Model([story_input, query_input], answer_output)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
Below is the output of model.summary() (shown at the bottom),
where vocab_size=38, story_maxlen=358, query_maxlen=5, latent_dim=64, and batch_size=64.
When I try to train this model, I get the error:

Input to reshape is a tensor with 778240 values, but the requested shape has 20480

Here is how those two values break down:

input_to_reshape = batch_size * latent_dim * query_maxlen * vocab_size
requested_shape = batch_size * latent_dim * query_maxlen
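As a quick sanity check (a standalone sketch, not part of the model), the numbers in the error message match these formulas exactly:

```python
# Verify that the error-message numbers follow from the model dimensions.
batch_size = 64
latent_dim = 64
query_maxlen = 5
vocab_size = 38

input_to_reshape = batch_size * latent_dim * query_maxlen * vocab_size
requested_shape = batch_size * latent_dim * query_maxlen

print(input_to_reshape)  # 778240, the tensor size in the error
print(requested_shape)   # 20480, the requested size in the error
```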
Where I'm at

I believe the error message is saying that the tensor being input to the query_input_reshape layer has shape (?, 5, 38, 64) but the layer is expecting a tensor of shape (?, 5, 64) (see the formulas above), though I could be wrong about that.
When I change the target shape of the Reshape to 3D (i.e. Reshape((latent_dim, query_maxlen, vocab_size))), I get the error total size of new array must be unchanged, which makes no sense to me since the input is 3D. You would think Reshape((latent_dim, query_maxlen)) would give that error instead, since it converts a 3D tensor to a 2D one, but it compiles fine, so I don't know what's going on there.
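For what it's worth, plain NumPy enforces the same total-size rule that Reshape does; the shapes below use the per-sample dimensions from the error (a standalone sketch, not Keras code):

```python
import numpy as np

# A dummy tensor with the per-sample shape the Embedding actually produces:
# (query_maxlen, vocab_size, latent_dim) = (5, 38, 64).
x = np.zeros((5, 38, 64))

# Reordering to (64, 5, 38) works: both shapes hold 5*38*64 elements.
y = x.reshape((64, 5, 38))
print(y.shape)  # (64, 5, 38)

# Reshaping to (64, 5) fails: 5*38*64 != 64*5 elements.
try:
    x.reshape((64, 5))
except ValueError as e:
    print(e)  # the total number of elements must be unchanged
```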
The only reason I'm using Reshape at all is that I need to merge the two tensors as input to the LSTM encoder. When I try removing the Reshape layers, I just get dimension-mismatch errors at compile time. The architecture above at least compiles, but I can't train it.
Can anyone help me figure out how to merge the story and query input layers? Thanks!

Comment: Why the reshape? That mixes up the sentences; perhaps you meant Permute?
Reply: @nuric yes, it should be Permute.
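Following the comment thread, the difference between Permute and Reshape can be shown with plain NumPy: a permutation (transpose) swaps axes while keeping each row intact, whereas a reshape with the same target shape reorders the values (a standalone sketch; the array contents are illustrative):

```python
import numpy as np

# Per-sample embedded query: shape (query_maxlen, latent_dim) = (5, 64).
x = np.arange(5 * 64).reshape((5, 64))

# Permute((2, 1)) in Keras corresponds to transposing the non-batch axes.
permuted = x.T                  # shape (64, 5): axes swapped, values intact
reshaped = x.reshape((64, 5))   # same shape, but values are scrambled

print(permuted.shape)                      # (64, 5)
print(np.array_equal(permuted, reshaped))  # False: reshape reorders values
```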
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
story_input (InputLayer) (None, 358, 38) 0
__________________________________________________________________________________________________
query_input (InputLayer) (None, 5, 38) 0
__________________________________________________________________________________________________
story_input_embed (Embedding) (None, 358, 64) 2432 story_input[0][0]
__________________________________________________________________________________________________
query_input_embed (Embedding) (None, 5, 64) 2432 query_input[0][0]
__________________________________________________________________________________________________
story_input_reshape (Reshape) (None, 64, 358) 0 story_input_embed[0][0]
__________________________________________________________________________________________________
query_input_reshape (Reshape) (None, 64, 5) 0 query_input_embed[0][0]
__________________________________________________________________________________________________
story_query_merge (Dot) (None, 358, 5) 0 story_input_reshape[0][0]
query_input_reshape[0][0]
__________________________________________________________________________________________________
encoder (LSTM) [(None, 64), (None, 17920 story_query_merge[0][0]
__________________________________________________________________________________________________
encoder_3dim (RepeatVector) (None, 3, 64) 0 encoder[0][0]
__________________________________________________________________________________________________
decoder (LSTM) (None, 3, 64) 33024 encoder_3dim[0][0]
encoder[0][1]
encoder[0][2]
__________________________________________________________________________________________________
answer_output (Dense) (None, 3, 38) 2470 decoder[0][0]
==================================================================================================
Total params: 58,278
Trainable params: 58,278
Non-trainable params: 0
__________________________________________________________________________________________________