Passing multiple inputs to a Keras TimeDistributed wrapper when only one has a time dimension

tensorflow nlp deep-learning keras

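I'm brand new to Keras and am working through the documentation. I'm referring to the last example in the introduction to the Keras functional API ("Video question answering model"):

In short, the example takes a natural-language question about a video and produces a softmax over the possible answers. A vision_model is built to encode a single video frame. It is wrapped in TimeDistributed() and applied to the video sequence, and the result is passed to an LSTM to get a single vector for the whole video. That vector is then concatenated with an encoded question.

The original code is: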

video_input = Input(shape=(100, 3, 224, 224))
# This is our video encoded via the previously trained vision_model (weights are reused)
encoded_frame_sequence = TimeDistributed(vision_model)(video_input)  # the output will be a sequence of vectors
encoded_video = LSTM(256)(encoded_frame_sequence)  # the output will be a vector

# This is a model-level representation of the question encoder, reusing the same weights as before:
question_encoder = Model(inputs=question_input, outputs=encoded_question)

# Let's use it to encode the question:
video_question_input = Input(shape=(100,), dtype='int32')
encoded_video_question = question_encoder(video_question_input)

# And this is our video question answering model:
merged = keras.layers.concatenate([encoded_video, encoded_video_question])
output = Dense(1000, activation='softmax')(merged)
video_qa_model = Model(inputs=[video_input, video_question_input], outputs=output)

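What if I want to drop the LSTM over the video sequence, ask the same question of every frame, and take a softmax over the possible answers for all 100 frames? Conceptually I can think of two approaches, but I don't really know how to implement either of them, or whether there is an altogether better way for what seems like a common use case.

Approach 1

I suspect this is the way to go: concatenate the encoded question with the output of vision_model, which encodes a single image. The previous example at the same link ("Visual question answering model") does exactly this and calls the result vqa_model: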

vqa_model = Model(inputs=[image_input, question_input], outputs=output)

Can I now pass this to the TimeDistributed wrapper?

approach_1 = TimeDistributed(vqa_model)([video_input, question_input])

For (I think) obvious reasons, this raises an AssertionError:

AssertionError
----> 1 approach_1 = TimeDistributed(vqa_model)([video_input, question_input])

Path\Continuum\Anaconda3\lib\site-packages\keras\engine\topology.py in __call__(self, inputs, **kwargs)
    558                     self.build(input_shapes[0])
    559                 else:
--> 560                     self.build(input_shapes)
    561                 self.built = True
    562 

Path\Continuum\Anaconda3\lib\site-packages\keras\layers\wrappers.py in build(self, input_shape)
    139 
    140     def build(self, input_shape):
--> 141         assert len(input_shape) >= 3
    142         self.input_spec = InputSpec(shape=input_shape)
    143         child_input_shape = (input_shape[0],) + input_shape[2:]

AssertionError:
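Reading the traceback, the failure seems to come from what build() receives rather than from the shapes themselves: with two inputs, __call__ hands build() a list of two shape tuples, and that list has length 2, so `assert len(input_shape) >= 3` fails. Below is a minimal pure-Python sketch of that check. The function name is hypothetical, the assertion is copied from the traceback, and the shape tuples are what I'd assume for Input(shape=(100, 3, 224, 224)) and Input(shape=(100,)) once the batch dimension is added.

```python
def build_check(input_shape):
    """Mirror the assertion at wrappers.py line 141 in the traceback."""
    assert len(input_shape) >= 3


# Assumed shapes, with None standing for the batch dimension.
video_shape = (None, 100, 3, 224, 224)
question_shape = (None, 100)

# A single video input is one 5-element shape tuple, so the check passes.
build_check(video_shape)
single_input_ok = True

# Two inputs arrive as a list of two tuples; the list itself has length 2.
try:
    build_check([video_shape, question_shape])
    two_inputs_fail = False
except AssertionError:
    two_inputs_fail = True

print(single_input_ok, two_inputs_fail)
```

So, at least in the Keras version shown in the traceback, the wrapper rejects any list of inputs at this point, regardless of whether each input has a time dimension.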

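I need to pass two inputs, only one of which (video_input) actually has a time dimension (question_input does not). Is there a way to make this work, or do I need to broadcast the embedded question 100x to match the time dimension of the video input? That seems too inefficient.

Approach 2

This idea seems very clumsy, but what if I treat vqa_model as a shared layer and manually call it 100 times, each time with the same question but the next video frame?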
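One possible way to do the broadcasting, sketched here under several assumptions: encode the question once, tile only the encoded vector with RepeatVector, concatenate it with the per-frame encodings from TimeDistributed(vision_model), and apply a per-frame softmax. The tiny vision_model and question_encoder below are hypothetical stand-ins, the sizes are toy values, the frames are channels-last, and the imports use tensorflow.keras rather than the standalone keras package from the question. "Softmax for all 100 frames" is read here as one softmax per frame, which may not be what was intended.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy sizes (assumptions) so the sketch runs quickly; the question uses
# 100 frames of shape (3, 224, 224) and 1000 answer classes.
NUM_FRAMES, FRAME_SHAPE = 4, (32, 32, 3)
QUESTION_LEN, VOCAB_SIZE, NUM_ANSWERS = 6, 50, 10

# Hypothetical stand-in for the pretrained vision_model: one frame -> one vector.
vision_model = keras.Sequential([
    keras.Input(shape=FRAME_SHAPE),
    layers.Conv2D(8, 3, activation="relu"),
    layers.GlobalMaxPooling2D(),
])

# Hypothetical stand-in for question_encoder: token ids -> one vector.
question_encoder = keras.Sequential([
    keras.Input(shape=(QUESTION_LEN,), dtype="int32"),
    layers.Embedding(VOCAB_SIZE, 8),
    layers.LSTM(8),
])

video_input = keras.Input(shape=(NUM_FRAMES,) + FRAME_SHAPE)
question_input = keras.Input(shape=(QUESTION_LEN,), dtype="int32")

# Only the video goes through TimeDistributed: (batch, frames, 8).
encoded_frames = layers.TimeDistributed(vision_model)(video_input)

# The question is encoded once, then the vector is tiled: (batch, frames, 8).
encoded_question = question_encoder(question_input)
tiled_question = layers.RepeatVector(NUM_FRAMES)(encoded_question)

# Join frame and question features per time step, then classify each step.
merged = layers.Concatenate(axis=-1)([encoded_frames, tiled_question])
per_frame_answers = layers.TimeDistributed(
    layers.Dense(NUM_ANSWERS, activation="softmax")
)(merged)

model = keras.Model([video_input, question_input], per_frame_answers)

# Quick shape check on random data.
video = np.random.rand(2, NUM_FRAMES, *FRAME_SHAPE).astype("float32")
question = np.random.randint(0, VOCAB_SIZE, size=(2, QUESTION_LEN))
probs = model.predict([video, question], verbose=0)
print(probs.shape)
```

Because the question encoder runs once per sample and only its output vector is repeated, the tiling copies a small tensor rather than re-encoding the question for every frame.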

vqa_1 = vqa_model([frame_1, question_input])
vqa_2 = vqa_model([frame_2, question_input])
...
vqa_100 = vqa_model([frame_100, question_input])
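The manual calls above can be written as a loop. The sketch below is only an illustration under assumptions: vqa_model is a small hypothetical stand-in, sizes are toy values, frames are channels-last, imports use tensorflow.keras, and each frame is sliced out with a Lambda layer before the shared model is applied.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy sizes (assumptions); the question uses 100 frames and 1000 classes.
NUM_FRAMES, FRAME_SHAPE = 4, (32, 32, 3)
QUESTION_LEN, VOCAB_SIZE, NUM_ANSWERS = 6, 50, 10

# Hypothetical stand-in for vqa_model: (image, question) -> answer softmax.
image_in = keras.Input(shape=FRAME_SHAPE)
q_in = keras.Input(shape=(QUESTION_LEN,), dtype="int32")
img_vec = layers.GlobalMaxPooling2D()(layers.Conv2D(8, 3, activation="relu")(image_in))
q_vec = layers.LSTM(8)(layers.Embedding(VOCAB_SIZE, 8)(q_in))
answer = layers.Dense(NUM_ANSWERS, activation="softmax")(
    layers.concatenate([img_vec, q_vec])
)
vqa_model = keras.Model([image_in, q_in], answer)

video_input = keras.Input(shape=(NUM_FRAMES,) + FRAME_SHAPE)
question_input = keras.Input(shape=(QUESTION_LEN,), dtype="int32")

per_frame = []
for t in range(NUM_FRAMES):
    # Slice frame t out of the video: (batch, frames, H, W, C) -> (batch, H, W, C).
    frame_t = layers.Lambda(lambda v, t=t: v[:, t])(video_input)
    # The same vqa_model instance is reused, so its weights are shared.
    answer_t = vqa_model([frame_t, question_input])
    # Add a length-1 time axis so the answers can be joined along axis 1.
    per_frame.append(layers.Reshape((1, NUM_ANSWERS))(answer_t))

stacked = layers.Concatenate(axis=1)(per_frame)  # (batch, frames, answers)
model = keras.Model([video_input, question_input], stacked)

# Quick shape check on random data.
video = np.random.rand(2, NUM_FRAMES, *FRAME_SHAPE).astype("float32")
question = np.random.randint(0, VOCAB_SIZE, size=(2, QUESTION_LEN))
probs = model.predict([video, question], verbose=0)
print(probs.shape)
```

This builds one call of vqa_model per frame, so with 100 frames the graph gets large, and the question branch inside vqa_model is recomputed for every frame.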

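Or, more likely, am I going about this completely wrong?

Thanks in advance.

Hi, I have the same problem. Did you ever find a solution you could share? Thanks.

No, I never figured this out. I wish I had, but I've set aside the problem that prompted this question for now and haven't had a chance to come back to it. If you figure something out, I'd love to hear how you did it!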