Different LSTM outputs in PyTorch and TensorFlow

I am trying to convert a TensorFlow (1.15) model to a PyTorch model. Because I was getting very different loss values, I tried to compare the outputs of the LSTM in the forward pass for the same input. The LSTMs are declared and initialized as follows:

TensorFlow code

rnn_cell_video_fw = tf.contrib.rnn.LSTMCell(
            num_units=self.options['rnn_size'],
            state_is_tuple=True, 
            initializer=tf.orthogonal_initializer()
        )

rnn_cell_video_fw = tf.contrib.rnn.DropoutWrapper(
            rnn_cell_video_fw,
            input_keep_prob=1.0 - rnn_drop,
            output_keep_prob=1.0 - rnn_drop 
        )

sequence_length = tf.expand_dims(tf.shape(video_feat_fw)[1], axis=0)
initial_state = rnn_cell_video_fw.zero_state(batch_size=batch_size, dtype=tf.float32)

rnn_outputs_fw, _ = tf.nn.dynamic_rnn(
                    cell=rnn_cell_video_fw, 
                    inputs=video_feat_fw, 
                    sequence_length=sequence_length, 
                    initial_state=initial_state,
                    dtype=tf.float32
                )

PyTorch code

self.rnn_video_fw = nn.LSTM(self.options['video_feat_dim'], self.options['rnn_size'], dropout=self.options['rnn_drop'])

rnn_outputs_fw, _ = self.rnn_video_fw(video_feat_fw)
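One detail worth flagging here: the dropout argument of nn.LSTM only applies between stacked layers, so for this single-layer LSTM it is a no-op (recent PyTorch versions even emit a UserWarning for dropout > 0 with num_layers=1). The DropoutWrapper in the TensorFlow code instead drops the cell's inputs and outputs at every time step. A minimal sketch of that behaviour in PyTorch, as an illustration rather than the code from the question:

import torch.nn as nn

class LSTMWithIODropout(nn.Module):
    """Single-layer LSTM with dropout on the inputs and outputs, roughly
    mimicking tf.contrib.rnn.DropoutWrapper (illustrative sketch only)."""
    def __init__(self, feat_dim, rnn_size, rnn_drop):
        super().__init__()
        self.in_drop = nn.Dropout(rnn_drop)   # ~ input_keep_prob = 1.0 - rnn_drop
        self.lstm = nn.LSTM(feat_dim, rnn_size)
        self.out_drop = nn.Dropout(rnn_drop)  # ~ output_keep_prob = 1.0 - rnn_drop

    def forward(self, x):
        out, state = self.lstm(self.in_drop(x))
        return self.out_drop(out), state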

Initializing the LSTM in train.py

def init_weight(m):
    if type(m) in [nn.LSTM]:
        # Orthogonal initialization of the layer-0 weight matrices,
        # mirroring tf.orthogonal_initializer() on the TensorFlow side.
        nn.init.orthogonal_(m.weight_hh_l0)
        nn.init.orthogonal_(m.weight_ih_l0)
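The call site for this hook is not shown; presumably it is attached with Module.apply, along the lines of (the model name here is hypothetical):

model = CaptioningModel(options)  # hypothetical module owning self.rnn_video_fw
model.apply(init_weight)          # apply() recurses into submodules, so the nn.LSTM is visited

Note that this only re-initializes the two layer-0 weight matrices: the biases keep nn.LSTM's default uniform init, while tf.contrib.rnn.LSTMCell starts from zero biases, so the bias initialization differs between the two models even apart from the randomness of the orthogonal draws.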

It is much the same for every data item, and my PyTorch model does not converge. Is my suspicion about the differing LSTM outputs correct? If so, where am I going wrong?

Please let me know if anything else is needed.
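Beyond layout and dropout, a forward-pass comparison is only meaningful if both LSTMs hold identical weights; with two independent orthogonal draws the outputs will differ no matter what. Porting the TensorFlow weights requires reordering the gates, since tf.contrib.rnn.LSTMCell stores them as (i, j, f, o) while nn.LSTM expects (i, f, g, o), and TF additionally adds forget_bias=1.0 inside the cell. A sketch for a single unidirectional layer (the helper name and NumPy inputs are assumptions):

import numpy as np
import torch

def load_tf_lstm_into_torch(torch_lstm, tf_kernel, tf_bias, forget_bias=1.0):
    """Copy tf.contrib.rnn.LSTMCell parameters into an nn.LSTM (sketch).

    TF kernel:  (input_dim + hidden, 4*hidden), gate order (i, j, f, o)
    TF bias:    (4*hidden,), zeros by default; forget_bias is added at run time
    PyTorch:    weight_ih_l0 (4*hidden, input_dim), weight_hh_l0 (4*hidden, hidden),
                gate order (i, f, g, o), where g corresponds to TF's j
    """
    input_dim = torch_lstm.input_size

    def reorder(w):  # (i, j, f, o) -> (i, f, g=j, o) along the gate axis
        i, j, f, o = np.split(w, 4, axis=-1)
        return np.concatenate([i, f, j, o], axis=-1)

    with torch.no_grad():
        torch_lstm.weight_ih_l0.copy_(torch.from_numpy(reorder(tf_kernel[:input_dim]).T))
        torch_lstm.weight_hh_l0.copy_(torch.from_numpy(reorder(tf_kernel[input_dim:]).T))
        i, j, f, o = np.split(tf_bias, 4)
        # Fold TF's implicit forget_bias into the explicit PyTorch bias.
        torch_lstm.bias_ih_l0.copy_(torch.from_numpy(np.concatenate([i, f + forget_bias, j, o])))
        torch_lstm.bias_hh_l0.zero_()  # PyTorch has two additive biases; one is enough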

PyTorch's RNN modules take input as (time, batch, features) by default, rather than (batch, time, features); that may be the cause.

Yes, I considered that too and changed the shapes accordingly, but it didn't help either :(
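For reference, the layout point from the comment as a runnable sketch (all dimensions below are made up for illustration):

import torch
import torch.nn as nn

batch_size, seq_len, feat_dim, rnn_size = 4, 10, 500, 512   # hypothetical sizes
video_feat_fw = torch.randn(batch_size, seq_len, feat_dim)  # batch-major, as dynamic_rnn consumes it

# nn.LSTM defaults to (seq_len, batch, feat): transpose before the call ...
lstm = nn.LSTM(feat_dim, rnn_size)
out, _ = lstm(video_feat_fw.transpose(0, 1))    # out: (seq_len, batch, rnn_size)

# ... or build the module with batch_first=True and feed the tensor unchanged.
lstm_bf = nn.LSTM(feat_dim, rnn_size, batch_first=True)
out_bf, _ = lstm_bf(video_feat_fw)              # out_bf: (batch, seq_len, rnn_size)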