Bi-LSTM attention model in Keras (Python)

python tensorflow machine-learning keras deep-learning


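I am trying to build an attention model using a Bi-LSTM over word embeddings. I have come across several references on the topic.

However, I am confused about how to implement "Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification". So far I have: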

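from keras import backend as K
from keras.layers import (Input, Embedding, Bidirectional, LSTM, Dense,
                          Flatten, Activation, RepeatVector, Permute,
                          Multiply, Lambda)

max_length = 100  # padded sequence length, must match input_length below

_input = Input(shape=[max_length], dtype='int32')

# get the embedding layer
embedded = Embedding(
        input_dim=30000,
        output_dim=300,
        input_length=max_length,
        trainable=False,
        mask_zero=False
    )(_input)

# one hidden state per time step: (batch, max_length, 2 * 20)
activations = Bidirectional(LSTM(20, return_sequences=True))(embedded)

# compute importance for each step: (batch, max_length, 1)
attention = Dense(1, activation='tanh')(activations)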

Here I am confused about which equation from the paper corresponds to which step.

attention = Flatten()(attention)
attention = Activation('softmax')(attention)

What do the following lines do?

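# Bidirectional(LSTM(20)) concatenates both directions: 2 * 20 = 40 features per step
attention = RepeatVector(40)(attention)
attention = Permute([2, 1])(attention)

# merge(..., mode='mul') is the Keras 1 API; Multiply() replaces it in Keras 2
sent_representation = Multiply()([activations, attention])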

Now, again, I am not sure why this line is here:

units = 40  # Bi-LSTM output size: 2 directions * 20 LSTM units
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(units,))(sent_representation)

Since I have two classes, I set the final softmax to:

probabilities = Dense(2, activation='softmax')(sent_representation)

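Flatten() transforms the tensor of attention weights into a vector (of size max_length if the sequence length is max_length).

Activation('softmax') constrains every attention weight to lie between 0 and 1 and makes all the weights sum to 1.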
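
The two steps above can be illustrated outside Keras. The following is a minimal NumPy sketch, not the layers' actual implementation. The array scores and its values are made up for illustration; it stands in for the output of Dense(1, activation='tanh') for a batch of two sequences with max_length = 4.

```python
import numpy as np

# Hypothetical output of Dense(1, activation='tanh'): shape (batch, max_length, 1)
scores = np.tanh(np.array([[[0.5], [2.0], [-1.0], [0.0]],
                           [[1.0], [1.0], [1.0], [1.0]]]))

# Flatten(): drop the trailing axis of size 1 -> shape (batch, max_length)
flat = scores.reshape(scores.shape[0], -1)

# Activation('softmax'): normalise over the time steps of each sequence
exp = np.exp(flat - flat.max(axis=-1, keepdims=True))  # subtract max for stability
weights = exp / exp.sum(axis=-1, keepdims=True)

print(weights.shape)         # (2, 4)
print(weights.sum(axis=-1))  # one value per sequence, each approximately 1
```

In the second sequence all scores are equal, so every time step receives the same weight of 1 / max_length.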

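# Bidirectional(LSTM(20)) concatenates both directions: 2 * 20 = 40 features per step
attention = RepeatVector(40)(attention)
attention = Permute([2, 1])(attention)

# merge(..., mode='mul') is the Keras 1 API; Multiply() replaces it in Keras 2
sent_representation = Multiply()([activations, attention])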

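RepeatVector repeats the attention weight vector (of size max_len) as many times as the hidden state has features, so that the activations and the attention weights can be multiplied element-wise. Note that Bidirectional(LSTM(20)) in its default concatenation mode produces hidden states of size 40 rather than 20, so the repeat count has to be 40 for the shapes to match. The tensor activations then has size max_len × 40.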
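
To make the shapes concrete, here is a small NumPy sketch of the same repeat, permute and multiply sequence. It is only an analogy for the Keras layers; the sizes (max_len = 3, hidden = 4) and the random data are assumptions chosen for readability.

```python
import numpy as np

batch, max_len, hidden = 2, 3, 4
rng = np.random.default_rng(0)
activations = rng.normal(size=(batch, max_len, hidden))  # stand-in for the Bi-LSTM output
weights = rng.random(size=(batch, max_len))
weights /= weights.sum(axis=-1, keepdims=True)           # stand-in for the softmax output

# RepeatVector(hidden): (batch, max_len) -> (batch, hidden, max_len)
repeated = np.repeat(weights[:, np.newaxis, :], hidden, axis=1)

# Permute([2, 1]): swap the two non-batch axes -> (batch, max_len, hidden)
permuted = np.transpose(repeated, (0, 2, 1))

# Element-wise product: each hidden state is scaled by its time step's weight
weighted = activations * permuted

# The same result via broadcasting, without materialising the repeated tensor
assert np.allclose(weighted, activations * weights[:, :, np.newaxis])
```

The repeat and permute steps exist only to give both operands the same shape; they do not change any weight values.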

sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(units,))(sent_representation)

This Lambda layer sums the weighted hidden-state vectors over the time axis to obtain the single vector that is used at the end.
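
The summation can be checked with a short NumPy sketch. As before, this is an illustration under assumed sizes (max_len = 3, hidden = 4) with random stand-in data, not the Keras implementation itself.

```python
import numpy as np

batch, max_len, hidden = 2, 3, 4
rng = np.random.default_rng(1)
activations = rng.normal(size=(batch, max_len, hidden))  # stand-in for the Bi-LSTM output
weights = rng.random(size=(batch, max_len))
weights /= weights.sum(axis=-1, keepdims=True)           # stand-in for the softmax output

weighted = activations * weights[:, :, np.newaxis]       # (batch, max_len, hidden)

# Equivalent of K.sum(xin, axis=-2): collapse the time axis
sent_representation = weighted.sum(axis=-2)              # (batch, hidden)

# The same quantity written as an explicit weighted sum over time steps
expected = np.einsum('bt,bth->bh', weights, activations)
assert np.allclose(sent_representation, expected)
print(sent_representation.shape)  # (2, 4)
```

Because the weights are non-negative and sum to 1, each sentence vector is a weighted average of that sentence's hidden states.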

Hope this helps.

Here is a simple way to add attention:

# 40 = 2 * 20, the feature size of Bidirectional(LSTM(20))
attention = RepeatVector(40)(attention)
attention = Permute([2, 1])(attention)
sent_representation = Multiply()([activations, attention])  # replaces merge(..., mode='mul')

sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(40,))(sent_representation)