Retrieving attention weights for sentences? Most attended sentences are zero vectors


python tensorflow keras nlp


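I have a document classification task that labels documents as good (1) or bad (0), and I use sentence embeddings for each document to classify it.

What I would like to do is retrieve the attention scores for each document to find the most "relevant" sentences (i.e., the ones with high attention scores).

I pad every document to the same length (i.e., 1000 sentences per document). So my tensor for 5000 documents looks like

X = np.ones(shape=(5000, 1000, 200))

(5000 documents, each a sequence of 1000 sentence vectors, each sentence vector consisting of 200 features).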
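A minimal sketch of the padding step described above, assuming zero vectors are used as padding. It is written in plain Python with toy sizes so it runs without NumPy or TensorFlow. The helper name pad_document and the 0/1 mask it returns are hypothetical, introduced only for illustration.

```python
def pad_document(sentences, max_sentences, dim):
    """Pad (or truncate) a list of sentence vectors to max_sentences rows.

    Returns the padded document and a 0/1 mask marking the real sentences.
    """
    real = sentences[:max_sentences]
    n_pad = max_sentences - len(real)
    # Real sentence vectors first, then all-zero vectors as padding.
    padded = [list(vec) for vec in real] + [[0.0] * dim for _ in range(n_pad)]
    mask = [1] * len(real) + [0] * n_pad
    return padded, mask


# Toy sizes: 3 real sentences with 2 features each, padded to 5 sentences.
doc = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
padded_doc, mask = pad_document(doc, max_sentences=5, dim=2)
```

Keeping the mask (or simply the number of real sentences) next to each padded document makes it possible to tell later which attention scores belong to real sentences.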

My network looks like this:

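# Imports assumed to come from the tensorflow.keras namespace
import numpy as np
from tensorflow.keras import initializers
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Dense, GRU, Bidirectional, Layer
from tensorflow.keras.models import Model
from tensorflow.keras.metrics import TruePositives, TrueNegatives, FalseNegatives, FalsePositives

no_sentences_per_doc = 1000
sentence_embedding = 200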

sequence_input  = Input(shape=(no_sentences_per_doc, sentence_embedding))
gru_layer = Bidirectional(GRU(50,
                          return_sequences=True
                          ))(sequence_input)
sent_dense = Dense(100, activation='relu', name='sent_dense')(gru_layer)  
sent_att,sent_coeffs = AttentionLayer(100,return_coefficients=True, name='sent_attention')(sent_dense)
preds = Dense(1, activation='sigmoid',name='output')(sent_att)  
model = Model(sequence_input, preds)

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=[TruePositives(name='true_positives'),
                      TrueNegatives(name='true_negatives'),
                      FalseNegatives(name='false_negatives'),
                      FalsePositives(name='false_positives')
                      ])

history = model.fit(X, y, validation_data=(x_val, y_val), epochs=10, batch_size=32)

After training, I retrieve the attention scores as follows:

sent_att_weights = Model(inputs=sequence_input,outputs=sent_coeffs)

## load a single sample
## from file with 150 sentences (one sentence per line)
## each sentence consisting of 200 features
x_sample = np.load(x_sample)
## and reshape to (1, 1000, 200)
x_sample = x_sample.reshape(1,1000,200) 

output_array = sent_att_weights.predict(x_sample)

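However, when I display the top 3 attention scores for the sentences, I also get sentence indices such as the following for a document that has only 150 sentences:

[432, 434, 999]

(the rest is padding, i.e., all zeros)

Does this make sense, or am I doing something wrong here? (Is there a bug in my attention layer? Or is it due to a low F-score?)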
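One way to examine the symptom is to restrict the ranking to the real sentences and measure how much attention lands on padding. The sketch below is plain Python and assumes the predicted weights have been flattened to a 1-D list (for example output_array[0, :, 0].tolist()) and that the number of real sentences is known. The function names and the toy weights are hypothetical, not values from my model.

```python
def top_k_real_sentences(weights, n_real, k=3):
    """Return indices of the k largest weights among the first n_real positions."""
    real = list(enumerate(weights[:n_real]))
    real.sort(key=lambda pair: pair[1], reverse=True)
    return [idx for idx, _ in real[:k]]


def padding_mass(weights, n_real):
    """Fraction of the total attention weight that lands on padded positions."""
    total = sum(weights)
    return sum(weights[n_real:]) / total if total else 0.0


# Toy example: 4 real sentences followed by 3 padded positions.
weights = [0.05, 0.20, 0.10, 0.03, 0.25, 0.17, 0.20]
top_real = top_k_real_sentences(weights, n_real=4, k=3)
leaked = padding_mass(weights, n_real=4)
```

A large value of leaked would suggest that the padded positions are not being masked inside the model, so ranking only the real positions treats the symptom rather than the cause.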

The attention layer I use is the following:

class AttentionLayer(Layer):
    """
    https://humboldt-wi.github.io/blog/research/information_systems_1819/group5_han/
    """
    def __init__(self,attention_dim=100,return_coefficients=False,**kwargs):
        # Initializer 
        self.supports_masking = True
        self.return_coefficients = return_coefficients
        self.init = initializers.get('glorot_uniform') # initializes values with uniform distribution
        self.attention_dim = attention_dim
        super(AttentionLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Builds all weights
        # W = Weight matrix, b = bias vector, u = context vector
        assert len(input_shape) == 3
        self.W = K.variable(self.init((input_shape[-1], self.attention_dim)),name='W')
        self.b = K.variable(self.init((self.attention_dim, )),name='b')
        self.u = K.variable(self.init((self.attention_dim, 1)),name='u')
        self.trainable_weights = [self.W, self.b, self.u]

        super(AttentionLayer, self).build(input_shape)

    def compute_mask(self, input, input_mask=None):
        return None

    def call(self, hit, mask=None):
        # Here, the actual calculation is done
        uit = K.bias_add(K.dot(hit, self.W),self.b)
        uit = K.tanh(uit)
        
        ait = K.dot(uit, self.u)
        ait = K.squeeze(ait, -1)
        ait = K.exp(ait)
        
        if mask is not None:
            ait *= K.cast(mask, K.floatx())

        ait /= K.cast(K.sum(ait, axis=1, keepdims=True) + K.epsilon(), K.floatx())
        ait = K.expand_dims(ait)
        weighted_input = hit * ait
        
        if self.return_coefficients:
            return [K.sum(weighted_input, axis=1), ait]
        else:
            return K.sum(weighted_input, axis=1)

    def compute_output_shape(self, input_shape):
        if self.return_coefficients:
            return [(input_shape[0], input_shape[-1]), (input_shape[0], input_shape[1], 1)]
        else:
            return input_shape[0], input_shape[-1]
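The normalisation inside call() can be sketched outside Keras to see what the mask does. The plain-Python sketch below mirrors the exp, multiply-by-mask, divide-by-sum steps. The scores are made-up numbers, and it assumes that padded steps still receive non-zero scores because the GRU and Dense layers before the attention layer produce non-zero activations for them. In the network shown above, no Masking layer precedes the attention layer, so mask would presumably be None and the masking branch would never run.

```python
import math


def attention_weights(scores, mask=None, eps=1e-7):
    """Normalise exp(scores) into attention weights, optionally zeroing masked steps.

    Mirrors the exp -> multiply by mask -> divide by sum steps of call().
    """
    exps = [math.exp(s) for s in scores]
    if mask is not None:
        exps = [e * m for e, m in zip(exps, mask)]
    total = sum(exps) + eps
    return [e / total for e in exps]


# Two real sentences followed by two padded steps (hypothetical scores).
scores = [0.2, 0.1, 0.9, 0.9]
mask = [1, 1, 0, 0]

unmasked = attention_weights(scores)      # padded steps can receive the largest weights
masked = attention_weights(scores, mask)  # padded steps receive exactly zero weight
```

Without a mask, the padded steps compete with the real sentences in the normalisation, which would be consistent with top indices such as 432 or 999 for a 150-sentence document.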

Note that I am using keras with the tensorflow backend, version 2.1. The attention layer was originally written for Theano, but I use import tensorflow.keras.backend as K.