NLP: keyphrase extraction with BERT forces all labels to zero; the embeddings seem to be used incorrectly


I am extracting keyphrases from documents using BERT embeddings followed by span-based features. The training data has candidate phrases identified using part-of-speech tags. Below are the implementation details:
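For context, the candidate-generation step is not shown in the snippet below. A minimal sketch of how candidates are commonly picked from POS tags is given here; the `(JJ)* (NN)+` pattern and the function name are my assumptions, not the asker's actual code:

```python
# Hypothetical sketch of POS-based candidate generation: keep contiguous
# runs matching (adjective)* (noun)+, a common keyphrase candidate pattern.
def candidate_spans(pos_tags):
    """Return (start, end) inclusive index pairs of JJ* NN+ runs."""
    spans, i, n = [], 0, len(pos_tags)
    while i < n:
        start = i
        while i < n and pos_tags[i].startswith("JJ"):   # optional adjectives
            i += 1
        noun_start = i
        while i < n and pos_tags[i].startswith("NN"):   # at least one noun
            i += 1
        if i > noun_start:
            spans.append((start, i - 1))                # inclusive end index
        else:
            i = start + 1                               # no noun run; advance

    return spans

tags = ["DT", "JJ", "NN", "VBZ", "NN", "NN"]
print(candidate_spans(tags))  # [(1, 2), (4, 5)]
```

The resulting (start, end) pairs are what the `pos_mask` input in the model below would carry.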

import tensorflow as tf
from transformers import TFBertModel

encoder = TFBertModel.from_pretrained("bert-base-uncased")


input_ids = tf.keras.layers.Input(shape=(max_len,), dtype=tf.int32)
attention_mask = tf.keras.layers.Input(shape=(max_len,), dtype=tf.int32)
embedding = encoder(input_ids, attention_mask=attention_mask)[0]

bilstm1 = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(40,
                                                             #kernel_initializer=tf.keras.initializers.RandomNormal(mean=0.02,stddev=0.25),
                                                             dropout = 0.1,
                                                             return_sequences=True),
                                                             merge_mode=None)(embedding)
pos_mask = tf.keras.layers.Input(shape=(2,146),dtype='int32')
mask_start = pos_mask[0][0]  # note: [0] indexes the first batch element, then row 0 (span start indices)
mask_end = pos_mask[0][1]    # row 1: span end indices

start_rep_fr = tf.gather(bilstm1[0],mask_start,axis=1)
start_rep_bk = tf.gather(bilstm1[1],mask_start,axis=1)
end_rep_fr = tf.gather(bilstm1[0],mask_end,axis=1)
end_rep_bk = tf.gather(bilstm1[1],mask_end,axis=1)  # was bilstm1[0]: the backward end state must come from the backward pass


span_fe_diff_fr = start_rep_fr-end_rep_fr
span_fe_prod_fr = tf.math.multiply(start_rep_fr,end_rep_fr)
span_fe_diff_bk = start_rep_bk-end_rep_bk
span_fe_prod_bk = tf.math.multiply(start_rep_bk,end_rep_bk)


span_fe = tf.keras.layers.concatenate([start_rep_fr,
                     end_rep_fr,
                     start_rep_bk,
                     end_rep_bk,
                     span_fe_diff_fr,
                     span_fe_diff_bk,
                     span_fe_prod_fr,
                     span_fe_prod_bk
                    ],2)
bilstm2 = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(10, return_sequences=True, dropout=0.1,
                                                             #kernel_initializer=tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.05),
                                                             ),
                                        merge_mode='ave',
                                        input_shape=(146, 40*4))(span_fe)
output = tf.keras.layers.Dense(2,activation='softmax')(bilstm2)

kpe_model = tf.keras.models.Model(inputs=[input_ids,attention_mask,pos_mask], outputs=output)
kpe_model.layers[3].trainable = False  # freeze the BERT encoder

opt = tf.keras.optimizers.Adam(learning_rate=0.00005)
kpe_model.compile(optimizer=opt,
              loss=loss_function,
              metrics=[ac_metrics])

The output represents the probability of a candidate phrase being a keyphrase. I am not sure which part is incorrect here. The model converges within 2-3 steps and forces all probabilities to zero.
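One quick sanity check worth running before blaming the embeddings: when almost all of the 146 candidate slots are labelled 0, an unweighted loss is minimised by predicting 0 everywhere, which matches the "converges in 2-3 steps, all zeros" symptom. A minimal sketch of measuring the imbalance and deriving inverse-frequency class weights follows; the `labels` array is fabricated for illustration, not the asker's data:

```python
import numpy as np

# Fabricated labels: batch of 32 examples, 146 candidate slots each,
# with only ~3% of slots marked as keyphrases (label 1).
labels = np.zeros((32, 146), dtype=int)
labels[:, :5] = 1

pos_frac = labels.mean()  # fraction of positive spans
# Inverse-frequency class weights, usable as Keras class_weight / sample_weight.
class_weight = {0: 0.5 / (1 - pos_frac), 1: 0.5 / pos_frac}
print(round(pos_frac, 3), round(class_weight[1], 1))  # 0.034 14.6
```

If the positive fraction is this small, weighting the loss (or subsampling negative spans) is usually the first thing to try before restructuring the model.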