Sentence embeddings based on T5

nlp pytorch

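I want to use the state-of-the-art LM T5 to get sentence embedding vectors. I found this repository. As far as I know, in BERT I should take the first token, the [CLS] token, and its output is the sentence embedding. In this repository I see the same behavior applied to the T5 model: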

cls_tokens = output_tokens[:, 0, :]  # CLS token is first token

Is this behavior correct? I extracted the encoder from T5 and used it to encode two phrases:

"I live in the kindergarden"
"Yes, I live in the kindergarden"

The cosine similarity between them was only 0.2420.

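I just need to understand how sentence embeddings work. Should I train the network on a similarity task to get correct results, or is the base pretrained language model already enough?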

To get a sentence embedding from T5, you need to take the last_hidden_state from the T5 encoder output:

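import torch

# s: input_ids, attn: attention_mask, both of shape [batch_size, seq_len]
output = model.encoder(input_ids=s, attention_mask=attn, return_dict=True)
token_embeddings = output.last_hidden_state  # shape is [batch_size, seq_len, hidden_size]
# token_embeddings holds one vector per token in the sentence;
# sum or average over the sequence dimension to get one vector per sentence
# (note: a plain mean also averages over padding positions in a padded batch)
pooled_sentence = torch.mean(token_embeddings, dim=1)  # [batch_size, hidden_size]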

Now you have a sentence embedding from T5.
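
If the batch is padded, a plain mean also mixes in the padding positions. Here is a minimal sketch of an average that respects the attention mask, followed by the cosine similarity between the two resulting vectors. The helper name masked_mean_pool and the small synthetic tensors are made up for illustration; with a real model, hidden would be output.last_hidden_state and attn would be the attention mask returned by the tokenizer.

```python
import torch
import torch.nn.functional as F


def masked_mean_pool(last_hidden_state: torch.Tensor,
                     attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors over real (non-padding) positions only.

    last_hidden_state: [batch_size, seq_len, hidden_size]
    attention_mask:    [batch_size, seq_len], 1 for real tokens, 0 for padding
    """
    # Broadcast the mask over the hidden dimension: [batch, seq_len, 1]
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)   # [batch, hidden]
    counts = mask.sum(dim=1).clamp(min=1.0)          # [batch, 1], avoids 0-division
    return summed / counts


# Synthetic stand-in for encoder output (an assumption, not real T5 output):
# 2 sentences, 4 positions, hidden size 3. Rows of 9.0 are padding positions.
hidden = torch.tensor([
    [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [9.0, 9.0, 9.0], [9.0, 9.0, 9.0]],
    [[0.0, 2.0, 0.0], [0.0, 4.0, 0.0], [0.0, 6.0, 0.0], [9.0, 9.0, 9.0]],
])
attn = torch.tensor([[1, 1, 0, 0], [1, 1, 1, 0]])

sentence_embeddings = masked_mean_pool(hidden, attn)  # [2, 3]

# Cosine similarity between the two sentence vectors (a 0-dim tensor).
similarity = F.cosine_similarity(
    sentence_embeddings[0], sentence_embeddings[1], dim=0
)
```

The padding rows are ignored, so each sentence vector depends only on its real tokens. Whether this pooling gives good similarity scores without further training is a separate matter that the sketch does not settle.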