BERT pretrained model giving random output each time (Python 3.x)


I am trying to add an additional layer after the huggingface bert transformer, so I used BertForSequenceClassification inside my nn.Module network. However, I see that this model gives me random outputs compared to loading the model directly.
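A minimal sketch of the kind of wrapper I mean (the class name and structure here are illustrative):

import torch.nn as nn
from transformers import BertForSequenceClassification

class BertClassifierWrapper(nn.Module):  # illustrative name
    def __init__(self, num_labels=5):
        super().__init__()
        # from_pretrained loads the bert-base weights; the classification head is freshly initialized
        self.bert = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=num_labels)

    def forward(self, input_ids):
        # returns a tuple whose first element is the logits
        return self.bert(input_ids)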

Model 1:

from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels = 5) # as we have 5 classes

import torch
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

texts = ["sample input text"]  # placeholder; any list of input strings works here
input_ids = torch.tensor(tokenizer.encode(texts[0], add_special_tokens=True, max_length = 512)).unsqueeze(0)  # Batch size 1

print(model(input_ids))

Output:

torch.Size([1, 512])
torch.Size([1, 5])
(tensor([[-0.3729, -0.2192,  0.1183,  0.0778, -0.2820]],
        grad_fn=<AddmmBackward>),)

  • Does BERT have some specific parameter for this? If so, how do I get reproducible outputs?

  • Why do the two models give me different outputs? Is it something I am doing wrong?


  • The reason is the random initialization of BERT's classifier layer. Printing the model shows the layers added on top of the encoder:

        (pooler): BertPooler(
          (dense): Linear(in_features=768, out_features=768, bias=True)
          (activation): Tanh()
        )
      )
      (dropout): Dropout(p=0.1, inplace=False)
      (classifier): Linear(in_features=768, out_features=5, bias=True)
    )
    
    The classifier in the last layer is added on top of bert-base and does not come with pretrained weights. You are expected to train that layer for your downstream task.
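    A minimal single-step fine-tuning sketch, assuming the tuple-returning transformers API used above (the label and learning rate are illustrative):

    import torch

    labels = torch.tensor([1])  # illustrative class index in [0, 5) for batch size 1
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    model.train()
    loss, logits = model(input_ids, labels=labels)  # with labels, the loss is returned first
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()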

    If you want more information:

    model, li = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels = 5, output_loading_info=True) # as we have 5 classes
    print(li)
    
    In the loading info printed below, you can see that classifier.weight and classifier.bias are listed under missing_keys, so every time you call BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=5) the classification head is randomly re-initialized, which is why you get different outputs on each call.

    {'missing_keys': ['classifier.weight', 'classifier.bias'], 'unexpected_keys': ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias'], 'error_msgs': []}
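    One way to make the outputs reproducible (a sketch; the seed value and save path are illustrative):

    import torch
    from transformers import BertForSequenceClassification

    # Option 1: seed the RNG so the classifier head is initialized identically each run
    torch.manual_seed(0)
    model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=5)

    # Option 2: initialize once, save, and reload the saved weights thereafter
    model.save_pretrained('./bert-5class')  # illustrative path
    model = BertForSequenceClassification.from_pretrained('./bert-5class')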