BERT pretrained model giving random output each time (Python 3.x)


I am trying to add an additional layer after the huggingface bert transformer, so I used BertForSequenceClassification inside my nn.Module network. However, I see that this model gives me random outputs compared to loading the model directly.
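A minimal sketch of the kind of wrapper I mean (the class name and structure here are illustrative):

import torch.nn as nn
from transformers import BertForSequenceClassification

class BertClassifierWrapper(nn.Module):  # illustrative name
    def __init__(self, num_labels=5):
        super().__init__()
        # from_pretrained loads the bert-base weights; the classification head is freshly initialized
        self.bert = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=num_labels)

    def forward(self, input_ids):
        # returns a tuple whose first element is the logits
        return self.bert(input_ids)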

Model 1:

from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels = 5) # as we have 5 classes

import torch
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

texts = ["sample input text"]  # placeholder; any list of input strings works here
input_ids = torch.tensor(tokenizer.encode(texts[0], add_special_tokens=True, max_length = 512)).unsqueeze(0)  # Batch size 1

print(model(input_ids))

Output:

torch.Size([1, 512])
torch.Size([1, 5])
(tensor([[-0.3729, -0.2192,  0.1183,  0.0778, -0.2820]],
        grad_fn=<AddmmBackward>),)

  • Does BERT have some specific parameter for this? If so, how do I get reproducible outputs?

  • Why do the two models give me different outputs? Is it something I am doing wrong?


  • The reason is the random initialization of BERT's classifier layer. Printing the model shows the layers added on top of the encoder:

        (pooler): BertPooler(
          (dense): Linear(in_features=768, out_features=768, bias=True)
          (activation): Tanh()
        )
      )
      (dropout): Dropout(p=0.1, inplace=False)
      (classifier): Linear(in_features=768, out_features=5, bias=True)
    )
    
    The classifier in the last layer is added on top of bert-base and does not come with pretrained weights. You are expected to train that layer for your downstream task.
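    A minimal single-step fine-tuning sketch, assuming the tuple-returning transformers API used above (the label and learning rate are illustrative):

    import torch

    labels = torch.tensor([1])  # illustrative class index in [0, 5) for batch size 1
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    model.train()
    loss, logits = model(input_ids, labels=labels)  # with labels, the loss is returned first
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()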

    If you want more information:

    model, li = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels = 5, output_loading_info=True) # as we have 5 classes
    print(li)
    
    In the loading info printed below, you can see that classifier.weight and classifier.bias are listed under missing_keys, so every time you call BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=5) the classification head is randomly re-initialized, which is why you get different outputs on each call.

    {'missing_keys': ['classifier.weight', 'classifier.bias'], 'unexpected_keys': ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias'], 'error_msgs': []}
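    One way to make the outputs reproducible (a sketch; the seed value and save path are illustrative):

    import torch
    from transformers import BertForSequenceClassification

    # Option 1: seed the RNG so the classifier head is initialized identically each run
    torch.manual_seed(0)
    model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=5)

    # Option 2: initialize once, save, and reload the saved weights thereafter
    model.save_pretrained('./bert-5class')  # illustrative path
    model = BertForSequenceClassification.from_pretrained('./bert-5class')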