NLP: Unable to use existing code on "large" models


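My Python code works fine for base transformer models, but when I try to use "large" models or a RoBERTa model, I get error messages. Below is the most common message that is printed: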

Epoch 1 / 40

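RuntimeError                              Traceback (most recent call last)
 in ()
     12
     13     #train model
---> 14     train_loss, _ = fine_tune()
     15     # we don't care about the 2nd item of model output (total)
     16     # we only need the average loss value here 'avg_loss'

5 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
   1688     if input.dim() == 2 and bias is not None:
   1689         # fused op is marginally faster
-> 1690         ret = torch.addmm(bias, input, weight.t())
   1691     else:
   1692         output = input.matmul(weight.t())

RuntimeError: mat1 dim 1 must match mat2 dim 0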

I am guessing there is some kind of mismatch between matrices (tensors), so that an operation cannot be carried out. If I understood the issue better, I could work out the necessary changes to my code. Here is the fine-tuning function I am using:

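def fine_tune():

  model.train()

  total_loss, total_accuracy = 0, 0

  # empty list to save model predictions
  total_preds = []

  # iterate over batches (the loop body is listed at the end of this post)
  for step, batch in enumerate(train_dataloader):

  # compute the training loss of the epoch
  avg_loss = total_loss / len(train_dataloader)

  # reshape the predictions to the form (number of samples, number of classes)
  total_preds = np.concatenate(total_preds, axis=0)

  # return the average loss and the predictions
  return avg_loss, total_preds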

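Regards, Mark

I wrote a print statement to show the input size coming from the pretrained model. It revealed the true size, 1024, rather than the default hard-coded value of 768 in the program I was modifying. Once I understood the problem, the fix was simple. For me, the moral of the story is that when a YouTuber (actually a good one!) says "all transformers have an output dimension of 768", you should not necessarily take it as gospel.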
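The fix described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the original program: `build_head` and its layer sizes (512 hidden units, 2 classes) are hypothetical, and a random tensor of width 1024 stands in for the pooled output of a "large" encoder so that the snippet runs without downloading a model. With Hugging Face models the real width is normally available as `model.config.hidden_size`, which is likely a safer source than a constant.

```python
import torch
import torch.nn as nn


def build_head(hidden_size, num_classes=2):
    """Hypothetical classifier head; the first layer is sized from hidden_size."""
    return nn.Sequential(
        nn.Linear(hidden_size, 512),
        nn.ReLU(),
        nn.Dropout(0.1),
        nn.Linear(512, num_classes),
        nn.LogSoftmax(dim=1),
    )


# Stand-in for the pooled output of a "large" encoder: batch of 4, width 1024.
pooled = torch.randn(4, 1024)
print("encoder output size:", pooled.shape[-1])

# A head built with the hard-coded base-model width cannot consume this input.
bad_head = build_head(768)
try:
    bad_head(pooled)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("hard-coded 768 fails:", err)

# A head sized from the actual encoder output works.
good_head = build_head(pooled.shape[-1])
good_head.eval()  # disable dropout for a deterministic forward pass
log_probs = good_head(pooled)
print("output shape:", tuple(log_probs.shape))
```

Printing the shape of the encoder output before the first linear layer, as done here, is usually the quickest way to spot this kind of mismatch when swapping a base model for a larger one.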

# progress update after every 50 batches.
if step % 50 == 0 and step != 0:
  print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(train_dataloader)))

# push the batch to gpu
batch = [r.to(device) for r in batch]

sent_id, mask, labels = batch

# clear previously calculated gradients 
model.zero_grad()        

# get model predictions for the current batch
preds = model(sent_id, mask)

# compute the loss between actual and predicted values
loss = cross_entropy(preds, labels)

# add on to the total loss
total_loss = total_loss + loss.item()

# backward pass to calculate the gradients
loss.backward()

# clip the gradients to 1.0; this helps prevent the exploding gradient problem
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

# update parameters
optimizer.step()

# model predictions are stored on GPU. So, push it to CPU
preds=preds.detach().cpu().numpy()
# Length of preds is the same as the batch size

# append the model predictions
total_preds.append(preds)