How to correctly plot train/val on the same graph in PyTorch

I have the following code to train a model and store the logs in a results variable:

import tqdm.notebook as tq
import sys

num_epochs = 10
results = {"train_loss": [], "val_loss": [], "train_acc": [], "val_acc": []}

for epoch in range(1, num_epochs+1):
  sys.stdout.write(f"---Epoch {epoch}/{num_epochs}: ")
  epoch_loss = {"train": [], "val": []}
  epoch_acc = {"train": [], "val": []}

  for phase in ['train', 'val']:
    if phase=="train":
      model.train(True)
    else:
      model.train(False)
    
    # most important thing I learned from this project was how to fix tqdm nastiness in colab
    for batch_idx, (x, y) in tq.tqdm(enumerate(dataloaders[phase]),
                                     total=len(dataloaders[phase]),
                                     leave=False):

      # put data to device and get output
      x, y = x.to(device), y.to(device)
      preds = model(x)

      # calc and log model loss
      batch_loss = criterion(preds, y)
      epoch_loss[phase].append(batch_loss.item())

      # calculate acc and extend to epoch_acc
      preds = torch.argmax(preds, dim=1)
      batch_acc = torch.sum(preds==y)/len(y)
      epoch_acc[phase].append(batch_acc)

      # zero the grad
      optimizer.zero_grad()

      # take a step if training mode is on
      if phase=="train":
        batch_loss.backward()
        optimizer.step()
        scheduler.step()

  # at the end of each epoch, calculate avg epoch train/val loss/accuracy
  train_loss = sum(epoch_loss["train"])/len(epoch_loss["train"])
  val_loss = sum(epoch_loss["val"])/len(epoch_loss["val"])
  train_acc = 100*sum(epoch_acc["train"])/len(epoch_acc["train"])
  val_acc = 100*sum(epoch_acc["val"])/len(epoch_acc["val"])

  # log losses and accs every epoch
  results['train_loss'].extend(epoch_loss['train'])
  results['train_acc'].extend(epoch_acc['train'])
  results['val_loss'].extend(epoch_loss['val'])
  results['val_acc'].extend(epoch_acc['val'])

  # and print it nicely
  sys.stdout.write("train_loss: {:.4f} train_acc: {:.2f}% ".format(train_loss, train_acc))
  sys.stdout.write("val_loss: {:.4f} val_acc: {:.2f}%\n".format(val_loss, val_acc))
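I log the average accuracy and average loss of every batch into separate train/val loss/acc arrays. The problem is that I have more training batches than validation batches, so when I try to plot the logs, the train and val curves have different lengths and do not line up on the same x-axis (plot not shown).

Is there a way to fix this?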

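You are making a few conceptual mistakes:

  • You are calculating validation loss/accuracy in multiple batches, rather than on the entire validation set.
  • You are calculating the validation accuracy of a static model after it has already been trained on all the data, rather than periodically evaluating validation accuracy while you are training.

You should average the batch training performance per epoch, and calculate the full loss/acc statistics for the whole validation set once per epoch. Then you will have n_epochs values for both training and validation, and you can plot them on the same axis.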
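
The per-epoch averaging described above can be sketched as follows. This is a minimal illustration, not the asker's actual code: `epoch_means` and `plot_train_val` are hypothetical helper names, the toy loss values are made up, and the plotting part assumes matplotlib is installed.

```python
def epoch_means(batch_values, batches_per_epoch):
    """Collapse a flat list of per-batch values into one mean per epoch."""
    if batches_per_epoch <= 0:
        raise ValueError("batches_per_epoch must be positive")
    means = []
    for start in range(0, len(batch_values), batches_per_epoch):
        # float() also accepts single-element tensors such as batch_acc
        chunk = [float(v) for v in batch_values[start:start + batches_per_epoch]]
        means.append(sum(chunk) / len(chunk))
    return means


def plot_train_val(train_per_epoch, val_per_epoch, ylabel="loss"):
    """Plot two equal-length per-epoch series on the same axes."""
    import matplotlib.pyplot as plt  # imported lazily: only needed for plotting

    epochs = range(1, len(train_per_epoch) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, train_per_epoch, label=f"train {ylabel}")
    ax.plot(epochs, val_per_epoch, label=f"val {ylabel}")
    ax.set_xlabel("epoch")
    ax.set_ylabel(ylabel)
    ax.legend()
    return fig


# Toy logs (made-up numbers): 2 epochs, 3 train batches and 2 val batches per epoch.
train_loss_batches = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
val_loss_batches = [1.0, 0.8, 0.7, 0.5]

train_loss = epoch_means(train_loss_batches, batches_per_epoch=3)
val_loss = epoch_means(val_loss_batches, batches_per_epoch=2)
# Both lists now hold one value per epoch, so they share the same x-axis:
# plot_train_val(train_loss, val_loss, ylabel="loss")
```

In the training loop from the question, the same effect can probably be had more directly by appending the already computed `train_loss`, `val_loss`, `train_acc` and `val_acc` to `results` once per epoch instead of calling `extend` with the per-batch lists. Note that a mean of batch means only equals the full-set mean when all batches have the same size.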

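    What "solution" are you expecting? Plot epoch scores instead of batch scores, or use separate plots. Otherwise you could repeat the values on the val plot, but that completely defeats the purpose of the plot.

    Also, in validation mode you can disable gradients; validation will then run faster and use less memory. You also do not need to call optimizer.zero_grad() during the validation phase.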
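
The two suggestions in the comments (disable gradients during validation, skip `optimizer.zero_grad()` there) together with the advice to compute statistics over the whole validation set might look like the sketch below. The `evaluate` function, the toy model and the random data are assumptions made for illustration; only `torch.no_grad()`, `model.eval()` and the other standard PyTorch calls come from the library itself. It also assumes the criterion returns the mean over the batch, as `nn.CrossEntropyLoss` does by default.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion, device="cpu"):
    """Return (mean loss, accuracy in %) over every sample in `loader`."""
    model.eval()  # same effect as model.train(False)
    total_loss, correct, seen = 0.0, 0, 0
    with torch.no_grad():  # no graph is built, so no backward() or zero_grad() here
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x)
            # criterion is assumed to return the batch mean: re-weight by batch size
            total_loss += criterion(logits, y).item() * y.size(0)
            correct += (logits.argmax(dim=1) == y).sum().item()
            seen += y.size(0)
    return total_loss / seen, 100.0 * correct / seen


# Toy data and model, purely for illustration.
torch.manual_seed(0)
features = torch.randn(10, 4)
labels = torch.randint(0, 3, (10,))
val_loader = DataLoader(TensorDataset(features, labels), batch_size=4)
toy_model = nn.Linear(4, 3)

val_loss, val_acc = evaluate(toy_model, val_loader, nn.CrossEntropyLoss())
```

Weighting each batch loss by its size means a smaller final batch does not skew the result, so the returned value matches the loss over the full validation set. Calling such a function once at the end of every epoch yields exactly one validation point per epoch.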