Neural network PyTorch mini-batch: when to call optimizer.zero_grad()?


When training with mini-batches, should I call optimizer.zero_grad() before the iteration over batches starts, or inside the iteration? I think the second snippet is correct, but I am not sure.

nb_epochs = 20
for epoch in range(nb_epochs + 1):
    optimizer.zero_grad()  # THIS PART!!
    for batch_idx, samples in enumerate(dataloader):
        x_train, y_train = samples

        prediction = model(x_train)
        cost = F.mse_loss(prediction, y_train)

        cost.backward()
        optimizer.step()

        print('Epoch {:4d}/{} Batch {}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, batch_idx + 1, len(dataloader),
            cost.item()
        ))
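(The snippets do not show how model, optimizer, dataloader and F are defined. A minimal setup they could run under might look like the following; the toy data, the single linear layer and the SGD settings are assumptions made only so the code is self-contained.)

import torch
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

# Toy regression data, y = 2x + 1 plus a little noise (assumed, not from the question).
x = torch.randn(100, 1)
y = 2 * x + 1 + 0.1 * torch.randn(100, 1)
dataloader = DataLoader(TensorDataset(x, y), batch_size=10, shuffle=True)

# A single linear layer is enough for this MSE regression example.
model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)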


Which one is correct? The only difference is where optimizer.zero_grad() is placed. By default, gradients accumulate every time you call .backward() on the computational graph; zero_grad() is what clears them.
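A minimal sketch of that behaviour, using a standalone tensor w that is not part of the question's model:

import torch

w = torch.tensor([1.0], requires_grad=True)

loss = (3 * w).sum()
loss.backward()
print(w.grad)    # tensor([3.])

loss = (3 * w).sum()
loss.backward()  # the new gradient is added to the existing .grad
print(w.grad)    # tensor([6.])

w.grad.zero_()   # roughly what optimizer.zero_grad() does for each parameter
                 # (recent PyTorch versions set .grad to None instead)
print(w.grad)    # tensor([0.])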

In the first snippet you reset the gradients only once per epoch, so the gradients of all len(dataloader) batches accumulate on top of each other and are only cleared again at the start of the next epoch. In the second snippet you do the right thing: the gradients are reset after every backward pass, i.e. once per batch.
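To make that concrete, a sketch (reusing the toy setup above) that prints how the stored gradient keeps accumulating when it is cleared only once per epoch:

optimizer.zero_grad()
for batch_idx, (x_train, y_train) in enumerate(dataloader):
    prediction = model(x_train)
    F.mse_loss(prediction, y_train).backward()
    # Without zero_grad() inside the loop, .grad holds the sum of the gradients
    # of all batches seen so far, so each optimizer.step() would use a stale,
    # ever-growing gradient.
    print('batch {}: accumulated grad norm = {:.4f}'.format(
        batch_idx + 1, model.weight.grad.norm().item()))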

So your assumption is correct.

There are cases where accumulating gradients is exactly what you want (see the sketch after the second snippet), but in most cases it is not. For reference, the second snippet, with optimizer.zero_grad() inside the batch loop, is:

nb_epochs = 20
for epoch in range(nb_epochs + 1):
    for batch_idx, samples in enumerate(dataloader):
        x_train, y_train = samples

        prediction = model(x_train)

        optimizer.zero_grad()  # THIS PART!!
        cost = F.mse_loss(prediction, y_train)

        cost.backward()
        optimizer.step()

        print('Epoch {:4d}/{} Batch {}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, batch_idx + 1, len(dataloader),
            cost.item()
        ))
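For the cases where accumulating gradients is intentional, the usual motivation is simulating a larger effective batch size than fits in memory: step and clear only every few batches. A minimal sketch under that assumption, with accum_steps chosen arbitrarily:

accum_steps = 4  # assumed: update the weights once every 4 mini-batches

for epoch in range(nb_epochs + 1):
    optimizer.zero_grad()
    for batch_idx, (x_train, y_train) in enumerate(dataloader):
        prediction = model(x_train)
        cost = F.mse_loss(prediction, y_train)

        # Scale the loss so the accumulated gradient is the average over
        # the larger effective batch rather than the sum.
        (cost / accum_steps).backward()

        # Update and clear only after every accum_steps mini-batches.
        if (batch_idx + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()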