Neural network PyTorch mini-batch: when to call optimizer.zero_grad()?
When we use mini-batches, should I call optimizer.zero_grad() before starting the iteration, or inside it? I think the second snippet is correct, but I am not sure.
nb_epochs = 20
for epoch in range(nb_epochs + 1):
    optimizer.zero_grad()  # THIS PART!!
    for batch_idx, samples in enumerate(dataloader):
        x_train, y_train = samples
        prediction = model(x_train)
        cost = F.mse_loss(prediction, y_train)
        cost.backward()
        optimizer.step()
        print('Epoch {:4d}/{} Batch {}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, batch_idx + 1, len(dataloader),
            cost.item()
        ))
or inside it:

nb_epochs = 20
for epoch in range(nb_epochs + 1):
    for batch_idx, samples in enumerate(dataloader):
        x_train, y_train = samples
        prediction = model(x_train)
        optimizer.zero_grad()  # THIS PART!!
        cost = F.mse_loss(prediction, y_train)
        cost.backward()
        optimizer.step()
        print('Epoch {:4d}/{} Batch {}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, batch_idx + 1, len(dataloader),
            cost.item()
        ))

Which one is correct? The only difference is the position of optimizer.zero_grad().

By default, gradients accumulate: every time .backward() is called on the computational graph, the new gradients are added to the ones already stored, and optimizer.zero_grad() resets them to zero. In the first snippet you reset the gradients only once per epoch, so they accumulate their values over all len(dataloader) batches and are cleared only at the start of the next epoch. In the second snippet you do it correctly: the gradients are reset on every iteration, so each optimizer.step() uses only the current batch's gradients. So your assumption is right. There are situations where accumulating gradients is what you want, but in most cases it is not.
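The accumulation behaviour described above can be sketched without PyTorch at all. The Param class and backward_mse function below are illustrative stand-ins (not real torch APIs) for a parameter's .grad buffer and autograd's add-into-grad behaviour:

```python
# Minimal pure-Python sketch of PyTorch's gradient-accumulation semantics:
# each "backward" call ADDS into .grad, and zero_grad() resets the buffer.

class Param:
    def __init__(self, value):
        self.value = value
        self.grad = 0.0  # gradient buffer, accumulated like torch's .grad

    def zero_grad(self):
        self.grad = 0.0

def backward_mse(w, x, y):
    # d/dw (w*x - y)^2 = 2*(w*x - y)*x, added into w.grad like autograd does
    w.grad += 2.0 * (w.value * x - y) * x

w = Param(1.0)
backward_mse(w, x=1.0, y=0.0)  # gradient contribution: 2.0
backward_mse(w, x=2.0, y=0.0)  # gradient contribution: 8.0
g_accumulated = w.grad
print(g_accumulated)           # 10.0 -- the two batches' gradients summed

w.zero_grad()
backward_mse(w, x=2.0, y=0.0)
g_fresh = w.grad
print(g_fresh)                 # 8.0 -- only the latest batch's gradient
```

Without the zero_grad() call, a step taken after the second batch would use the stale 10.0 instead of the correct 8.0, which is exactly what goes wrong in the first snippet.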
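As for the cases where accumulation is intentional: one common pattern is stepping only every few mini-batches to simulate a larger effective batch size. A self-contained sketch, assuming PyTorch is installed; the model, data, and accum_steps below are made up for illustration:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 3)
y = torch.randn(8, 1)

# Accumulate gradients over 4 micro-batches of size 2 before stepping.
accum_steps = 4
optimizer.zero_grad()
for i in range(accum_steps):
    xb, yb = x[i * 2:(i + 1) * 2], y[i * 2:(i + 1) * 2]
    # Scale each loss so the summed gradients equal the full-batch average.
    cost = F.mse_loss(model(xb), yb) / accum_steps
    cost.backward()  # gradients ADD into .grad across these calls

accum_grad = model.weight.grad.clone()

# Sanity check: the accumulated gradient matches one full-batch pass.
optimizer.zero_grad()
F.mse_loss(model(x), y).backward()
full_grad = model.weight.grad.clone()
print(torch.allclose(accum_grad, full_grad))

optimizer.step()       # one update from the accumulated gradients
optimizer.zero_grad()  # reset before the next accumulation window
```

Note that zero_grad() is still called, just once per accumulation window instead of once per batch; the first snippet's bug is zeroing once per epoch while stepping every batch.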