Python 3.x: Frequently getting CUDA out of memory errors?
python-3.x, deep-learning, pytorch

I have written code for training a deep learning model. In it, I delete the CUDA tensors after each batch and then call torch.cuda.empty_cache(). I am fairly sure the batch size is not large enough to cause this error. What could the cause be?
for epoch in range(1+last_epoch, self.num_epochs+1):
    for phase in ['train', 'val']:
        loss_arr = []
        if phase == 'train':
            model.train()
            scheduler.step()
            was_training = True
        else:
            model.eval()
            was_training = False
        for i_batch, sample_batched in enumerate(dataloaders[phase]):
            X = sample_batched[0]
            y = sample_batched[1].type(torch.LongTensor)
            w = sample_batched[2]
            if model.is_cuda:
                X, y, w = X.cuda(non_blocking=True), y.cuda(non_blocking=True), w.cuda(non_blocking=True)
            output = model(X)
            loss = self.loss_func(output, y, w)
            if phase == 'train':
                curr_iteration+=1
                optim.zero_grad()
                loss.backward()
                optim.step()
                if (curr_iteration % log_nth == 0):
                    self.logWriter.loss_per_iter(loss.item(), curr_iteration)
            loss_arr.append(loss.item())
            with torch.no_grad():
                self.logWriter.update_cm_per_iter(output, y, self.labels, phase)
            del X, y, w, output, loss
            torch.cuda.empty_cache()
        self.logWriter.loss_per_epoch(loss_arr, phase, epoch)
        epoch_output, epoch_labels = model.predict(dataloaders[phase].dataset.X), dataloaders[phase].dataset.y
        self.logWriter.dice_score_per_epoch(epoch_output, epoch_labels, phase, epoch)
        index = np.random.choice(len(dataloaders[phase].dataset), 3, replace=False)
        self.logWriter.image_per_epoch(epoch_output[index], epoch_labels[index], phase, epoch)
        self.logWriter.cm_per_epoch(self.labels, phase, epoch, i_batch)
        del epoch_output, epoch_labels
    print("==== Epoch ["+str(epoch)+" / "+str(self.num_epochs)+"] done ====")
    model.save('models/' + self.exp_dir_name + '/quicknat_epoch' + str(epoch) + '.model')
The predict function in the model is as follows:
def predict(self, X, enable_dropout = False):
    """
    Predicts the output after the model is trained.
    Inputs:
    - X: Volume to be predicted
    """
    self.eval()
    if type(X) is np.ndarray:
        X = torch.tensor(X, requires_grad = False).cuda(non_blocking=True)
    elif type(X) is torch.Tensor and not X.is_cuda:
        X = X.cuda(non_blocking=True)
    if enable_dropout:
        self.enable_test_dropout()
    with torch.no_grad():
        out = self.forward(X)
    max_val, idx = torch.max(out,1)
    idx = idx.data.cpu().numpy()
    prediction = np.squeeze(idx)
    del X, out, idx, max_val
    return prediction
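I realized that I was passing the entire dataset to predict after each epoch. Running the prediction in batches solved the problem.

Please add the relevant code.
@TimH I have added the code.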
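The answer does not show the batched version, so here is a minimal sketch of what predicting in batches could look like. The helper name predict_in_batches and its batch_size default are hypothetical and not from the original post. It assumes the wrapped function returns a NumPy array whose first axis matches the number of samples in the chunk it was given.

```python
import numpy as np


def predict_in_batches(predict_fn, X, batch_size=4):
    """Call predict_fn on consecutive slices of X and join the results.

    predict_fn: any callable that maps a chunk of samples to a NumPy array
        with one entry per sample along axis 0 (assumption of this sketch).
    X: array-like that supports len() and slicing along the first axis.
    batch_size: hypothetical default; choose what fits in GPU memory.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    outputs = []
    for start in range(0, len(X), batch_size):
        # Only this slice is handed to predict_fn, so only this slice
        # has to be moved to the GPU at any one time.
        chunk = X[start:start + batch_size]
        outputs.append(np.asarray(predict_fn(chunk)))

    # Stitch the per-chunk predictions back together in their original order.
    return np.concatenate(outputs, axis=0)
```

In the training loop above, the call model.predict(dataloaders[phase].dataset.X) might then become predict_in_batches(model.predict, dataloaders[phase].dataset.X, batch_size=...). One caveat: predict as posted calls np.squeeze, which would drop the batch axis for a chunk containing a single sample, so the sketch only holds if every chunk has more than one sample or the squeeze is adjusted.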