PyTorch CUDA error 59: device-side assert triggered
I am getting the above error with PyTorch, together with the following assertion:
/opt/conda/conda-bld/pytorch_1565272269120/work/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda [](int)->auto::operator()(int)->auto: block: [1,0,0], thread: [127,0,0]
Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Training starts out fine, but after a certain number of epochs I get a CUDA runtime error while computing the hinge loss.
Full error trace:
/opt/conda/conda-bld/pytorch_1565272269120/work/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda [](int)->auto::operator()(int)->auto: block: [1,0,0], thread: [127,0,0]
Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
File "ours-vision.py", line 1106, in <module>
penalty_erm, penalty_irm, penalty_ws, penalty_same_ctr, penalty_diff_ctr = train( train_dataset, data_match_tensor, label_match_tensor, phi, opt, opt_ws, scheduler, epoch, base_domain_idx, bool_erm, bool_ws, bool_ctr )
File "ours-vision.py", line 688, in train
diff_hinge_loss+= F.hinge_embedding_loss( neg_dist - pos_dist, torch.tensor(-1).to(cuda), args.diff_margin, reduction='sum').to(cuda)
RuntimeError: CUDA error: device-side assert triggered
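
A note on reading this trace: the assertion is raised in `IndexKernel.cu`, which is an indexing kernel, and CUDA kernels are launched asynchronously, so the Python line reported in the traceback (the `F.hinge_embedding_loss` call) is not necessarily the operation that produced the bad index. Running the script with the environment variable `CUDA_LAUNCH_BLOCKING=1`, or on the CPU, usually makes the traceback point at the real indexing operation. The sketch below shows one possible host-side check that mirrors the assertion condition `index >= -sizes[i] && index < sizes[i]`. The helper name `check_index_bounds` and its arguments are hypothetical and do not appear in the original script.

```python
import torch


def check_index_bounds(index: torch.Tensor, size: int) -> None:
    """Raise IndexError on the host if any index is outside [-size, size).

    Hypothetical helper: it applies the same condition as the device-side
    assertion, but fails in Python so the offending call is easy to locate.
    """
    if index.numel() == 0:
        return  # nothing to check
    lowest = int(index.min())
    highest = int(index.max())
    if lowest < -size or highest >= size:
        raise IndexError(
            f"index range [{lowest}, {highest}] is out of bounds "
            f"for a dimension of size {size}"
        )


# Illustrative usage with made-up values: validate before indexing.
data = torch.arange(12.0).reshape(4, 3)
rows = torch.tensor([0, 3, -4])
check_index_bounds(rows, data.size(0))  # passes: every index is in [-4, 4)
selected = data[rows]
```

Calling such a check right before each tensor-indexing step in the training loop (for example, on the index tensors used to gather matched samples) may reveal which batch first contains an out-of-range index.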