Getting a training error if CUDA_VISIBLE_DEVICES is set


deep-learning pytorch


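I am using PyTorch with CUDA 10.1. If I set CUDA_VISIBLE_DEVICES when training, the loss is always NaN; if I don't set it, everything works fine. Does anyone know what the problem is?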

Possibly some tensors are mismatched: some have been moved to the GPU while others are still on the CPU, so CUDA cannot work with them.
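A quick way to test this hypothesis is to list every device that the model and its inputs live on. The sketch below is only an illustration: `devices_in_use` is a hypothetical helper, and the `nn.Linear` model and random input are placeholders for your own objects. Note that a CPU/GPU mismatch normally surfaces as a `RuntimeError` rather than a NaN loss, so this check mainly helps rule that cause in or out.

```python
import torch
from torch import nn


def devices_in_use(model, *tensors):
    """Collect the devices of a model's parameters and buffers and of the given tensors."""
    devices = {p.device for p in model.parameters()}
    devices |= {b.device for b in model.buffers()}
    devices |= {t.device for t in tensors}
    return devices


# Placeholder model and input; substitute your own objects here.
model = nn.Linear(4, 2)
x = torch.randn(8, 4)

found = devices_in_use(model, x)
print(found)
if len(found) > 1:
    print("Model and inputs are spread over several devices:", found)
```

If the set contains more than one device, move everything to a single device before the forward pass.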

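CUDA_VISIBLE_DEVICES is an operating-system-level environment variable read by the CUDA runtime. It controls which of the machine's GPUs are available for CUDA computation. It must be set before your code runs, that is, before CUDA is initialized in the process.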
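One way to make sure the variable is in place before any CUDA code runs is to set it in the environment of the process that launches training. The sketch below uses only the standard library; `run_with_visible_devices` is a hypothetical helper, and the one-line child program stands in for your own training script.

```python
import os
import subprocess
import sys


def run_with_visible_devices(devices, code):
    """Run a Python snippet in a child process with CUDA_VISIBLE_DEVICES preset."""
    # Copy the current environment and override only the one variable.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=devices)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Stand-in for a training script: it only reports what it was given.
child_code = "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES'))"
print(run_with_visible_devices("1", child_code))
```

Because the variable is already set when the child interpreter starts, it does not matter when the child first touches CUDA.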

If you are trying to control whether PyTorch uses a GPU, and which GPUs it uses, you should use the built-in torch.cuda package for device management:

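 import torch

 n_gpus = torch.cuda.device_count()

 device = torch.device("cpu")  # fallback when no CUDA device is visible
 if n_gpus > 0:
     device = torch.device("cuda:0")  # first device as indexed by PyTorch
     print("cuda:0 is device {}".format(torch.cuda.get_device_name(device)))  # prints name of device

 if n_gpus > 1:  # if you have more than one device, and so on
     device2 = torch.device("cuda:1")
     print("cuda:1 is device {}".format(torch.cuda.get_device_name(device2)))

 # Placeholder model and input so the snippet runs; use your own here.
 model = torch.nn.Linear(4, 2)
 x = torch.randn(8, 4)

 # From here, decide which device you want to use and move the model
 # and its inputs to that same device.
 model = model.to(device)
 x = x.to(device)  # Tensor.to() is not in-place, so keep the returned tensor
 # etc.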

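The only reason you would want to use CUDA_VISIBLE_DEVICES is if you have multiple GPUs, need some of them to be available for CUDA/PyTorch tasks and the others for non-CUDA tasks, and are worried about the small amount of GPU memory that the torch.cuda package consumes on a GPU when it registers it as a PyTorch device. For most applications this is unnecessary, and you can simply use PyTorch's device management.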
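If you do restrict visibility this way, keep in mind that the visible GPUs are renumbered from zero inside the process, so `cuda:0` refers to the first entry of CUDA_VISIBLE_DEVICES rather than to GPU 0 of the machine. The sketch below illustrates that renumbering; `logical_to_physical` is a hypothetical helper that assumes a plain comma-separated list of integer ids and ignores UUID entries and the runtime's handling of invalid values.

```python
import os


def logical_to_physical(visible_devices):
    """Map in-process device names to the ids listed in CUDA_VISIBLE_DEVICES."""
    if visible_devices is None:
        return None  # variable unset: no renumbering takes place
    ids = [d.strip() for d in visible_devices.split(",") if d.strip()]
    return {"cuda:{}".format(i): dev for i, dev in enumerate(ids)}


# With CUDA_VISIBLE_DEVICES="2,3", cuda:0 is the GPU listed as 2.
print(logical_to_physical("2,3"))

# Inspect the mapping for the current process (None if the variable is unset).
print(logical_to_physical(os.environ.get("CUDA_VISIBLE_DEVICES")))
```

Checking this mapping is a cheap way to confirm which physical GPU a failing run actually used.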

Please share your code.
