RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED in PyTorch


I am running a CNN algorithm on my new machine, using PyTorch and 3 Nvidia GPUs, and I get the following error:

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

File "code.py", line 342, in <module>
    trainer.fit(model)

  File "/home/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 514, in fit
    self.dispatch()

  File "/home/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 554, in dispatch
    self.accelerator.start_training(self)

  File "/home/.local/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in start_training
    self.training_type_plugin.start_training(trainer)

  File "/home/.local/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 111, in start_training
    self._results = trainer.run_train()

  File "/home/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 615, in run_train
    self.run_sanity_check(self.lightning_module)

  File "/home/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 864, in run_sanity_check
    _, eval_results = self.run_evaluation(max_batches=self.num_sanity_val_batches)

  File "/home/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 733, in run_evaluation
    output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)

  File "/home/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 164, in evaluation_step
    output = self.trainer.accelerator.validation_step(args)

  File "/home/.local/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 178, in validation_step
    return self.training_type_plugin.validation_step(*args)

  File "/home/.local/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 290, in validation_step
    return self.model(*args, **kwargs)

  File "/home/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)

  File "/home/.local/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
    output = self.module(*inputs[0], **kwargs[0])

  File "/home/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)

  File "/home/.local/lib/python3.8/site-packages/pytorch_lightning/overrides/base.py", line 63, in forward
    output = self.module.validation_step(*inputs, **kwargs)

  File "code.py", line 314, in validation_step
    pred = self.forward(x)

  File "code.py", line 259, in forward
    x = self.conv0(x)          #([12, 600, 600])

  File "/home/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)

  File "/home/.local/lib/python3.8/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)

  File "/home/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)

  File "/home/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)

  File "/home/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,



nvidia-smi:

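The code runs without any problems on another machine with driver version 450.51.06 and CUDA version 11. You can see the nvidia-smi output of the new machine above. I went through the suggestions posted under other questions about this same error, and none of them solved my problem.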
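Since the same code works on one machine and fails on the other, it may help to compare what PyTorch itself reports on each of them and to check whether a single small convolution succeeds on every GPU outside of Lightning and DDP. The following is only a minimal diagnostic sketch, not a fix: the function name `collect_env` and the tiny 3-to-8 channel test convolution are made up for illustration, and the script assumes nothing about the model in `code.py`.

```python
import importlib


def collect_env():
    """Gather version info and try one small convolution per visible GPU."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # torch is not installed in this interpreter; nothing to check.
        return {"torch": None}

    info = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # CUDA version torch was built with
        "cuda_available": torch.cuda.is_available(),
        "gpus": {},
    }
    try:
        info["cudnn"] = torch.backends.cudnn.version()
    except RuntimeError as err:
        # Reported when the cuDNN library found at runtime is incompatible.
        info["cudnn"] = f"error: {err}"

    if not info["cuda_available"]:
        return info

    for idx in range(torch.cuda.device_count()):
        try:
            name = torch.cuda.get_device_name(idx)
            device = torch.device("cuda", idx)
            # Hypothetical tiny layer, only used to exercise the conv path.
            conv = torch.nn.Conv2d(3, 8, kernel_size=3).to(device)
            x = torch.randn(1, 3, 32, 32, device=device)
            with torch.no_grad():
                conv(x)
            torch.cuda.synchronize(device)
            info["gpus"][idx] = (name, "ok")
        except RuntimeError as err:
            info["gpus"][idx] = ("unknown", f"failed: {err}")
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Running this on both machines and comparing the `cuda_build`, `cudnn` and per-GPU lines should show whether the failure is tied to one particular device or to the installed PyTorch build as a whole.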