PyTorch CUDA runtime error: which CUDA version is compatible with running an NER task using BERT-NER?


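I have installed all the required packages on my VM, and I found that no NVIDIA GPU driver is installed. The requirements contain no instructions for installing an NVIDIA GPU driver. I would like to know which CUDA version, and which NVIDIA driver compatible with it, I need in order to resolve the following error.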

GitHub link:

Error log:

  File "run_ner.py", line 594, in <module>
    main()
  File "run_ner.py", line 489, in main
    loss = model(input_ids, segment_ids, input_mask, label_ids,valid_ids,l_mask)
  File "/home/pt3_gcp/BERT-NER/ber_ner/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "run_ner.py", line 35, in forward
    valid_output = torch.zeros(batch_size,max_len,feat_dim,dtype=torch.float32,device='cuda')
  File "/home/pt3_gcp/BERT-NER/ber_ner/lib/python3.7/site-packages/torch/cuda/__init__.py", line 178, in _lazy_init
    _check_driver()
  File "/home/pt3_gcp/BERT-NER/ber_ner/lib/python3.7/site-packages/torch/cuda/__init__.py", line 99, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError: 
**Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
**

After installing the latest CUDA version from the following link, I got the following error:

06/04/2020 07:38:40 - INFO - __main__ -   ***** Running training *****
06/04/2020 07:38:40 - INFO - __main__ -     Num examples = 14041
06/04/2020 07:38:40 - INFO - __main__ -     Batch size = 32
06/04/2020 07:38:40 - INFO - __main__ -     Num steps = 2190
Epoch:   0%|                                                                                 | 0/5 [00:00<?, ?it/sTHCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=50 error=38 : no CUDA-capable device is detectedt/s]
Traceback (most recent call last):
  File "run_ner.py", line 594, in <module>
    main()
  File "run_ner.py", line 489, in main
    loss = model(input_ids, segment_ids, input_mask, label_ids,valid_ids,l_mask)
  File "/home/pt3_gcp/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "run_ner.py", line 35, in forward
    valid_output = torch.zeros(batch_size,max_len,feat_dim,dtype=torch.float32,device='cuda')
  File "/home/pt3_gcp/.local/lib/python3.7/site-packages/torch/cuda/__init__.py", line 179, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:50

I ran into the same problem a while ago. The following commands fixed it for me.

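This problem shows up when you have multiple installations, which is likely by now if you have already tried a lot of things. Basically, remove everything: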

sudo apt-get purge 'nvidia-*'
sudo apt-get remove nvidia-cuda-toolkit
sudo apt autoremove --purge cuda-10-0  # your version may differ; check it with nvcc --version

Also remove the existing files under /usr/local:

sudo rm -rf /usr/local/cuda*  # anything related to CUDA
sudo rm -rf /usr/local/nvidia*  # anything related to NVIDIA
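
Before reinstalling, it can help to confirm that nothing NVIDIA- or CUDA-related is left behind. The following is only a minimal sketch, assuming a Debian/Ubuntu system where `dpkg-query` is available; the prefix list and the helper names `leftover_packages` and `installed_packages` are my own choices for illustration, not part of BERT-NER or of any NVIDIA tool.

```python
import shutil
import subprocess

# Assumption: packages of interest start with one of these prefixes.
PREFIXES = ("nvidia", "cuda", "libnvidia", "libcuda")


def leftover_packages(package_names):
    """Return the names that look NVIDIA/CUDA related (simple prefix match)."""
    return sorted(name for name in package_names if name.startswith(PREFIXES))


def installed_packages():
    """List package names dpkg still has a record of; empty list without dpkg."""
    exe = shutil.which("dpkg-query")
    if exe is None:
        return []
    proc = subprocess.run(
        [exe, "-W", r"--showformat=${Package}\n"],
        capture_output=True,
        text=True,
    )
    return proc.stdout.split()


if __name__ == "__main__":
    # Print whatever still matches; no output means nothing matched.
    for name in leftover_packages(installed_packages()):
        print(name)
```

If this still prints package names after the purge, those packages are probably worth removing too before the fresh install.
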

Now, finally, do a fresh installation:

sudo apt-get update  # refresh the package lists

sudo apt search nvidia-driver  # list the available driver versions

After finding the latest version, install it with:

sudo apt install nvidia-driver-450  # or another number, depending on the latest version

After installing, you must reboot:

sudo reboot
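
After the reboot, a quick check along these lines can tell you whether the driver is reachable and whether PyTorch can see the GPU. This is a sketch rather than an official diagnostic: the function names `driver_status` and `torch_cuda_status` are made up for this example, and it assumes `nvidia-smi` is on your PATH when the driver is installed. It uses only `torch.cuda.is_available()`, `torch.cuda.device_count()`, `torch.cuda.get_device_name()` and `torch.version.cuda`.

```python
import shutil
import subprocess


def driver_status():
    """Return (ok, message): can nvidia-smi talk to the NVIDIA driver?"""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False, "nvidia-smi not found on PATH"
    try:
        proc = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"nvidia-smi could not be run: {exc}"
    if proc.returncode != 0:
        # Typically the "couldn't communicate with the NVIDIA driver" message.
        return False, (proc.stdout + proc.stderr).strip()
    return True, "driver reachable"


def torch_cuda_status():
    """Return (ok, message): does the installed PyTorch see a usable GPU?"""
    try:
        import torch
    except ImportError:
        return False, "torch is not installed in this environment"
    if not torch.cuda.is_available():
        # torch.version.cuda is None for CPU-only builds of PyTorch.
        return False, (
            f"torch {torch.__version__} (built for CUDA {torch.version.cuda}) "
            "sees no usable GPU"
        )
    count = torch.cuda.device_count()
    return True, f"torch sees {count} GPU(s); first: {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    for name, check in (("driver", driver_status), ("pytorch", torch_cuda_status)):
        ok, msg = check()
        print(f"{name}: {'OK' if ok else 'FAIL'} - {msg}")
```

If the driver check fails, the problem is below PyTorch (driver missing, not loaded, or no GPU attached to the VM). If only the PyTorch check fails, the installed torch build and the driver are probably mismatched.
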

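Comments:

Did you restart the VM after installing the NVIDIA driver?

Yes.

@MichaelJungo Would this error be affected by installing the latest CUDA driver with sudo apt-get install cuda-driver-440? Or which driver version is preferable?

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Yes, but I get the same: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.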