更新主机中的GPU驱动程序后,旧docker容器不可用(没有GPU)
今天,我们更新了主机的GPU驱动程序,我们创建的新容器都运行良好。但是,所有现有docker容器在内部运行更新主机中的GPU驱动程序后,旧docker容器不可用(没有GPU),docker,nvidia-docker,Docker,Nvidia Docker,今天,我们更新了主机的GPU驱动程序,我们创建的新容器都运行良好。但是,所有现有docker容器在内部运行nvidia smi命令时都会出现以下错误: 未能初始化NVML:驱动程序/库版本不匹配 如何营救这些旧货柜?主机上以前的GPU驱动程序版本是384.125,现在是430.64 主机配置 nvidia smi +-----------------------------------------------------------------------------+ | NVIDIA-SMI
nvidia smi
命令时都会出现以下错误:
未能初始化NVML:驱动程序/库版本不匹配
如何营救这些旧货柜?主机上以前的GPU驱动程序版本是384.125
,现在是430.64
主机配置
nvidia smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64 Driver Version: 430.64 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-DGXS... Off | 00000000:07:00.0 On | 0 |
| N/A 40C P0 39W / 300W | 182MiB / 32505MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-DGXS... Off | 00000000:08:00.0 Off | 0 |
| N/A 40C P0 39W / 300W | 12MiB / 32508MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-DGXS... Off | 00000000:0E:00.0 Off | 0 |
| N/A 39C P0 40W / 300W | 12MiB / 32508MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-DGXS... Off | 00000000:0F:00.0 Off | 0 |
| N/A 40C P0 38W / 300W | 12MiB / 32508MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1583 G /usr/lib/xorg/Xorg 169MiB |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
ii dgx-docker-cleanup 1.0-1 amd64 DGX Docker cleanup script
rc dgx-docker-options 1.0-7 amd64 DGX docker daemon options
ii dgx-docker-repo 1.0-1 amd64 docker repository configuration file
ii docker-ce 5:18.09.2~3-0~ubuntu-xenial amd64 Docker: the open-source application container engine
ii docker-ce-cli 5:18.09.2~3-0~ubuntu-xenial amd64 Docker CLI: the open-source application container engine
ii nvidia-container-runtime 2.0.0+docker18.09.2-1 amd64 NVIDIA container runtime
ii nvidia-docker 1.0.1-1 amd64 NVIDIA Docker container tools
rc nvidia-docker2 2.0.3+docker18.09.2-1 all nvidia-docker CLI wrapper
Client:
Version: 18.09.2
API version: 1.39
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 04:13:50 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.2
API version: 1.39 (minimum version 1.12)
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 03:42:13 2019
OS/Arch: linux/amd64
Experimental: false
nvcc——版本
给出
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64 Driver Version: 430.64 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-DGXS... Off | 00000000:07:00.0 On | 0 |
| N/A 40C P0 39W / 300W | 182MiB / 32505MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-DGXS... Off | 00000000:08:00.0 Off | 0 |
| N/A 40C P0 39W / 300W | 12MiB / 32508MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-DGXS... Off | 00000000:0E:00.0 Off | 0 |
| N/A 39C P0 40W / 300W | 12MiB / 32508MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-DGXS... Off | 00000000:0F:00.0 Off | 0 |
| N/A 40C P0 38W / 300W | 12MiB / 32508MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1583 G /usr/lib/xorg/Xorg 169MiB |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
ii dgx-docker-cleanup 1.0-1 amd64 DGX Docker cleanup script
rc dgx-docker-options 1.0-7 amd64 DGX docker daemon options
ii dgx-docker-repo 1.0-1 amd64 docker repository configuration file
ii docker-ce 5:18.09.2~3-0~ubuntu-xenial amd64 Docker: the open-source application container engine
ii docker-ce-cli 5:18.09.2~3-0~ubuntu-xenial amd64 Docker CLI: the open-source application container engine
ii nvidia-container-runtime 2.0.0+docker18.09.2-1 amd64 NVIDIA container runtime
ii nvidia-docker 1.0.1-1 amd64 NVIDIA Docker container tools
rc nvidia-docker2 2.0.3+docker18.09.2-1 all nvidia-docker CLI wrapper
Client:
Version: 18.09.2
API version: 1.39
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 04:13:50 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.2
API version: 1.39 (minimum version 1.12)
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 03:42:13 2019
OS/Arch: linux/amd64
Experimental: false
dpkg-l | grep-i docker
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64 Driver Version: 430.64 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-DGXS... Off | 00000000:07:00.0 On | 0 |
| N/A 40C P0 39W / 300W | 182MiB / 32505MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-DGXS... Off | 00000000:08:00.0 Off | 0 |
| N/A 40C P0 39W / 300W | 12MiB / 32508MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-DGXS... Off | 00000000:0E:00.0 Off | 0 |
| N/A 39C P0 40W / 300W | 12MiB / 32508MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-DGXS... Off | 00000000:0F:00.0 Off | 0 |
| N/A 40C P0 38W / 300W | 12MiB / 32508MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1583 G /usr/lib/xorg/Xorg 169MiB |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
ii dgx-docker-cleanup 1.0-1 amd64 DGX Docker cleanup script
rc dgx-docker-options 1.0-7 amd64 DGX docker daemon options
ii dgx-docker-repo 1.0-1 amd64 docker repository configuration file
ii docker-ce 5:18.09.2~3-0~ubuntu-xenial amd64 Docker: the open-source application container engine
ii docker-ce-cli 5:18.09.2~3-0~ubuntu-xenial amd64 Docker CLI: the open-source application container engine
ii nvidia-container-runtime 2.0.0+docker18.09.2-1 amd64 NVIDIA container runtime
ii nvidia-docker 1.0.1-1 amd64 NVIDIA Docker container tools
rc nvidia-docker2 2.0.3+docker18.09.2-1 all nvidia-docker CLI wrapper
Client:
Version: 18.09.2
API version: 1.39
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 04:13:50 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.2
API version: 1.39 (minimum version 1.12)
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 03:42:13 2019
OS/Arch: linux/amd64
Experimental: false
docker版本
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64 Driver Version: 430.64 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-DGXS... Off | 00000000:07:00.0 On | 0 |
| N/A 40C P0 39W / 300W | 182MiB / 32505MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-DGXS... Off | 00000000:08:00.0 Off | 0 |
| N/A 40C P0 39W / 300W | 12MiB / 32508MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-DGXS... Off | 00000000:0E:00.0 Off | 0 |
| N/A 39C P0 40W / 300W | 12MiB / 32508MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-DGXS... Off | 00000000:0F:00.0 Off | 0 |
| N/A 40C P0 38W / 300W | 12MiB / 32508MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1583 G /usr/lib/xorg/Xorg 169MiB |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
ii dgx-docker-cleanup 1.0-1 amd64 DGX Docker cleanup script
rc dgx-docker-options 1.0-7 amd64 DGX docker daemon options
ii dgx-docker-repo 1.0-1 amd64 docker repository configuration file
ii docker-ce 5:18.09.2~3-0~ubuntu-xenial amd64 Docker: the open-source application container engine
ii docker-ce-cli 5:18.09.2~3-0~ubuntu-xenial amd64 Docker CLI: the open-source application container engine
ii nvidia-container-runtime 2.0.0+docker18.09.2-1 amd64 NVIDIA container runtime
ii nvidia-docker 1.0.1-1 amd64 NVIDIA Docker container tools
rc nvidia-docker2 2.0.3+docker18.09.2-1 all nvidia-docker CLI wrapper
Client:
Version: 18.09.2
API version: 1.39
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 04:13:50 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.2
API version: 1.39 (minimum version 1.12)
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 03:42:13 2019
OS/Arch: linux/amd64
Experimental: false
我也遇到了这个问题。就我而言,我有这样一句话:
apt install -y nvidia-cuda-toolkit
在我的档案里。删除此行解决了问题。一般来说,我建议在本地机器上使用与驱动程序兼容的驱动程序。我也遇到了这个问题。就我而言,我有这样一句话:
apt install -y nvidia-cuda-toolkit
在我的档案里。删除此行解决了问题。一般来说,我建议您在本地计算机上使用与驱动程序兼容的。如何启动容器?(什么命令)在创建容器时,我使用了
nvidia docker run-it--name
。但是我用我们创建的新容器进行了测试,不管是nvidia docker-run-it
还是docker-run-it--runtime=nvidia
,它都能工作。再次进入容器时,我们使用docker exec-it
我发现的一个解决方法是使用docker commit
从容器中创建一个图像,然后使用该图像创建一个新容器。然后,nvidia smi在新容器中再次工作。我想知道是否有一种方法可以在不执行docker commit的情况下恢复现有的容器。这个问题解决了吗?450.*->460.**我有一个类似的问题,它还没有解决。我当前的解决方法是将容器导出为图像。然后我使用这些图像创建新的容器,新的容器以某种方式工作。但是它非常耗时,而且存储所有这些图像还需要占用大量磁盘空间。您如何启动容器?(什么命令)在创建容器时,我使用了nvidia docker run-it--name
。但是我用我们创建的新容器进行了测试,不管是nvidia docker-run-it
还是docker-run-it--runtime=nvidia
,它都能工作。再次进入容器时,我们使用docker exec-it
我发现的一个解决方法是使用docker commit
从容器中创建一个图像,然后使用该图像创建一个新容器。然后,nvidia smi在新容器中再次工作。我想知道是否有一种方法可以在不执行docker commit的情况下恢复现有的容器。这个问题解决了吗?450.*->460.**我有一个类似的问题,它还没有解决。我当前的解决方法是将容器导出为图像。然后我使用这些图像创建新的容器,新的容器以某种方式工作。但是它非常耗时,而且存储所有这些图像还需要占用大量磁盘空间。