System76 Ubuntu 20.04 TensorFlow GPU CUDA version conflict

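After upgrading from Ubuntu 18.04 to 20.04, TensorFlow can no longer use my GPU because it tries to load a mix of library versions (roughly 10 and 11). This is a System76 machine, and I installed CUDA 10.1 from System76 (so that it works with the System76 Nvidia driver). When I run TensorFlow, I get the following errors: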

2021-01-07 18:12:22.584886: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-01-07 18:12:22.584906: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-01-07 18:12:23.640665: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-07 18:12:23.641412: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-01-07 18:12:23.669966: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-07 18:12:23.670257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 computeCapability: 6.1
coreClock: 1.733GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBandwidth: 178.99GiB/s
2021-01-07 18:12:23.670328: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-01-07 18:12:23.670379: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-01-07 18:12:23.670425: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-01-07 18:12:23.671387: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-01-07 18:12:23.671667: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-01-07 18:12:23.673022: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-01-07 18:12:23.673100: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-01-07 18:12:23.673245: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-01-07 18:12:23.673259: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU.
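Note that all of the warnings are for attempts to load CUDA version 11 libraries, but only for some of the libraries. The version 10 ones load fine.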
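To see at a glance which libraries are missing, the warnings can be filtered out of the log. The following is only a minimal sketch: the function name missing_libs is made up for illustration, and it assumes the log lines have the same "Could not load dynamic library '...'" wording shown above.

```shell
# List the libraries TensorFlow failed to dlopen, one per line, without repeats.
# Reads a TensorFlow log on stdin. The name missing_libs is hypothetical.
missing_libs() {
    # Keep only the quoted library name from each matching warning line.
    sed -n "s/.*Could not load dynamic library '\([^']*\)'.*/\1/p" | LC_ALL=C sort -u
}
```

It could be used as `python3 train.py 2>&1 | missing_libs`, where train.py stands in for whatever script imports TensorFlow.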

Here is the output of nvcc --version:

Here is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   53C    P0    26W /  N/A |    585MiB /  6069MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2999      G   /usr/lib/xorg/Xorg                101MiB |
|    0   N/A  N/A      3479      G   /usr/lib/xorg/Xorg                255MiB |
|    0   N/A  N/A      3720      G   /usr/bin/gnome-shell               88MiB |
|    0   N/A  N/A      6487      G   ...AAAAAAAA== --shared-files       45MiB |
|    0   N/A  N/A      6959      G   ...AAAAAAAA== --shared-files       40MiB |
|    0   N/A  N/A     11642      G   ...AAAAAAAA== --shared-files       21MiB |
|    0   N/A  N/A     25206      G   WickrMe                            17MiB |
+-----------------------------------------------------------------------------+
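I see that the CUDA version shown in the nvidia-smi output is 11, but as I understand it, that is unrelated to the CUDA runtime. It is just the version the driver supports. Please correct me if I am wrong.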
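One way to check which CUDA runtime versions are actually installed, as opposed to the version the driver reports, is to look at what the dynamic linker cache knows about. This is a rough sketch: the function name cudart_versions is hypothetical, and it assumes the usual "name (arch) => path" line format printed by ldconfig -p. Libraries that are only reachable through LD_LIBRARY_PATH will not show up this way.

```shell
# Print the version suffixes of every libcudart.so.X.Y entry read from stdin.
# Intended input is the output of: ldconfig -p
# The name cudart_versions is hypothetical.
cudart_versions() {
    # Match lines such as "<tab>libcudart.so.10.1 (libc6,x86-64) => /path/..."
    sed -n 's/^[[:space:]]*libcudart\.so\.\([0-9][0-9.]*\) .*/\1/p' | LC_ALL=C sort -u
}

# Example:
#   ldconfig -p | cudart_versions
```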

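I have to use version 10, because that is what System76 supports, and it worked fine before the upgrade. I have also tried uninstalling and reinstalling TensorFlow via pip3, without success.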


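Does anyone know how to get all of the libraries in sync with version 10.1? I also tried manually putting the version 11 libraries in place and letting TensorFlow use mixed versions (a bad idea, of course), but it did not recognize them (or I did not place them correctly).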

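As @talonmies pointed out, I had misunderstood the versioning. It is also confusing because this is a System76 machine: System76 ships its own Nvidia driver, and installing CUDA 11 and cuDNN is not straightforward. I am posting this answer in case anyone else runs into the problem on System76.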

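First, do not use the System76 installs for CUDA and cuDNN. They have their own versions (on their website) that are meant to be compatible with their Nvidia driver, but they will not work here (they are version 10, and TensorFlow 2.4+ needs 11). Also, most generic CUDA guides tell you to uninstall and reinstall the Nvidia driver first for a clean install; if you have a System76 system, do not do that. Leave the System76 driver alone. Finally, if you previously had any CUDA or cuDNN installed, remove or uninstall all of it.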

Go to Nvidia and get their latest CUDA and cuDNN. I used:

wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda_11.0.2_450.51.05_linux.run
and ran it with:

sudo sh cuda_11.0.2_450.51.05_linux.run
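When it runs, it will tell you there is a conflict with the driver package. Ignore that and continue. When you reach the install menu, uncheck the driver install option and proceed with the installation. Once it finishes, add this to your path: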

/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin:
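You need to add both the CUDA root directory and bin, not just bin (this differs from most generic instructions). Put it in your .bashrc or .profile or wherever you set your path, then source that file (or open a new terminal).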
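As an illustration, the path change could be written so that re-sourcing the file does not add the same entries twice. This is a minimal sketch, not the exact lines from the answer: the helper name path_prepend is made up, and the directories assume the default /usr/local/cuda-11.0 install location used above.

```shell
# Prepend a directory to PATH unless it is already present.
# The name path_prepend is hypothetical.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already on PATH, nothing to do
        *) PATH="$1${PATH:+:$PATH}" ;;    # put the new directory first
    esac
}

# Add bin first, then the root, so the root ends up at the front,
# matching the order given above.
path_prepend /usr/local/cuda-11.0/bin
path_prepend /usr/local/cuda-11.0
export PATH
```

These lines could go at the end of .bashrc or .profile.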

Now install cuDNN:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/libcudnn8_8.0.5.39-1+cuda11.0_amd64.deb
Install it with dpkg. For example (in my case):

sudo dpkg -i libcudnn8_8.0.5.39-1+cuda11.0_amd64.deb

That's it. Once I had done all of this, everything worked fine. I hope this helps some System76 users get through Ubuntu 20.04 and CUDA 11 more easily.

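Thank you very much. One of the reasons I use Pop!_OS is that the Nvidia driver + CUDA/cuDNN just work with TensorFlow, until this problem with version 11 being missing. When installing CUDA 11.0 with the method above, one extra thing I needed was to install gcc version 8: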

sudo apt -y install gcc-8 g++-8
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 8
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 8
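After switching compilers this way, it can be worth confirming which gcc is actually active before running the CUDA installer. The following is only a sketch; the helper name is_gcc_major_8 is made up for illustration.

```shell
# Succeed if the given version string has major version 8.
# Handles both "8" and "8.4.0", since gcc -dumpversion may print either form.
# The name is_gcc_major_8 is hypothetical.
is_gcc_major_8() {
    [ "${1%%.*}" = "8" ]
}

# Example (requires gcc on PATH):
#   is_gcc_major_8 "$(gcc -dumpversion)" && echo "gcc 8 is active"
```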

I really hope Pop!_OS will provide CUDA 11.0 packages directly...

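You must use CUDA 11.0 with the current TensorFlow version. Your confusion comes from a misunderstanding: not everything in CUDA 11.0 is versioned 11.x.
@talonmies Thanks. You're right. Based on your comment I was able to solve this.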