How to enable GPU passthrough on CentOS/RHEL/OL8 with snapd's LXD/LXC containers?

The guides I found for deploying LXC on CentOS say to install LXD via snapd.

Snapd is a service that allows installing Debian/Ubuntu-based packages, the logic being that LXD is most up to date on that platform.
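For reference, a minimal sketch of that snapd/LXD install on an EL8-family host (assuming EPEL supplies the snapd package; repo names differ between CentOS, RHEL and OL8):

    # a sketch, assuming EPEL provides snapd on this platform
    dnf install -y epel-release          # on OL8 the package is oracle-epel-release-el8
    dnf install -y snapd
    systemctl enable --now snapd.socket
    ln -s /var/lib/snapd/snap /snap      # enable "classic" snap support
    snap install lxd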

Well, I am willing to install an alternative version if that makes enabling GPU passthrough easier.
Ultimately, I am attempting to build a container environment where I can run the latest versions of Python and Jupyter with GPU support.

I do have some guides on how to enable GPU passthrough:

https://theorangeone.net/posts/lxc-nvidia-gpu-passthrough/
https://www.reddit.com/r/Proxmox/comments/glog5j/lxc_gpu_passthrough/
I added the following kernel modules on the OL8 host:

/etc/modules-load.d/vfio-pci.conf
    # Nvidia modules
    nvidia
    nvidia_uvm
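A quick way to verify the modules actually loaded after a reboot (generic checks, not from the guides above):

    lsmod | grep nvidia                            # expect nvidia and nvidia_uvm in the list
    systemctl status systemd-modules-load.service  # the service that loads /etc/modules-load.d/*.conf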

(I noticed snapd has a modules file I can't edit:)

/var/lib/snapd/snap/core18/1988/etc/modules-load.d/modules.conf
            
Then modified GRUB:

nano /etc/default/grub
    # https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/installation_guide/appe-configuring_a_hypervisor_host_for_pci_passthrough
    # appended to the GRUB_CMDLINE_LINUX line (tried iommu=on amd_iommu=on first):
    iommu=pt amd_iommu=pt
            
grub2-mkconfig -o /boot/grub2/grub.cfg
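After the reboot below, a generic way to confirm the IOMMU is actually enabled on the host:

    dmesg | grep -i -e dmar -e iommu   # look for "AMD-Vi" / "IOMMU enabled" messages
    ls /sys/kernel/iommu_groups/       # non-empty when the IOMMU is active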
Then added udev rules:

    nano /etc/udev/rules.d/70-nvidia.rules
    KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"
    KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"

#reboot
Then added the GPU to lxc.conf:

ls -l /dev/nvidia*

nano /var/snap/lxd/common/lxd/logs/nvidia-test/lxc.conf

# Allow cgroup access
lxc.cgroup.devices.allow: c 195:* rwm
lxc.cgroup.devices.allow: c 243:* rwm

# Pass through device files
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
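The 195 and 243 major numbers in the cgroup rules come from the ls -l /dev/nvidia* output; they can differ between hosts, so they are worth checking rather than copying (illustrative output):

    ls -l /dev/nvidia*
    # crw-rw-rw- 1 root root 195, 0 ... /dev/nvidia0      <- major 195 (nvidia)
    # crw-rw-rw- 1 root root 243, 0 ... /dev/nvidia-uvm   <- major for nvidia-uvm varies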
Inside the LXC container I launched (OL8), when I go to run nvidia-smi:

[root@nvidia-test ~]# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Since I cannot edit the snapd modules file, I thought about manually copying the NVIDIA kernel module files and passing them through (using modprobe --show-depends to identify them).
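For reference, listing a module's file and its dependencies on the host looks like this:

    modprobe --show-depends nvidia
    # prints the insmod commands modprobe would run, e.g.
    # insmod /lib/modules/<kernel>/extra/nvidia.ko ...   (paths vary by driver packaging)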

Some diagnostic information from inside my container:

[root@nvidia-test ~]# find /sys | grep dmar
find: '/sys/kernel/debug': Permission denied
find: '/sys/fs/pstore': Permission denied
find: '/sys/fs/fuse/connections/59': Permission denied
[root@nvidia-test ~]# lspci | grep -i nvidia
05:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P1000] (rev a1)
05:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
So... is there something else I need to do? Should I remove the snapd LXD and use the default LXC provided by OL8?

Found the answer.


You can use GPU passthrough to an LXD container by creating an LXD gpu device. This gpu device will collectively perform all the necessary tasks to expose the GPU to the container, including the configuration you were making explicitly above.

Here is the documentation with all the extra parameters (for example, how to distinguish between GPUs if there are more than one).
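For example, on a host with more than one GPU you can pin the device to a specific card; using the PCI address from the lspci output above, something like:

lxc config device add mycontainer mynvidia gpu pci=0000:05:00.0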

In its simplest form, you can run the following on an existing container to add the default GPU to it:

lxc config device add mycontainer mynvidia gpu

When you add a GPU to an NVIDIA container, you also need to add the corresponding NVIDIA runtime to the container (so that it matches the kernel driver version on the host!). In the container we do not need to (and cannot) add kernel drivers, but we do need the runtime (libraries, utilities and other software). LXD takes care of this: it downloads the corresponding version of the NVIDIA container runtime and attaches it to the container. Here we create a container with the NVIDIA runtime and then add the NVIDIA GPU device to it:
$ lxc launch ubuntu: mycontainer -c nvidia.runtime=true -c nvidia.driver.capabilities=all
Creating mycontainer
Starting mycontainer
$ lxc config device add mycontainer mynvidia gpu
Device mynvidia added to mycontainer
$ lxc shell mycontainer
root@mycontainer:~# nvidia-smi 
Mon Mar 15 13:37:24 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
...
$ 
If you are creating such GPU containers often, you can create an LXD profile with the GPU configuration. Then, whenever you want a GPU container, you can either launch the container with the nvidia profile, or apply the nvidia profile to an existing container, thus making it a GPU container:

$ cat mynvidiaLXDprofile.txt
config:
  nvidia.driver.capabilities: all
  nvidia.runtime: "true"
description: ""
devices:
  mygpu:
    type: gpu
name: nvidia
used_by: []
$ lxc profile create nvidia
Profile nvidia created
$ lxc profile edit nvidia < mynvidiaLXDprofile.txt
$ lxc launch ubuntu:20.04 mycontainer --profile default --profile nvidia
Creating mycontainer
Starting mycontainer
$ 
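To turn an existing container into a GPU container instead, the profile can also be attached after creation (mycontainer2 is a hypothetical existing container):

$ lxc profile add mycontainer2 nvidia
Profile nvidia added to mycontainer2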
We have been using the snap package of LXD for all the above instructions.
