CUDA: Using a GPU from a Docker container?


I'm looking for a way to use the GPU from inside a docker container.

The container will execute arbitrary code, so I don't want to use the privileged mode.

Any tips?

From previous research I understood that run -v and/or LXC cgroup was the way to go, but I'm not sure how to pull that off exactly.

OK, I finally managed to do it without using the --privileged mode.

I'm running on ubuntu server 14.04, and I'm using the latest cuda (6.0.37 for linux 13.04 64 bits).


Preparation: Install the nvidia driver and cuda on your host. (It can be a little tricky, so I suggest you follow this guide.)

ATTENTION: It's really important that you keep the files you used for the host cuda installation.


Get the Docker daemon to run using lxc. We need to run the docker daemon with the lxc driver to be able to modify the configuration and give the container access to the devices.

One-time usage:

sudo service docker stop
sudo docker -d -e lxc

Permanent configuration: Modify your docker configuration file located in /etc/default/docker by adding '-e lxc' to the DOCKER_OPTS line. Here is my line after modification:

DOCKER_OPTS="--dns 8.8.8.8 --dns 8.8.4.4 -e lxc"
Then restart the daemon using:

sudo service docker restart
How can you check that the daemon now effectively uses the lxc driver? Run:

docker info
The Execution Driver line should look like this:

Execution Driver: lxc-1.0.5
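If you want to script this check, a small sketch is shown below; it matches against a captured line so it runs anywhere, but in practice you would capture `docker info` output instead:

```shell
# Sketch: verify the lxc execution driver is active before going further.
# Shown against a captured line; in practice use: info=$(docker info 2>/dev/null)
info="Execution Driver: lxc-1.0.5"
case "$info" in
  *"Execution Driver: lxc"*) echo "lxc driver active" ;;
  *) echo "lxc driver NOT active" ;;
esac
```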

Build your image with the NVIDIA and CUDA drivers. Here is a basic Dockerfile to build a CUDA compatible image:

FROM ubuntu:14.04
MAINTAINER Regan <http://stackoverflow.com/questions/25185405/using-gpu-from-a-docker-container>

RUN apt-get update && apt-get install -y build-essential
RUN apt-get --purge remove -y nvidia*

# Get the install files you used to install CUDA and the NVIDIA drivers on your host
ADD ./Downloads/nvidia_installers /tmp/nvidia
# Install the driver
RUN /tmp/nvidia/NVIDIA-Linux-x86_64-331.62.run -s -N --no-kernel-module
# For some reason the driver installer leaves temp files behind during a docker build
# (I don't have any explanation why), and the CUDA installer fails if they are still
# there, so we delete them
RUN rm -rf /tmp/selfgz7
# CUDA installer
RUN /tmp/nvidia/cuda-linux64-rel-6.0.37-18176142.run -noprompt
# CUDA samples; remove this line if you don't want them
RUN /tmp/nvidia/cuda-samples-linux-6.0.37-18176142.run -noprompt -cudaprefix=/usr/local/cuda-6.0
# Add the CUDA libraries to the library path (ENV persists across layers, unlike RUN export)
ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:/usr/local/cuda/lib64
# Update the ld.so.conf.d directory
RUN touch /etc/ld.so.conf.d/cuda.conf
# Delete installer files
RUN rm -rf /tmp/*
Run your image. First you need to identify the major number associated with your devices; the easiest way is to run ls -la /dev | grep nvidia. If the result is blank, launching one of the samples on the host should do the trick. In the listing, between the group and the date, you will see a set of two numbers. These two numbers are called the major and minor numbers (written in that order) and designate a device. We will just use the major numbers for convenience.
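The major/minor pair of any device node can also be read with stat. A minimal sketch using /dev/null, which exists on every Linux system and has well-known numbers (the same command works on /dev/nvidia0, /dev/nvidiactl, etc. on a GPU host):

```shell
# stat %t / %T print the major and minor numbers (in hex) of a device node.
stat -c 'major %t minor %T' /dev/null   # on Linux prints: major 1 minor 3
```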

Why did we activate the lxc driver? To use the lxc-conf option, which allows our container to access those devices. The option is (I suggest using * for the minor number because it reduces the length of the run command):

--lxc-conf='lxc.cgroup.devices.allow=c [major number]:[minor number or *] rwm'

So if I want to launch a container (supposing your image name is cuda):
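A sketch of the resulting command, assuming the listing above showed major number 195 for the /dev/nvidia* devices and 243 for /dev/nvidia-uvm (substitute the values ls showed on your host). It prints the command instead of executing it, so you can review it first:

```shell
# Assumed major numbers -- substitute the ones shown by `ls -la /dev | grep nvidia`
NVIDIA_MAJOR=195
UVM_MAJOR=243
cmd="docker run -ti \
--lxc-conf='lxc.cgroup.devices.allow=c ${NVIDIA_MAJOR}:* rwm' \
--lxc-conf='lxc.cgroup.devices.allow=c ${UVM_MAJOR}:* rwm' \
cuda /bin/bash"
echo "$cmd"   # review before running
```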


Regan's answer is great, but it's a bit out of date, since the correct way to do this is to avoid the lxc execution context, as Docker has dropped LXC as the default execution context as of docker 0.9.

Instead, it's better to tell docker about the nvidia devices via the --device flag, and just use the native execution context rather than lxc.

Environment: These instructions were tested on the following environment:

  • Ubuntu 14.04
  • CUDA 6.5
  • AWS GPU实例
Install the nvidia driver and cuda on your host. See this guide for host setup.

Install Docker. Find your nvidia devices. Run a Docker container with the nvidia driver pre-installed. I've created a docker image that has the cuda drivers pre-installed; if you want to know how this image was built, the Dockerfile is available on dockerhub.

You'll want to customize this command to match your nvidia devices. Here's what worked for me:

 $ sudo docker run -ti --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm tleyden5iwx/ubuntu-cuda /bin/bash
Verify CUDA is correctly installed. This should be run from inside the docker container you just launched.

Install the CUDA samples:

$ cd /opt/nvidia_installers
$ ./cuda-samples-linux-6.5.14-18745345.run -noprompt -cudaprefix=/usr/local/cuda-6.5/
Build the deviceQuery sample:

$ cd /usr/local/cuda/samples/1_Utilities/deviceQuery
$ make
$ ./deviceQuery   
If everything worked, you should see the following output:

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs =    1, Device0 = GRID K520
Result = PASS

We've just released an experimental version of nvidia-docker that should simplify the process of using NVIDIA GPUs inside Docker containers.

Updated for cuda-8.0 on ubuntu 16.04
  • Install docker

  • Build the following image that includes the nvidia drivers and the cuda toolkit

Dockerfile

FROM ubuntu:16.04
MAINTAINER Jonathan Kosgei <jonathan@saharacluster.com>

# A docker container with the Nvidia kernel module and CUDA drivers installed

ENV CUDA_RUN https://developer.nvidia.com/compute/cuda/8.0/prod/local_installers/cuda_8.0.44_linux-run

RUN apt-get update && apt-get install -q -y \
  wget \
  module-init-tools \
  build-essential 

RUN cd /opt && \
  wget $CUDA_RUN && \
  chmod +x cuda_8.0.44_linux-run && \
  mkdir nvidia_installers && \
  ./cuda_8.0.44_linux-run -extract=`pwd`/nvidia_installers && \
  cd nvidia_installers && \
  ./NVIDIA-Linux-x86_64-367.48.run -s -N --no-kernel-module

RUN cd /opt/nvidia_installers && \
  ./cuda-linux64-rel-8.0.44-21122537.run -noprompt

# Ensure the CUDA libs and binaries are in the correct environment variables
ENV LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64
ENV PATH=$PATH:/usr/local/cuda-8.0/bin

RUN cd /opt/nvidia_installers &&\
    ./cuda-samples-linux-8.0.44-21122537.run -noprompt -cudaprefix=/usr/local/cuda-8.0 &&\
    cd /usr/local/cuda/samples/1_Utilities/deviceQuery &&\
    make

WORKDIR /usr/local/cuda/samples/1_Utilities/deviceQuery
  • Run your container
  • sudo docker run -ti --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm <image name> ./deviceQuery

    You should see output similar to the following:

    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GRID K520

    Result = PASS

    A recent enhancement by NVIDIA has produced a much more robust way to do this.

    Essentially they have found a way to avoid the need to install the CUDA/GPU driver inside the containers and have it match the host kernel module.

    Instead, the drivers are on the host and the containers don't need them. It requires a modified docker-cli right now.

    This is great, because now containers are much more portable.

    A quick test on Ubuntu:

    # Install nvidia-docker and nvidia-docker-plugin
    wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
    sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
    
    # Test nvidia-smi
    nvidia-docker run --rm nvidia/cuda nvidia-smi
    

    To use the GPU from a docker container, instead of using native Docker, use nvidia-docker. To install nvidia-docker use the following commands:

    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey |  sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-
    docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update
    sudo apt-get install -y nvidia-docker
    sudo pkill -SIGHUP dockerd # Restart Docker Engine
    sudo nvidia-docker run --rm nvidia/cuda nvidia-smi # finally run nvidia-smi in the same container
    
    To run X applications with hardware acceleration, mviereck's x11docker can be used:

    x11docker --gpu imagename
    
    With Docker 19.03 or later, you only need the nvidia-container-toolkit package on the host. On distributions using yum (e.g. CentOS):

    $ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    $ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo

    $ sudo yum install -y nvidia-container-toolkit
    $ sudo systemctl restart docker
    
    On Ubuntu:

    # Add the package repositories
    $ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    $ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    $ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

    $ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
    $ sudo systemctl restart docker
    
    Then expose GPUs to a container with the --gpus flag. All GPUs:

    docker run --name my_all_gpu_container --gpus all -t nvidia/cuda

    A specific device:

    docker run --name my_first_gpu_container --gpus device=0 nvidia/cuda

    or, equivalently:

    docker run --name my_first_gpu_container --gpus '"device=0"' nvidia/cuda
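To double-check from inside a running container that the GPU was actually passed through, a small sketch can list the nvidia device nodes (an empty result means no GPU is visible):

```shell
# List the nvidia device nodes visible in this environment.
# Inside a correctly started --gpus container this shows at least
# /dev/nvidiactl and one /dev/nvidiaN; without a GPU it prints the fallback.
ls -la /dev/nvidia* 2>/dev/null || echo "no /dev/nvidia* nodes visible"
```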
    
    A quick end-to-end test with a CUDA 11 image:

    mkdir ~/cuda11
    cd ~/cuda11

    echo "FROM nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04" > Dockerfile
    echo "CMD [\"/bin/bash\"]" >> Dockerfile

    docker build --tag mirekphd/cuda11 .

    docker run --rm -it --gpus 1 mirekphd/cuda11 nvidia-smi

    which should print something like:
    
    
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.57       Driver Version: 450.57       CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 108...  Off  | 00000000:01:00.0  On |                  N/A |
    |  0%   50C    P8    17W / 280W |    409MiB / 11177MiB |      7%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    Another option is to add the CUDA runtime to an existing image by following NVIDIA's official Dockerfiles (see the links in the comments below):

    FROM sidazhou/scipy-notebook:latest
    # FROM ubuntu:18.04 
    
    ###########################################################################
    # See https://gitlab.com/nvidia/container-images/cuda/-/blob/master/dist/10.1/ubuntu18.04-x86_64/base/Dockerfile
    # See https://sarus.readthedocs.io/en/stable/user/custom-cuda-images.html
    ###########################################################################
    USER root
    
    ###########################################################################
    # base
    RUN apt-get update && apt-get install -y --no-install-recommends \
        gnupg2 curl ca-certificates && \
        curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub | apt-key add - && \
        echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list && \
        echo "deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list && \
        apt-get purge --autoremove -y curl \
        && rm -rf /var/lib/apt/lists/*
    
    ENV CUDA_VERSION 10.1.243
    ENV CUDA_PKG_VERSION 10-1=$CUDA_VERSION-1
    
    # For libraries in the cuda-compat-* package: https://docs.nvidia.com/cuda/eula/index.html#attachment-a
    RUN apt-get update && apt-get install -y --no-install-recommends \
        cuda-cudart-$CUDA_PKG_VERSION \
        cuda-compat-10-1 \
        && ln -s cuda-10.1 /usr/local/cuda && \
        rm -rf /var/lib/apt/lists/*
    
    # Required for nvidia-docker v1
    RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf && \
        echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf
    
    ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
    ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
    
    
    ###########################################################################
    #runtime next
    ENV NCCL_VERSION 2.7.8
    
    RUN apt-get update && apt-get install -y --no-install-recommends \
        cuda-libraries-$CUDA_PKG_VERSION \
        cuda-npp-$CUDA_PKG_VERSION \
        cuda-nvtx-$CUDA_PKG_VERSION \
        libcublas10=10.2.1.243-1 \
        libnccl2=$NCCL_VERSION-1+cuda10.1 \
        && apt-mark hold libnccl2 \
        && rm -rf /var/lib/apt/lists/*
    
    # Keep apt from auto upgrading the cublas package. See https://gitlab.com/nvidia/container-images/cuda/-/issues/88
    RUN apt-mark hold libcublas10
    
    
    ###########################################################################
    #cudnn7 (not cudnn8) next
    
    ENV CUDNN_VERSION 7.6.5.32
    
    RUN apt-get update && apt-get install -y --no-install-recommends \
        libcudnn7=$CUDNN_VERSION-1+cuda10.1 \
        && apt-mark hold libcudnn7 && \
        rm -rf /var/lib/apt/lists/*
    
    
    ENV NVIDIA_VISIBLE_DEVICES all
    ENV NVIDIA_DRIVER_CAPABILITIES all
    ENV NVIDIA_REQUIRE_CUDA "cuda>=10.1"
    
    
    ###########################################################################
    #docker build -t sidazhou/scipy-notebook-gpu:latest .
    
    #docker run -itd --gpus all \
    #  -p 8888:8888 \
    #  -p 6006:6006 \
    #  --user root \
    #  -e NB_UID=$(id -u) \
    #  -e NB_GID=$(id -g) \
    #  -e GRANT_SUDO=yes \
    #  -v ~/workspace:/home/jovyan/work \
    #  --name sidazhou-jupyter-gpu \
    #  sidazhou/scipy-notebook-gpu:latest
    
    #docker exec sidazhou-jupyter-gpu python -c "import tensorflow as tf; print(tf.config.experimental.list_physical_devices('GPU'))"