Python deep learning with an Nvidia 1070 Ti on Ubuntu 18.04

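I'm struggling right now; I've spent a lot of time trying different ways to get my card working with TensorFlow.

My most recent attempt (which hit problems similar to the earlier ones) was to use the TensorFlow Docker image: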

https://hub.docker.com/r/tensorflow/tensorflow/
I installed nvidia-docker and ran nvidia-smi, which appears to report that my GPU is present.
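The nvidia-smi check can also be scripted, which makes it easy to repeat on the host and inside the container. The following is a minimal sketch, not part of the original post: the function names `parse_gpu_csv` and `query_gpus` are hypothetical, and it assumes `nvidia-smi` is on the PATH and supports the `--query-gpu` and `--format=csv,noheader` options.

```python
import subprocess


def parse_gpu_csv(text):
    """Parse `name, driver_version` CSV rows into (name, driver) tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so a comma inside the GPU name is tolerated.
        name, _, driver = line.rpartition(",")
        rows.append((name.strip(), driver.strip()))
    return rows


def query_gpus():
    """Return [(name, driver_version), ...], or None if nvidia-smi cannot be run."""
    cmd = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]
    try:
        out = subprocess.check_output(cmd, universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_gpu_csv(out)
```

Calling `query_gpus()` on the host and again inside the container should list the same card and driver; a `None` result inside the container suggests the GPU runtime was not passed through.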

Then I ran this command:

nvidia-docker run -it -p 8888:8888 tensorflow/tensorflow:latest-gpu
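Once it had downloaded and started, I tried running the notebooks (starting with the hello tensorflow notebook).

As soon as I try to import tensorflow (using only the default, unmodified notebook), the kernel restarts: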

KernelRestarter: restarting kernel (1/5), keep random ports
I'm not sure what the best next step is; I don't know how to troubleshoot a Docker container, let alone a Jupyter notebook running inside it.
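One way to see why the kernel dies is to run the import in a separate Python process, so the crash takes down the child instead of the notebook kernel and its exit status and stderr can be inspected. This is a sketch under assumptions, not something from the original post: the helper name `try_import` is hypothetical, and it only uses the standard library.

```python
import subprocess
import sys


def try_import(module="tensorflow"):
    """Import `module` in a child interpreter and return (returncode, stderr)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", "import " + module],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    _, err = proc.communicate()
    return proc.returncode, err


# Example use from a notebook cell or a shell inside the container:
#   code, err = try_import("tensorflow")
#   print(code)
#   print(err)
```

On Linux a negative return code means the child was terminated by a signal (for example -4 for SIGILL or -11 for SIGSEGV), while a return code of 1 with a traceback in `err` points to an ordinary Python exception such as a missing shared library.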

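I previously had similar problems when trying to run locally, without a Docker container.

Any suggestions for a good next step? I've spent more money on this card than I wanted to, and I don't know how to get it working.

(I believe I can import tensorflow locally on my machine with tensorflow-gpu installed, but when I reach the conv2d part I get "could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED", if I recall correctly; it has been a hectic few days.)

Edit: Yes, CUDA and cuDNN are installed, and so is nvidia-390, which seems to be working fine judging by nvidia-smi. I just compiled TF from scratch and it still fails (in this case importing tf does not fail, but I get the same not-initialized error; possibly it isn't the right Nvidia version that it mentions, I think mine is nvidia-390.77). I'm considering a fresh 18.04 install with an earlier nvidia-3xx driver; trying to "downgrade" broke apt, and I spent several days trying to repair it.

Edit 2: I also realized that I had CUDA 9.0 installed, but with the cuDNN 7.1 build for CUDA 9.1 (you can download that combination from Nvidia, whatever that means). I'm trying to revert, but I'm having a lot of trouble backing it out, and I'm very close to wiping and reinstalling Ubuntu and starting from there. I have all the commands, so I think that might be easier, but I'm not sure it will fix it. (e.g. cudnn-9.0-linux-x64-v7.1)

Edit 3: I'm coming back to follow up on this. I wrote a gist of what I had to do to get my GPU working on my host under Ubuntu 16.04, but I haven't tested it in Docker. Here is the gist.

Copy-pasted here: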

# 1070 Ti
Fresh Install 16.04
(download updates, and include 3rd party)
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install nvidia-384
# Contents
sudo bash -c 'cat >> /etc/modprobe.d/blacklist-nouveau.conf << 'EOF'
blacklist nouveau
options nouveau modeset=0
EOF'
sudo update-initramfs -u
sudo reboot
# Takes about 30-40 minutes 1.5GB approx
wget https://developer.download.nvidia.com/compute/cuda/9.0/secure/Prod/local_installers/cuda_9.0.176_384.81_linux.run
sudo sh cuda_9.0.176_384.81_linux.run
    No to install nvidia accelerated Graphics Driver for Linux
    yes to Cuda 9.0 toolkit
    default
    yes to symbolic link
    yes to samples
    default location is fine


#Alternately (need to test)
#sudo sh cuda_9.0.176_384.81_linux.run --silent --toolkit --samples

cat >> ~/.bashrc << 'EOF'
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64\
${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
EOF
cd ~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery
make
./deviceQuery # Assuming make was successful
cd ~/NVIDIA_CUDA-9.0_Samples/1_Utilities/bandwidthTest
make
./bandwidthTest # Assuming make was successful
# Look for Result = PASS
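
The PATH and LD_LIBRARY_PATH lines appended to ~/.bashrc only take effect in new shells (or after `source ~/.bashrc`), so it can help to confirm that the process that will run TensorFlow actually sees them. A minimal sketch, assuming the CUDA 9.0 install location used above; the function name `cuda_dirs_on_paths` is hypothetical.

```python
import os


def cuda_dirs_on_paths(env=None, cuda_home="/usr/local/cuda-9.0"):
    """Report whether the CUDA bin and lib64 directories are on PATH and LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    path_dirs = env.get("PATH", "").split(os.pathsep)
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return {
        "PATH": cuda_home + "/bin" in path_dirs,
        "LD_LIBRARY_PATH": cuda_home + "/lib64" in lib_dirs,
    }
```

Running `cuda_dirs_on_paths()` from the same environment as the notebook server shows whether that process inherited the exports; Jupyter started from a desktop launcher or a service may not read ~/.bashrc.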

sudo apt-get install nvidia-cuda-toolkit

# Couldn't find on 16.04 maybe this is a 18.04 upgrade?
#sudo apt-get install cuda-toolkit-9.0 cuda-command-line-tools-9-0

# At this point the driver and CUDA are installed, now it's time to install the CUDNN driver/piece.
#This is the link that I have, be sure to use v7 not v7.1 as I haven't had luck in the past with that (though it might work).
https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.5/prod/9.0_20171129/cudnn-9.0-linux-x64-v7
# 333 MB so will take a bit
cd ~/Downloads
tar -xvf cudnn-9.0-linux-x64-v7.tgz
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/* /usr/local/cuda/include/
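
After copying the files, one way to check that the dynamic loader can find the cuDNN library is to load it directly and ask it for its version. This is a sketch under assumptions: the helper name `cudnn_runtime_version` is hypothetical, the library is assumed to be named `libcudnn.so.7`, and `cudnnGetVersion` is the cuDNN C function that returns the version as a single integer.

```python
import ctypes


def cudnn_runtime_version(libname="libcudnn.so.7"):
    """Return the integer from cudnnGetVersion(), or None if the library cannot be loaded."""
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        # Library not found, or one of its own dependencies failed to load.
        return None
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    return int(lib.cudnnGetVersion())
```

A `None` result means the loader could not open the library from the current environment, which usually points back at LD_LIBRARY_PATH or the copy step rather than at TensorFlow itself.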

sudo apt-get install git tmux
cd ~/Downloads
# At this point I'm going to install Anaconda
wget https://repo.continuum.io/archive/Anaconda3-4.3.1-Linux-x86_64.sh -O anaconda-install.sh 
bash anaconda-install.sh # Follow Prompts adding path to bash
source ~/.bashrc
conda create --name ml python=3.6 # include Python so the env gets its own pip
source activate ml
pip install tensorflow-gpu==1.5

# test the install
cd ~
mkdir projects
cd projects
git clone https://github.com/tensorflow/models
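
Before running anything from the models repository, a quick check that TensorFlow itself can see the GPU narrows down where a failure comes from. A minimal sketch, assuming the `tensorflow.python.client.device_lib` module that ships with TensorFlow; the wrapper name `list_gpu_devices` is hypothetical.

```python
def list_gpu_devices():
    """Return the names of GPU devices TensorFlow reports, or None if it is not importable."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    devices = device_lib.list_local_devices()
    return [d.name for d in devices if d.device_type == "GPU"]
```

An empty list means TensorFlow imported but found no usable GPU, which is a different problem from an import crash or a cuDNN initialization error.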




# Additional notes
Run a sample from the cuda samples folder

cd ~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery
make
./deviceQuery

Output:

Plenty but ends with the following
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 2
Result = PASS
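
The summary line is a list of `key = value` pairs, so it can be checked mechanically: the CUDA version supported by the driver should be at least the runtime version. The following sketch is illustrative only; the helper names are hypothetical and it assumes the summary format shown above.

```python
def parse_device_query_summary(line):
    """Split the `key = value` pairs of the deviceQuery summary line into a dict."""
    fields = {}
    for part in line.split(","):
        if "=" in part:
            key, _, value = part.partition("=")
            fields[key.strip()] = value.strip()
    return fields


def version_tuple(text):
    """Turn a dotted version string such as '9.0' into a tuple of ints."""
    return tuple(int(piece) for piece in text.split("."))


def driver_supports_runtime(fields):
    """True when the driver's CUDA version is at least the runtime's CUDA version."""
    driver = version_tuple(fields["CUDA Driver Version"])
    runtime = version_tuple(fields["CUDA Runtime Version"])
    return driver >= runtime
```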


This tells you which cudnn is installed

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

Outputs:
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 4
--
#define CUDNN_VERSION    (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
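
The same information can be read programmatically, which is convenient when comparing several machines or a container against the host. A minimal sketch, assuming the header uses the three `#define` lines shown above; the function name `cudnn_header_version` is hypothetical.

```python
import re


def cudnn_header_version(header_text):
    """Return (major, minor, patch) from the text of cudnn.h, or None if a define is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Example use against the installed header:
#   with open("/usr/local/cuda/include/cudnn.h") as handle:
#       print(cudnn_header_version(handle.read()))
```

For the output shown above this yields major 7, minor 1, patch 4, which the CUDNN_VERSION formula turns into 7104.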


# This tells you which CUDA toolkit (nvcc) version is installed

nvcc --version 

Outputs:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
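
To compare the toolkit version against what a TensorFlow build expects, the release can be pulled out of this output with a small parser. This is a sketch, assuming the `release X.Y, VX.Y.Z` wording shown above; the function name `nvcc_release` is hypothetical.

```python
import re


def nvcc_release(output):
    """Return (major, minor, full_version) parsed from `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+),\s*V([\d.]+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)
```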

Have you installed CUDA and cuDNN? Have you installed the Nvidia proprietary driver?
Does the card work otherwise? Is there a way to test whether the card runs properly without TensorFlow?
Please check your CUDA version. The latest TensorFlow release requires CUDA 9, cuDNN 7+, and Ubuntu 16.04.