Python: Running TensorFlow on a GPU cluster in a virtualenv

I installed the GPU version of TensorFlow in a virtualenv. The problem is that I get a segmentation fault when starting a session. That is, this code:

import tensorflow as tf
sess = tf.InteractiveSession()
exits with the following error:

(tesnsorflowenv)user@machine$ python testtensorflow.py 
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:93] Couldn't open CUDA library libcudnn.so.6.5. LD_LIBRARY_PATH: :/vol/cuda/7.0.28/lib64
I tensorflow/stream_executor/cuda/cuda_dnn.cc:1382] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 40
Segmentation fault
I tried to dig deeper with gdb, but only got the following additional output:

[New Thread 0x7fffdf880700 (LWP 32641)]
[New Thread 0x7fffdf07f700 (LWP 32642)]
... lines omitted 
[New Thread 0x7fffadffb700 (LWP 32681)]
[Thread 0x7fffadffb700 (LWP 32681) exited]
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
Do you know what is happening here and how to fix it?

Here is the output of nvidia-smi:

+------------------------------------------------------+                       
| NVIDIA-SMI 352.63     Driver Version: 352.63         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:06:00.0     Off |                    0 |
| N/A   65C    P0   142W / 149W |    235MiB / 11519MiB |     81%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:07:00.0     Off |                    0 |
| N/A   25C    P8    30W / 149W |     55MiB / 11519MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 0000:0D:00.0     Off |                    0 |
| N/A   27C    P8    26W / 149W |     55MiB / 11519MiB |      0%   Prohibited |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 0000:0E:00.0     Off |                    0 |
| N/A   25C    P8    28W / 149W |     55MiB / 11519MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           On   | 0000:86:00.0     Off |                    0 |
| N/A   46C    P0    85W / 149W |    206MiB / 11519MiB |     97%   E. Process |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           On   | 0000:87:00.0     Off |                    0 |
| N/A   27C    P8    29W / 149W |     55MiB / 11519MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           On   | 0000:8D:00.0     Off |                    0 |
| N/A   28C    P8    26W / 149W |     55MiB / 11519MiB |      0%   Prohibited |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           On   | 0000:8E:00.0     Off |                    0 |
| N/A   23C    P8    30W / 149W |     55MiB / 11519MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

Thanks for your help with this issue.

It can't find cuDNN:

I tensorflow/stream_executor/dso_loader.cc:93] Couldn't open CUDA library libcudnn.so.6.5. LD_LIBRARY_PATH: :/vol/cuda/7.0.28/lib64
I tensorflow/stream_executor/cuda/cuda_dnn.cc:1382] Unable to load cuDNN DSO
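
The log message names both the file it looked for (libcudnn.so.6.5) and the LD_LIBRARY_PATH it searched. A minimal sketch for checking the same thing by hand is below. The helper name `find_on_library_path` is hypothetical, and the check is only an approximation: it looks at LD_LIBRARY_PATH alone, whereas the real dynamic loader also consults other locations such as the ldconfig cache.

```python
import os


def find_on_library_path(filename, search_path=None):
    """Return the directories on search_path that contain filename.

    search_path is a colon-separated string; it defaults to the
    LD_LIBRARY_PATH of the current process.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(":"):
        # Skip empty entries; the path in the log starts with a bare ":".
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # File name taken from the log message in the question.
    name = "libcudnn.so.6.5"
    found = find_on_library_path(name)
    if found:
        print("%s found in: %s" % (name, ", ".join(found)))
    else:
        print("%s not found on LD_LIBRARY_PATH" % name)
```

Run it inside the same virtualenv and shell that start TensorFlow, so that the environment variables match.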

You need to install it.

After extracting the cuDNN archive:

[root@localhost cudnn]# cd include/
[root@localhost include]# mv cudnn.h /usr/local/cuda/include/
[root@localhost include]# cd ../lib64/
[root@localhost lib64]# mv * /usr/local/cuda/lib
After that it works:

[root@localhost ~]# python
Python 2.7.5 (default, Sep 15 2016, 22:37:39) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as f
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0 locally
>>>

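Please try building from source as described in the instructions, preferably in debug mode, and provide the full stack trace. That may help determine where the SIGSEGV comes from.

Yes, that might be it! For some reason the problem did not show up when I tested on my local machine. Thanks for your help. I will know whether it works once my cuDNN request has been approved...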
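
A lighter-weight complement to the debug build suggested in the comments is Python's `faulthandler` module, which prints the Python-level traceback when the process receives SIGSEGV. This is only a sketch and rests on an assumption: it needs Python 3.3 or later, where `faulthandler` is part of the standard library (it is not in the standard library of Python 2.7, the version shown in the answer's transcript). It shows which Python call was active at the time of the crash, not the native stack, so it does not replace a gdb backtrace.

```python
import faulthandler

# Dump the Python traceback of all threads to stderr if the process
# receives SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.
faulthandler.enable()

# The code from the question would follow here, for example:
# import tensorflow as tf
# sess = tf.InteractiveSession()
```

The same effect is available without editing the script by starting the interpreter with `python -X faulthandler testtensorflow.py`.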