Theano fails to compile CUDA, but the Python code still runs on the GPU




I am trying to run a simple Theano script on an NVIDIA 1060 GPU with CUDA 8.0 on Ubuntu 16.04, inside a Python virtual environment created with Anaconda. Here is my .theanorc file:

[global]
floatX = float32
device = cuda
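Theano also accepts the same settings through the THEANO_FLAGS environment variable as comma-separated key=value pairs, which is handy for one-off tests without editing .theanorc. The sketch below (Python 3, standard library only) reads a .theanorc-style text and builds the equivalent flag string; the helper name theanorc_to_flags is hypothetical, and it deliberately handles only the [global] section.

```python
import configparser


def theanorc_to_flags(text):
    """Hypothetical helper: turn the [global] section of .theanorc-style text into a THEANO_FLAGS string."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case, e.g. "floatX" rather than "floatx"
    parser.read_string(text)
    if not parser.has_section("global"):
        return ""
    return ",".join(f"{key}={value}" for key, value in parser.items("global"))


theanorc = """
[global]
floatX = float32
device = cuda
"""

print(theanorc_to_flags(theanorc))  # floatX=float32,device=cuda
```

The resulting string would be used as, for example, THEANO_FLAGS=floatX=float32,device=cuda python script.py.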

The code I am trying to run is a short example from the Theano website:

from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(node.op, tensor.Elemwise) and
              ('Gpu' not in type(node.op).__name__)
              for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
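The final check in this script is a heuristic: it walks the compiled graph and reports CPU use if any elementwise op has a class name without "Gpu" in it. A minimal sketch of the same rule, decoupled from Theano so it can be tried without a GPU; the (is_elemwise, class_name) pairs are a hypothetical stand-in for isinstance(op, tensor.Elemwise) and type(op).__name__ of each op in the graph.

```python
def used_gpu(nodes):
    """Mirror the script's rule: False if any elementwise op lacks 'Gpu' in its class name.

    `nodes` is an iterable of (is_elemwise, class_name) pairs, a stand-in for the
    ops returned by f.maker.fgraph.toposort().
    """
    return not any(is_elemwise and "Gpu" not in name for is_elemwise, name in nodes)


# Shaped like the graph printed in the question: a GPU exp plus a device-to-host transfer.
gpu_graph = [(True, "GpuElemwise"), (False, "HostFromGpu")]
# A CPU-only graph would contain a plain Elemwise node instead.
cpu_graph = [(True, "Elemwise")]

print("Used the gpu" if used_gpu(gpu_graph) else "Used the cpu")  # Used the gpu
print("Used the gpu" if used_gpu(cpu_graph) else "Used the cpu")  # Used the cpu
```

This check only looks at where elementwise ops were placed; it says nothing about whether cuDNN was found, so "Used the gpu" and a cuDNN error can appear in the same run.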

When I run the code, I get a series of warnings and the following error:

ERROR (theano.sandbox.cuda): Failed to compile cuda_ndarray.cu: ('nvcc return status', 1, 'for cmd', 'nvcc -shared -O3 -m64 -Xcompiler -DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker -rpath,/home/eb/.theano/compiledir_Linux-4.8--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray -I/home/eb/anaconda2/envs/deep/lib/python2.7/site-packages/theano/sandbox/cuda -I/home/eb/anaconda2/envs/deep/lib/python2.7/site-packages/numpy/core/include -I/home/eb/anaconda2/envs/deep/include/python2.7 -I/home/eb/anaconda2/envs/deep/lib/python2.7/site-packages/theano/gof -L/home/eb/anaconda2/envs/deep/lib -o /home/eb/.theano/compiledir_Linux-4.8--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/cuda_ndarray/cuda_ndarray.so mod.cu -lcublas -lpython2.7 -lcudart')
Can not use cuDNN on context None: cannot compile with cuDNN. We got this error:
/tmp/try_flags_M8OZOh.c:4:19: fatal error: cudnn.h: No such file or directory
compilation terminated.

Mapped name None to device cuda: GeForce GTX 1060 6GB (0000:01:00.0)
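The cuDNN part of this log boils down to the compiler not finding cudnn.h. A quick way to check whether the header exists in the usual places is a small search like the one below. The candidate directories are assumptions about a typical Ubuntu CUDA setup (the log above shows CUDA headers under /usr/include, while NVIDIA's own packages usually use /usr/local/cuda), and CUDA_HOME is only consulted if you happen to set it.

```python
import os


def find_header(name, search_dirs):
    """Return the first path where `name` exists as a file among `search_dirs`, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Assumed typical locations; adjust to your installation.
dirs = ["/usr/local/cuda/include", "/usr/include"]
if os.environ.get("CUDA_HOME"):
    dirs.insert(0, os.path.join(os.environ["CUDA_HOME"], "include"))

print(find_header("cudnn.h", dirs) or "cudnn.h not found")
```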

Surprisingly, the code still runs and prints the expected output:

[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, (False,))>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.365814 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu
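To put the timing in perspective: the vector has vlen = 10 * 30 * 768 = 230400 elements and f() is called 1000 times, so the printed figures work out as below. This is only arithmetic on the values shown above, not a new benchmark; each call also includes copying the result back to the host (the HostFromGpu node), so it is end-to-end time per call rather than pure kernel time.

```python
vlen = 10 * 30 * 768   # elements per call, as in the script
iters = 1000           # number of calls to f()
elapsed = 0.365814     # seconds, taken from the output above

per_call_ms = elapsed / iters * 1e3
elements_per_s = vlen * iters / elapsed

print("%.3f ms per call" % per_call_ms)            # 0.366 ms per call
print("%.2e exp evaluations/s" % elements_per_s)   # 6.30e+08 exp evaluations/s
```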


Am I missing some Theano configuration, or something else? Any idea what is going on?

P.S. All libraries are installed in my Python virtual env, except the CUDA libraries, which are installed system-wide.

Update, in case it helps as a clue: the output also shows the source code of _CUDA_NDARRAY_C highlighted in red, followed by warnings such as:

In file included from /usr/include/host_config.h:161:0,
                 from /usr/include/cuda_runtime.h:76,
                 from :0:
/usr/include/features.h:169:0: note: this is the location of the previous definition
 #define _XOPEN_SOURCE 700
 ^
mod.cu(940): warning: pointless comparison of unsigned integer with zero
mod.cu(3000): warning: conversion from a string literal to "char *" is deprecated

I think I need to install cuDNN as well, even though the Theano website does not list it as a requirement. Perhaps the ndarray library tries to use it first, which is why I get the error.

Here is how I finally solved the problem after a lot of trial and error. First, the error above complains about the missing cuDNN library, which Theano and Keras use. Installing the cuDNN package with dpkg did not work for me; I had to copy "cudnn.h" and the other files manually into the "include" and "lib64" folders of the CUDA installation. Now the warnings and errors are gone. As far as I understand, Theano first tries to use cuDNN; if it is not available, it prints some errors and falls back to the standard CUDA libraries. That is why the code ran in both cases.
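The manual fix described in the answer (copying cudnn.h and the cuDNN libraries into the CUDA installation's include and lib64 folders) can be scripted. A minimal sketch, assuming the cuDNN archive was extracted to a folder containing include/cudnn.h and lib64/libcudnn* (the layout NVIDIA's cuDNN tarballs used at the time) and that CUDA lives at /usr/local/cuda. Both paths and the helper name install_cudnn are assumptions, and writing to a system CUDA folder normally requires root.

```python
import glob
import os
import shutil


def install_cudnn(src_root, cuda_root="/usr/local/cuda"):
    """Hypothetical helper: copy cudnn.h and libcudnn* from an extracted cuDNN folder into a CUDA install.

    Assumes src_root contains include/cudnn.h and lib64/libcudnn*.
    Returns the list of destination paths that were written.
    """
    header = os.path.join(src_root, "include", "cudnn.h")
    if not os.path.isfile(header):
        raise FileNotFoundError("cudnn.h not found at " + header)

    copied = []
    for subdir, pattern in [("include", "cudnn.h"), ("lib64", "libcudnn*")]:
        dest_dir = os.path.join(cuda_root, subdir)
        os.makedirs(dest_dir, exist_ok=True)
        for src in sorted(glob.glob(os.path.join(src_root, subdir, pattern))):
            dest = os.path.join(dest_dir, os.path.basename(src))
            shutil.copy2(src, dest)  # follows symlinks, so each library is copied as a regular file
            os.chmod(dest, os.stat(dest).st_mode | 0o444)  # add read permission for all, like `chmod a+r`
            copied.append(dest)
    return copied
```

Run from the directory where the archive was unpacked, a call would look like install_cudnn("cuda"), executed with enough permissions to write into the CUDA folder.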