Chainer / Python: CUDA_ERROR_INVALID_SOURCE


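I have a machine with two different GPUs (one RTX and one Titan V), and it frequently fails to run tasks. The failure is observed mostly on the GPU with id=1.

The same task runs successfully on other machines, and on the GPU with id=0.

The exact stack trace is as follows: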

  File "cupy/core/core.pyx", line 1689, in cupy.core.core.ndarray.__setitem__
  File "cupy/core/core.pyx", line 3598, in cupy.core.core._scatter_op
  File "cupy/core/_kernel.pyx", line 828, in cupy.core._kernel.ufunc.__call__
  File "cupy/util.pyx", line 48, in cupy.util.memoize.decorator.ret
  File "cupy/core/_kernel.pyx", line 617, in cupy.core._kernel._get_ufunc_kernel
  File "cupy/core/_kernel.pyx", line 51, in cupy.core._kernel._get_simple_elementwise_kernel
  File "cupy/core/carray.pxi", line 164, in cupy.core.core.compile_with_cache
  File "[miniconda]/envs/[env_name]/lib/python3.5/site-packages/cupy/cuda/compiler.py", line 161, in compile_with_cache
    mod.load(cubin)
  File "cupy/cuda/function.pyx", line 181, in cupy.cuda.function.Module.load
  File "cupy/cuda/function.pyx", line 183, in cupy.cuda.function.Module.load
  File "cupy/cuda/driver.pyx", line 185, in cupy.cuda.driver.moduleLoadData
  File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_SOURCE: device kernel image is invalid

My setup is as follows:

chainer                   5.2.0                     <pip>
chainercv                 0.12.0                    <pip>
cupy-cuda100              5.2.0                     <pip>

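The same problem also occurs with Chainer 5.3 (I created a new conda environment from scratch).

I believe this is somehow related to multithreading, but I could not find how to turn it off in CuPy, or how to avoid the problem altogether.

Some possibly irrelevant information: the failure is fairly random. On GPU id=1, eight runs out of ten fail with the error above.

Any ideas?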
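One way to narrow this down is to run each job in its own process with only one GPU visible and a kernel cache directory of its own. The sketch below builds such an environment. `CUDA_VISIBLE_DEVICES` and `CUPY_CACHE_DIR` are real environment variables, but the helper name `env_for_gpu`, the directory layout, and the idea that separate caches help here are assumptions, not something confirmed in this thread.

```python
import os


def env_for_gpu(gpu_id, cache_root, base_env=None):
    """Return a copy of the environment that exposes a single GPU.

    CUDA_VISIBLE_DEVICES restricts which devices the CUDA runtime can see.
    CUPY_CACHE_DIR points CuPy at a kernel cache directory; using one
    directory per GPU is an assumption made for this experiment.
    """
    # Copy so the caller's environment is never modified in place.
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    env["CUPY_CACHE_DIR"] = os.path.join(cache_root, "gpu{}".format(gpu_id))
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.call` when launching the training script. Inside that child process the selected GPU is the only visible device, so it is addressed as device 0.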

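Does GPU id=1 point to the RTX? Could you try again with v5.4.0?

No, GPU id=1 points to the Titan X. Indeed, 5.4 solved the problem. Could you describe what the problem was, for future reference?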
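Since the comments report that v5.4.0 resolved the error, a script can check the installed version before starting a long job. This is a minimal sketch that assumes plain dotted numeric version strings (no suffixes such as `rc1`); `parse_version` and `is_at_least` are hypothetical helpers, not part of Chainer or CuPy.

```python
def parse_version(text):
    """Convert a dotted version string such as '5.2.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed, required):
    """Return True if `installed` is the same as or newer than `required`."""
    return parse_version(installed) >= parse_version(required)


print(is_at_least("5.2.0", "5.4.0"))  # False
print(is_at_least("5.4.0", "5.4.0"))  # True
```

In a real environment the first argument would typically come from `cupy.__version__`.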