
TensorFlow GPU stopped working, reproducible problem

Tags: tensorflow, tensorflow-gpu

A few days ago I had TensorFlow working, but it has stopped working. When I test it with the tutorial code, both scripts fail (the mnist_softmax.py and mnist_deep.py runs in the logs below). TensorFlow still runs simple things successfully.

What I have tried
  • As suggested in similar reports, I tried setting allow_growth to True, or per_process_gpu_memory_fraction to 0.1, but neither helped (a sketch of these settings follows this list).
  • I have tried reinstalling my cuDNN files.
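For reference, a minimal sketch of how those two settings are applied with the TF 1.x Session API used throughout this question; the names are standard ConfigProto fields, and as noted above neither option resolved the failure here:

import tensorflow as tf

# Build a session config instead of relying on the defaults.
config = tf.ConfigProto()

# Option 1: allocate GPU memory on demand rather than grabbing it all up front.
config.gpu_options.allow_growth = True

# Option 2 (alternative): cap this process at 10% of total GPU memory.
# config.gpu_options.per_process_gpu_memory_fraction = 0.1

sess = tf.Session(config=config)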
Additional notes

I don't recall making any changes to my TensorFlow installation or to the CUDA/cuDNN setup, so my best guess is that this is a problem caused by an automatically updated driver.

System information
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No. The problem can be reproduced with code from the TensorFlow tutorials.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04.3 LTS
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below; see the sketch after this list): v1.3.0-rc2-20-g0787eee 1.3.0
  • Python version: Python 3.5.2 (default, Aug 18 2017, 17:48:00)
  • Bazel version (if compiling from source): N/A
  • CUDA/cuDNN version: CUDA release 8.0, V8.0.61 / libcudnn.so.6.0.21
  • GPU model and memory: GeForce GTX 1080 (8 GB) on driver 384.90
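The "command below" referred to in the issue template is presumably the usual version check; run from the same interpreter it should print the values quoted above:

import tensorflow as tf
print(tf.GIT_VERSION, tf.VERSION)   # expected here: v1.3.0-rc2-20-g0787eee 1.3.0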
Source code / logs

For the hello world code in the REPL:

>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2017-10-26 21:56:00.418991: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 21:56:00.419027: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 21:56:00.419036: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 21:56:00.419046: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 21:56:00.419054: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 21:56:00.565143: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-10-26 21:56:00.565417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 6.48GiB
2017-10-26 21:56:00.565432: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-10-26 21:56:00.565437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-10-26 21:56:00.565447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
For python3 mnist_deep.py:

2017-10-26 21:37:56.993479: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-10-26 21:37:56.993560: E tensorflow/stream_executor/cuda/cuda_dnn.cc:338] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-10-26 21:37:56.993580: F tensorflow/core/kernels/conv_ops.cc:672] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 6.50GiB
2017-10-26 21:53:16.150706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-10-26 21:53:16.150712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-10-26 21:53:16.150723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
2017-10-26 21:53:16.422081: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2017-10-26 21:53:16.422132: W tensorflow/stream_executor/stream.cc:1756] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1306, in _run_fn
    status, run_metadata)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(100, 784), b.shape=(784, 10), m=100, n=10, k=784
         [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_9, Variable/read)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mnist_softmax.py", line 78, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "mnist_softmax.py", line 65, in main
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(100, 784), b.shape=(784, 10), m=100, n=10, k=784
         [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_9, Variable/read)]]

Caused by op 'MatMul', defined at:
  File "mnist_softmax.py", line 78, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "mnist_softmax.py", line 42, in main
    y = tf.matmul(x, W) + b
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 1844, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1289, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(100, 784), b.shape=(784, 10), m=100, n=10, k=784
         [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_9, Variable/read)]]
Here is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0  On |                  N/A |
| 34%   51C    P0    35W / 180W |   1340MiB /  8110MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1250      G   /usr/lib/xorg/Xorg                           785MiB |
|    0      2426      G   compiz                                       359MiB |
|    0      3840      G   ...-token=44A975F4EE134A1BF9C8CD1C7223C977   107MiB |
|    0      4944      G   ...-token=4F87ADEE5575E9B5125D41E08D43BF0E    83MiB |
+-----------------------------------------------------------------------------+

Try closing sessions that are active in other processes. Please follow this thread -
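For what it's worth, within a single script that suggestion amounts to making sure every tf.Session is closed when you are done with it, for example by using it as a context manager (a sketch assuming the TF 1.x API used in this question); sessions held by other processes would instead have to be located with nvidia-smi and terminated:

import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
with tf.Session() as sess:
    print(sess.run(hello))
# The session is closed automatically when the block exits;
# an explicit sess.close() achieves the same thing.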
OT: how is the GPU working out for you? Is it worth the money? I have run into this problem too, on an AWS p2.xlarge with a single Nvidia K80. It will run for 20-60 minutes, but then the GPU stops. The process is still listed in nvidia-smi and the CPU stays at 100%, but GPU utilization drops to zero. My theory is that the GPU fails and the CPU keeps going, just very slowly.

Thanks for the suggestion; unfortunately, I don't have sessions open in any other process (as can be seen from the nvidia-smi output).