tensorflow.python.framework.errors_impl.InternalError: GPU sync failed


I have the following setup:

  • Windows 10
  • Python 3.8
  • tensorflow-gpu 2.3
  • CUDA 10.1
  • cuDNN 7.6.5
  • NVIDIA GTX 1080
  • Driver version: 451.48
  • Memory: 8192 MiB

During training, the following error occurs:

tensorflow.python.framework.errors_impl.InternalError: GPU sync failed


Any clues?

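Although you didn't mention it, you appear to be on Windows. If a GPU kernel runs for longer than 2 seconds, you can hit CUDA_ERROR_LAUNCH_TIMEOUT. You may wish to read up on this. You will also find many questions on Stack Overflow that discuss it.

@RobertCrovella I have updated the question and added more detail.

@RobertCrovella I tried setting the WDDM TDR delay to 10 and to 30, but I still get the same error.

It looks like the most common cause may be running out of GPU memory.

Full traceback: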
Traceback (most recent call last):
  File "training.py", line 519, in <module>
    history = model.fit(X_train, y_train, epochs=n_epochs, batch_size=batch_size, \
  File "C:\Anaconda3_64\lib\site-packages\tensorflow\python\keras\engine\training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "C:\Anaconda3_64\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1103, in fit
    callbacks.on_train_batch_end(end_step, logs)
  File "C:\Anaconda3_64\lib\site-packages\tensorflow\python\keras\callbacks.py", line 440, in on_train_batch_end
    self._call_batch_hook(ModeKeys.TRAIN, 'end', batch, logs=logs)
  File "C:\Anaconda3_64\lib\site-packages\tensorflow\python\keras\callbacks.py", line 289, in _call_batch_hook
    self._call_batch_end_hook(mode, batch, logs)
  File "C:\Anaconda3_64\lib\site-packages\tensorflow\python\keras\callbacks.py", line 309, in _call_batch_end_hook
    self._call_batch_hook_helper(hook_name, batch, logs)
  File "C:\Anaconda3_64\lib\site-packages\tensorflow\python\keras\callbacks.py", line 342, in _call_batch_hook_helper
    hook(batch, logs)
  File "C:\Anaconda3_64\lib\site-packages\tensorflow\python\keras\callbacks.py", line 961, in on_train_batch_end
    self._batch_update_progbar(batch, logs)
  File "C:\Anaconda3_64\lib\site-packages\tensorflow\python\keras\callbacks.py", line 1016, in _batch_update_progbar
    logs = tf_utils.to_numpy_or_python_type(logs)
  File "C:\Anaconda3_64\lib\site-packages\tensorflow\python\keras\utils\tf_utils.py", line 537, in to_numpy_or_python_type
    return nest.map_structure(_to_single_numpy_or_python_type, tensors)
  File "C:\Anaconda3_64\lib\site-packages\tensorflow\python\util\nest.py", line 635, in map_structure
    structure[0], [func(*x) for x in entries],
  File "C:\Anaconda3_64\lib\site-packages\tensorflow\python\util\nest.py", line 635, in <listcomp>
    structure[0], [func(*x) for x in entries],
  File "C:\Anaconda3_64\lib\site-packages\tensorflow\python\keras\utils\tf_utils.py", line 533, in _to_single_numpy_or_python_type
    x = t.numpy()
  File "C:\Anaconda3_64\lib\site-packages\tensorflow\python\framework\ops.py", line 1063, in numpy
    maybe_arr = self._numpy()  # pylint: disable=protected-access
  File "C:\Anaconda3_64\lib\site-packages\tensorflow\python\framework\ops.py", line 1031, in _numpy
    six.raise_from(core._status_to_exception(e.code, e.message), None)  # pylint: disable=protected-access
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
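
One way to test the out-of-memory theory from the comments is to ask TensorFlow to allocate GPU memory on demand instead of reserving most of it at startup. The snippet below is only a sketch under assumptions: it targets TensorFlow 2.x, the helper name `enable_gpu_memory_growth` is made up for illustration and does not come from the thread, and it may not resolve the error if the real cause is something else (for example the WDDM timeout mentioned above).

```python
def enable_gpu_memory_growth():
    """Enable on-demand GPU memory allocation; return the number of GPUs found.

    Hypothetical helper for illustration. It must be called before any
    tensors or models are created, otherwise TensorFlow raises RuntimeError.
    """
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed, so there is nothing to configure.
        return 0

    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError as err:
            # The GPU was already initialized; the setting can no longer change.
            print("Could not enable memory growth:", err)
    return len(gpus)


if __name__ == "__main__":
    print("GPUs configured:", enable_gpu_memory_growth())
```

Call it at the very top of `training.py`, before the model is built. If the error persists, lowering `batch_size` in the `model.fit(...)` call is another inexpensive way to check whether the 8192 MiB of GPU memory is the limit.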