Unknown error/crash: TensorFlow LSTM on GPU (no output after the first epoch starts)


I am trying to train a model with an LSTM layer. I am using a GPU, and all the required libraries load successfully.

When I build the model like this:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()

model.add(layers.LSTM(256, activation="relu", return_sequences=False))  # note the activation function
model.add(layers.Dropout(0.2))

model.add(layers.Dense(256, activation="relu"))
model.add(layers.Dropout(0.2))

model.add(layers.Dense(1))
model.add(layers.Activation(activation="sigmoid"))

model.compile(
    loss=keras.losses.BinaryCrossentropy(),
    optimizer="adam",
    metrics=["accuracy"]
)
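it works. However, it sets activation="relu" on the LSTM layer, so it does not use the cuDNN implementation. If I am not mistaken, the cuDNN kernel is selected automatically only when the activation function is tanh (the default).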
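The conditions for the fast path can be written down as a small check. This is only a sketch based on the requirements listed in the Keras `LSTM` documentation for TensorFlow 2.x; `uses_cudnn_kernel` is a hypothetical helper, not a TensorFlow API, and it looks at layer arguments only (it ignores the additional requirements about masking and eager execution).

```python
def uses_cudnn_kernel(activation="tanh", recurrent_activation="sigmoid",
                      recurrent_dropout=0.0, unroll=False, use_bias=True):
    """Hypothetical helper: True if these LSTM arguments allow the cuDNN kernel."""
    return (activation == "tanh"
            and recurrent_activation == "sigmoid"
            and recurrent_dropout == 0
            and not unroll
            and use_bias)

print(uses_cudnn_kernel(activation="relu"))  # first model: False
print(uses_cudnn_kernel())                   # defaults (tanh): True
```

With activation="relu" the layer falls back to the generic implementation, which matches the slow training described here.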

As a result it is very slow, and I would like to run the faster cuDNN LSTM. My code is:

model = keras.Sequential()

model.add(layers.LSTM(256, return_sequences=False))
model.add(layers.Dropout(0.2))

model.add(layers.Dense(256, activation="relu"))
model.add(layers.Dropout(0.2))

model.add(layers.Dense(1))
model.add(layers.Activation(activation="sigmoid"))

model.compile(
    loss=keras.losses.BinaryCrossentropy(),
    optimizer="adam",
    metrics=["accuracy"]
)
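It is basically the same, except that no activation function is passed, so tanh is used. But now, instead of training, the output ends like this: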

2021-04-19 22:41:46.046218: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-19 22:41:46.046426: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-19 22:41:46.046642: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-19 22:41:46.046942: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-04-19 22:41:46.047124: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-04-19 22:41:46.047312: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-04-19 22:41:46.047489: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-04-19 22:41:46.047663: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-04-19 22:41:46.047936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-04-19 22:41:46.665456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-19 22:41:46.665712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2021-04-19 22:41:46.665876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2021-04-19 22:41:46.666186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2982 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-04-19 22:41:46.667505: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-19 22:42:07.374456: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/50
2021-04-19 22:42:08.922891: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-19 22:42:09.272264: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-19 22:42:09.302667: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll

Process finished with exit code -1073740791 (0xC0000409)
It starts the first epoch, freezes for about a minute, and then exits with this strange exit code.
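The two forms of the exit code in the log are the same number: PyCharm prints the signed 32-bit value, and the hexadecimal form is its unsigned reinterpretation. A minimal sketch of the conversion:

```python
exit_code = -1073740791  # value reported by PyCharm

# Reinterpret the signed 32-bit value as unsigned to get the NTSTATUS code.
ntstatus = exit_code & 0xFFFFFFFF
print(f"0x{ntstatus:08X}")  # 0xC0000409
```

On Windows, 0xC0000409 is the NTSTATUS value STATUS_STACK_BUFFER_OVERRUN. That would be consistent with the process being terminated inside native code (for example, one of the loaded DLLs) rather than by a Python exception, which may explain why no traceback is printed.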

  • Input data shape: tf.Tensor([50985 29 7], shape=(3,), dtype=int32)
  • GPU: Nvidia GTX 1050 Ti
  • CUDA: v11.3
  • OS: Windows 10
  • IDE: PyCharm
It is hard to find a solution because no error is printed. Am I doing something wrong? Has anyone run into a similar problem? What should I do?

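Edit: I have tried:

  • Running the model with fewer units (2 instead of 256) and a smaller batch size
  • Downgrading TensorFlow to 2.4.0, CUDA to 11.0, and cuDNN to 8.0.1 (which, as far as I could tell, should be a valid combination)
  • Restarting my PC :)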

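    • I found a solution... sort of.

      When I downgraded TensorFlow to 2.1.0, CUDA to 10.1, and cuDNN to 7.6.5 (the fourth combination on the list at the time), it worked as it should.

      I don't know why it does not work with the latest versions, or with the valid combination for TensorFlow 2.4.0.

      It works fine now, so my problem is solved. Still, I would be glad to know why LSTM with cuDNN did not work for me on the newer versions, because I could not find this problem described anywhere.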
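To make the version comparison less error-prone, the combinations mentioned in this thread can be kept in a small lookup table. This is a sketch only: `TESTED` and `matches_tested_build` are hypothetical names, the table contains just the two combinations discussed here, and the values should be checked against TensorFlow's current list of tested build configurations.

```python
# (CUDA, cuDNN) major.minor versions for the TensorFlow releases named in this thread.
TESTED = {
    "2.4.0": ("11.0", "8.0"),
    "2.1.0": ("10.1", "7.6"),
}

def major_minor(version):
    """Reduce a version string such as '7.6.5' to '7.6'."""
    return ".".join(version.split(".")[:2])

def matches_tested_build(tf_version, cuda_version, cudnn_version):
    """Hypothetical helper: None if the release is not in the table, else True/False."""
    if tf_version not in TESTED:
        return None
    cuda, cudnn = TESTED[tf_version]
    return major_minor(cuda_version) == cuda and major_minor(cudnn_version) == cudnn

print(matches_tested_build("2.1.0", "10.1", "7.6.5"))  # True
print(matches_tested_build("2.4.0", "11.0", "8.0.1"))  # True
print(matches_tested_build("2.4.0", "11.3", "8.0.1"))  # False
```

Both combinations tried here pass this check, so a version mismatch alone does not explain why only the 2.1.0 setup worked.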

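      The model is coded correctly, so I believe the error may be related to GPU memory (running out of memory) or something similar. To rule that out, you could try the same model with far fewer units (e.g., 8 instead of 256) and/or a smaller batch size.

      Thanks, I tried that, but it did not help. I edited the post and listed it along with the other things I have tried.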
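A rough size estimate supports the observation that shrinking the model did not help. The sketch below counts the trainable parameters of the model from the question using the standard formulas for LSTM and Dense layers, and assumes float32 storage (4 bytes per value); `lstm_params` and `dense_params` are hypothetical helper names.

```python
def lstm_params(units, input_dim):
    # Four gates, each with input weights, recurrent weights, and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(inputs, outputs):
    # Weight matrix plus bias vector.
    return inputs * outputs + outputs

# LSTM(256) on 7 features -> Dense(256) -> Dense(1)
total = lstm_params(256, 7) + dense_params(256, 256) + dense_params(256, 1)

weights_mb = total * 4 / 1e6        # float32 weights
data_mb = 50985 * 29 * 7 * 4 / 1e6  # whole input tensor, assuming float32

print(total, round(weights_mb, 2), round(data_mb, 1))  # 336385 1.35 41.4
```

This covers only the weights and the raw input; activations, optimizer state, and any cuDNN workspace are not included. Even so, the totals are small compared with the 2982 MB that TensorFlow reports for the GPU in the log, so a plain out-of-memory condition looks unlikely, although this estimate cannot rule it out.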