Training the tf.v1 CuDNNLSTM layer on TensorFlow v2 in Python, or getting the cuDNN LSTM implementation with Keras on Colab


I am trying to make sure that my Keras LSTM model is actually training on the GPU on Colab (I have Colab Pro). Epoch times vary wildly, so I cannot tell what is going on. I checked the Keras documentation for the LSTM layer, and it says you must meet the following requirements:

The requirements to use the cuDNN implementation are:
1. activation == tanh
2. recurrent_activation == sigmoid
3. recurrent_dropout == 0
4. unroll is False
5. use_bias is True
6. Inputs are not masked or strictly right padded.
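
One quick sanity check (my sketch, not from the Keras docs): confirm the Colab runtime actually sees a GPU and, optionally, log device placement; an LSTM built with all-default arguments satisfies every requirement above.

import tensorflow as tf

# Confirm the Colab runtime has a GPU attached at all.
print(tf.config.list_physical_devices('GPU'))

# Optionally log which device each op runs on; the fused LSTM kernel
# should show up on /GPU:0 when the cuDNN path is taken.
tf.debugging.set_log_device_placement(True)

# All-default arguments meet requirements 1-5 above, so this layer is
# eligible for the cuDNN implementation.
layer = tf.keras.layers.LSTM(200)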
All of these are satisfied (they are the defaults anyway). However, when I change the activation to 'relu', the speed does not change. That makes no sense: if the cuDNN implementation were in use, I would expect a big slowdown without it (i.e., training should be slower with activation=relu than with activation=tanh). The models are below:

# Assumed imports, inferred from the code and the error message further down
# (layers come from the standalone keras package; the CuDNNLSTM import is shown below).
# custom_loss_metric is a custom loss defined elsewhere in the notebook.
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, LeakyReLU
from keras.optimizers import Adam
import tensorflow

def LSTM_model_2(LSTM_units=[200,200], n_steps_in=10, num_features=6, loss=custom_loss_metric, LSTM_activation='tanh', verbose=False):
  model = Sequential()
  model.add(LSTM(LSTM_units[0], activation=LSTM_activation, return_sequences=True, input_shape=(n_steps_in, num_features)))
  model.add(Dropout(0.2))
  model.add(LSTM(LSTM_units[1], activation=LSTM_activation))
  model.add(Dropout(0.2))
  model.add(Dense(num_features))
  # model.add(Activation("relu"))
  model.add(LeakyReLU(alpha=0.3))
  # opt_LSTM = SGD(lr=0.05, momentum=0.9, clipnorm=1.0)
  opt_LSTM = Adam(lr=0.001)
  model.compile(optimizer=opt_LSTM, loss=loss)
  if verbose:
    print(model.summary())
  return model

def CuDNNLSTM_model(LSTM_units=[200,200], n_steps_in=10, num_features=6, loss=custom_loss_metric, verbose=False):
  model = tensorflow.keras.Sequential()
  model.add(CuDNNLSTM(LSTM_units[0], return_sequences=True, input_shape=(n_steps_in, num_features)))
  # model.add(tensorflow.keras.layers.Dropout(0.2))  # Dropout lives under tensorflow.keras.layers
  model.add(CuDNNLSTM(LSTM_units[1]))
  # model.add(tensorflow.keras.layers.Dropout(0.2))
  model.add(Dense(num_features))
  # model.add(Activation("relu"))
  model.add(LeakyReLU(alpha=0.3))
  # opt_LSTM = SGD(lr=0.05, momentum=0.9, clipnorm=1.0)
  # opt_LSTM = Adam(lr=0.001)
  model.compile(optimizer='adam', loss=loss)
  if verbose:
    print(model.summary())
  return model
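
To actually test whether the cuDNN path is active, one hedged option is to time one epoch per activation on the same synthetic data (a sketch; the array shapes and the 'mse' stand-in loss are my assumptions):

import time
import numpy as np

# Small synthetic stand-in for the real data: 1,000 windows of 10 steps x 6 features.
X = np.random.rand(1000, 10, 6).astype('float32')
y = np.random.rand(1000, 6).astype('float32')

for act in ('tanh', 'relu'):
    model = LSTM_model_2(LSTM_activation=act, loss='mse')
    start = time.time()
    model.fit(X, y, epochs=1, batch_size=32, verbose=0)
    print(act, round(time.time() - start, 2), 'seconds')
# If the cuDNN kernel is being used, the tanh run should be noticeably faster.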
Alternatively, I found that tensorflow v1 has a dedicated CuDNNLSTM layer. I tried importing it like this:

import tensorflow
from tensorflow.compat.v1.keras.layers import CuDNNLSTM 
But my model does not work with this layer, because it seems you cannot mix and match v1 and v2 components (the models are the same two shown above).

If I use plain keras.Sequential() it cannot find the CuDNNLSTM layer, but if I use tensorflow.keras.Sequential() it cannot find the Dropout, Dense, and LeakyReLU layers:

TypeError: The added layer must be an instance of class Layer. Found: <keras.layers.core.Dense object at 0x7f736a163e10>
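
For what it is worth, the error above suggests the clash is between the standalone keras package and tensorflow.keras rather than between v1 and v2 as such. A minimal, untested sketch that keeps every layer inside tensorflow.keras (shapes taken from the models above, 'mse' standing in for custom_loss_metric):

import tensorflow as tf
from tensorflow.compat.v1.keras.layers import CuDNNLSTM

# Every layer comes from tensorflow.keras, so Sequential accepts them all.
model = tf.keras.Sequential([
    CuDNNLSTM(200, return_sequences=True, input_shape=(10, 6)),
    tf.keras.layers.Dropout(0.2),
    CuDNNLSTM(200),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(6),
    tf.keras.layers.LeakyReLU(alpha=0.3),
])
model.compile(optimizer='adam', loss='mse')  # 'mse' stands in for custom_loss_metric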
Any ideas on how to make sure it runs on the cuDNN implementation? For reference, when training this model, X_train is roughly 750,000 rolling windows of 10 timesteps each, y_train is the same number of windows of 1 timestep, and each epoch takes about 1.5 hours of training (batch_size=10). Is that typical? It seems very long.
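
A back-of-the-envelope check on those numbers (my arithmetic, assuming no partial batches):

samples, batch_size = 750_000, 10
steps_per_epoch = samples // batch_size        # 75,000 steps per epoch
epoch_seconds = 1.5 * 60 * 60                  # 5,400 seconds
print(epoch_seconds / steps_per_epoch * 1000)  # ~72 ms per step

So the long epochs may be driven as much by the very small batch size (75,000 sequential steps per epoch) as by which LSTM kernel is running.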