Python: Computing the derivative of the output with respect to the input at a given time step in an LSTM (TensorFlow 2.0)

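I wrote some sample code to reproduce the actual problem I am facing in my project. I am using an LSTM in TensorFlow to model some time-series data. The input has shape (10, 100, 1), i.e. 10 instances, 100 time steps, and 1 feature. The output has the same shape.

After training the model, what I want to achieve is to study the influence of each input on each output at each specific time step. In other words, I would like to see which input variables affect my output the most at each time step (or which input has the most influence on the output, possibly indicated by a larger gradient). Here is the code for this problem: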

import numpy as np
import tensorflow as tf

tf.keras.backend.clear_session()
tf.random.set_seed(42)

model_input = tf.data.Dataset.from_tensor_slices(np.random.normal(size=(10, 100, 1)))
model_input = model_input.batch(10)
model_output = tf.data.Dataset.from_tensor_slices(np.random.normal(size=(10, 100, 1)))
model_output = model_output.batch(10)

my_dataset = tf.data.Dataset.zip((model_input, model_output))

m_inputs = tf.keras.Input(shape=(None, 1))

lstm_outputs = tf.keras.layers.LSTM(32, return_sequences=True)(m_inputs)
m_outputs = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1))(lstm_outputs)

my_model = tf.keras.Model(m_inputs, m_outputs, name="my_model")

my_optimizer=tf.keras.optimizers.Adam(learning_rate=0.001)
my_loss_fn = tf.keras.losses.MeanSquaredError()

my_epochs = 3

for epoch in range(my_epochs):

    for step, (x_batch_tr, y_batch_tr) in enumerate(my_dataset):
        # open a gradient tape to record the operations run during the forward pass, which enables autodifferentiation
        with tf.GradientTape() as tape:

            # Run the forward pass of the layer
            logits = my_model(x_batch_tr, training=True)

            # compute the loss value for this mini-batch
            loss_value = my_loss_fn(y_batch_tr, logits)

        # use the gradient tape to automatically retrieve the gradients of the loss with respect to the trainable variables.
        grads = tape.gradient(loss_value, my_model.trainable_weights)

        # Run one step of gradient descent by updating the value of the variables to minimize the loss.
        my_optimizer.apply_gradients(zip(grads, my_model.trainable_weights))

        print(f"Step {step}, loss: {loss_value}")


print("\n\nCalculate gradient of outputs w.r.t inputs\n\n")

for step, (x_batch_tr, y_batch_tr) in enumerate(my_dataset):
    # open a gradient tape to record the operations run during the forward pass, which enables autodifferentiation
    with tf.GradientTape() as tape:

        tape.watch(x_batch_tr)

        # Run the forward pass of the layer
        logits = my_model(x_batch_tr, training=True)
        #tape.watch(logits[:, 10, :])   # this didn't help
        # compute the loss value for this mini-batch
        loss_value = my_loss_fn(y_batch_tr, logits)

    # use the gradient tape to compute the gradients of the outputs with respect to the inputs.
#     grads = tape.gradient(logits, x_batch_tr)   # This works
#     print(grads.numpy().shape)                  # This works
    grads = tape.gradient(logits[:, 10, :], x_batch_tr)
    print(grads)

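In other words, I want to focus on the inputs that affect my output the most (at each specific time step).

For me,

grads = tape.gradient(logits, x_batch_tr)

does not work, because it sums the gradients of all outputs with respect to each input.

However, with the sliced target, tape.gradient(logits[:, 10, :], x_batch_tr), the gradient is always None.

Any help is much appreciated!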

You can use tf.GradientTape.batch_jacobian to get precisely that information:

grads = tape.batch_jacobian(logits, x_batch_tr)
print(grads.shape)
# (10, 100, 1, 100, 1)

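Here, grads[i, t1, f1, t2, f2] gives, for example i, the gradient of output feature f1 at time t1 with respect to input feature f2 at time t2. If, as in your case, you only have one feature, you can simply say that grads[i, t1, 0, t2, 0] gives the gradient of the output at time t1 with respect to the input at time t2. You can also conveniently aggregate across different axes or slices of this result to get aggregated gradients. For example, tf.reduce_sum(grads[:, :, :, :10], axis=3) would give the gradient of each output time step with respect to the first ten input time steps.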
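To make the index layout concrete, here is a minimal sketch on a toy function rather than the LSTM. The function `toy_model` (a cumulative sum of squares along the time axis) is a hypothetical stand-in chosen only because its Jacobian is known analytically: the output at time t1 depends on the input at time t2 only when t2 <= t1, with derivative 2 * x[i, t2, 0]. The sketch also checks that `tape.gradient(y, x)` equals the Jacobian summed over the output axes, which is the summation the question wants to avoid.

```python
import numpy as np
import tensorflow as tf


def toy_model(x):
    # Hypothetical stand-in for the LSTM: causal along the time axis.
    return tf.cumsum(x * x, axis=1)


# Shape (batch=2, time=4, features=1), values 1..8.
x = tf.constant(np.arange(1.0, 9.0).reshape(2, 4, 1), dtype=tf.float32)

# persistent=True so the tape can be queried more than once.
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    y = toy_model(x)

jac = tape.batch_jacobian(y, x)  # indexed as [i, t1, f1, t2, f2]
summed = tape.gradient(y, x)     # all outputs summed together
del tape

print(jac.shape)
# Output at t1=2 w.r.t. input at t2=1 for example 0: analytically 2 * x[0, 1, 0].
print(float(jac[0, 2, 0, 1, 0]))
# Output at t1=1 w.r.t. the later input at t2=3: no dependence.
print(float(jac[0, 1, 0, 3, 0]))
# tape.gradient is the Jacobian summed over output time steps and features.
print(np.allclose(summed.numpy(), tf.reduce_sum(jac, axis=[1, 2]).numpy()))
```

With a real model the same indexing applies; only the values change, and for a unidirectional LSTM one would likewise expect no dependence of an output on later inputs.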

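As for getting None gradients in your example, I think that is because you perform the slicing operation outside the gradient tape context, so the gradient tracking is lost.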
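The difference can be seen in a small sketch. It again uses a toy cumulative-sum function in place of the model (an assumption made for illustration only) and compares a slice taken while the tape is recording with the same slice taken after the `with` block has ended.

```python
import tensorflow as tf

# Shape (batch=1, time=4, features=1).
x = tf.constant([[[1.0], [2.0], [3.0], [4.0]]])

# persistent=True so tape.gradient can be called twice.
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    y = tf.cumsum(x * x, axis=1)  # toy stand-in for the forward pass
    y_step = y[:, 2, :]           # slice taken while the tape is recording

# The recorded slice is connected to x.
grad_inside = tape.gradient(y_step, x)
# This slice creates a new tensor after recording stopped, so the tape
# has no path from it back to x.
grad_outside = tape.gradient(y[:, 2, :], x)
del tape

print(grad_inside.numpy().ravel())
print(grad_outside)
```

Only the slice made inside the `with` block yields a gradient; the other call returns None, which matches the behaviour described in the question.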

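So the solution is to create a temporary tensor, inside the tape, for the part of the logits that we need, and to register that tensor on the tape using tape.watch. This is how it should be done: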
for step, (x_batch_tr, y_batch_tr) in enumerate(my_dataset):
    # open a gradient tape to record the operations run during the forward pass, which enables autodifferentiation
    with tf.GradientTape() as tape:

        tape.watch(x_batch_tr)

        # Run the forward pass of the layer
        logits = my_model(x_batch_tr, training=True)
        tensor_logits = tf.constant(logits[:, 10, :])   # slice inside the tape so the operation is recorded
        tape.watch(tensor_logits)

        # compute the loss value for this mini-batch
        loss_value = my_loss_fn(y_batch_tr, logits)

    # use the gradient tape to compute the gradient of the sliced output (time step 10) with respect to the inputs.
    grads = tape.gradient(tensor_logits, x_batch_tr)
    print(grads.numpy())

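Fixed the explanation of the axis order in the gradient. After the initial batch dimension, the first axes correspond to the output shape and the last axes to the input shape.

Thanks @jdehesa for the nice, complete answer I was looking for. However, one problem that arises when calling batch_jacobian is that the huge returned tensor nearly froze my computer by filling up its RAM. Any ideas? Thank you.

@I.A Not much, I'm afraid; it is rather expensive to compute. However, you can trade lower memory usage for longer execution time: you can try the parallel_iterations parameter, or pass experimental_use_pfor=False (which, I think, requires you to pass persistent=True to the gradient tape in eager mode).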