Python 3.x: Computing the gradient of the validation error w.r.t. the inputs using Keras/TensorFlow or autograd

I need to compute the gradient of the validation error w.r.t. the inputs x. I am trying to see how much the validation error changes when I perturb one of the training samples.

  • The validation error (E) explicitly depends on the model weights (W).
  • The model weights explicitly depend on the inputs (x and y).
  • Therefore, the validation error implicitly depends on the inputs.
I am trying to compute the gradient of E w.r.t. x directly. An alternative would be to compute the gradient of E w.r.t. W (which can easily be computed) and the gradient of W w.r.t. x (which I currently cannot compute), which together would give the gradient of E w.r.t. x.
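Written out (my restatement of the decomposition described above, not part of the original post):

\frac{\partial E}{\partial x} \;=\; \frac{\partial E}{\partial W}\cdot\frac{\partial W}{\partial x}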

I have attached a toy example below. Thanks in advance.

import numpy as np
import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
import tensorflow as tf
from autograd import grad

train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()

# Normalize the images.
train_images = (train_images / 255) - 0.5
test_images = (test_images / 255) - 0.5

# Flatten the images.
train_images = train_images.reshape((-1, 784))
test_images = test_images.reshape((-1, 784))

# Build the model.
model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),
])

# Compile the model.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)

# Train the model.
model.fit(
    train_images,
    to_categorical(train_labels),
    epochs=5,
    batch_size=32,
)
model.save_weights('model.h5')
# Load the model's saved weights.
# model.load_weights('model.h5')

calculate_mse = tf.keras.losses.MeanSquaredError()

test_x = test_images[:5]
test_y = to_categorical(test_labels)[:5]

train_x = train_images[:1]
train_y = to_categorical(train_labels)[:1]

# Keep dtypes consistent (the model's weights are float32)
train_y = tf.convert_to_tensor(train_y, np.float32)
train_x = tf.convert_to_tensor(train_x, np.float32)

# approach 1 - tape.gradient() returns None: model.fit() runs its own training
# loop, so the ops that connect train_x to the updated weights are not recorded on this tape
with tf.GradientTape() as tape:
    tape.watch(train_x)
    model.fit(train_x, train_y, epochs=1, verbose=0)
    valid_y_hat = model(test_x, training=False)
    mse = calculate_mse(test_y, valid_y_hat)
de_dx = tape.gradient(mse, train_x)
print(de_dx)


# approach 2 - does not run: autograd cannot differentiate through TensorFlow/Keras ops
def calculate_validation_mse(x):
    model.fit(x, train_y, epochs=1, verbose=0)
    valid_y_hat = model(test_x, training=False)
    mse = calculate_mse(test_y, valid_y_hat)
    return mse


train_x = train_images[:1]
train_y = to_categorical(train_labels)[:1]

validation_gradient = grad(calculate_validation_mse)
de_dx = validation_gradient(train_x)
print(de_dx)



Here is how you can do this. The derivation is given below.

A few things worth noting:

  • I have reduced the feature size from 784 to 256 because of out-of-memory issues in Colab (the line marked in the code). Some memory profiling may be needed to find out why that happens.
  • The gradient is computed only for the first layer's weights. It is easy to extend this to the other layers (a sketch is given after the code).
Disclaimer: To the best of my knowledge this derivation is correct. Please do some research and verify that this is the case. You will run into memory issues for larger inputs and layer sizes.
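A sketch of the derivation, consistent with the code below and assuming the weights \theta change by a single SGD step with learning rate \eta (lr in the code):

\theta' = \theta - \eta\,\frac{\partial E_{tr}}{\partial \theta}
\qquad\Rightarrow\qquad
\frac{\partial E_{val}}{\partial x}
  = \frac{\partial E_{val}}{\partial \theta'}\cdot\frac{\partial \theta'}{\partial x}
  = -\eta\,\frac{\partial E_{val}}{\partial \theta'}\cdot\frac{\partial^2 E_{tr}}{\partial \theta\,\partial x}

In the code, grads_2 is \partial E_{val}/\partial\theta (evaluated at the current weights), tape1.jacobian(tmp_g, x_tr) gives \partial^2 E_{tr}/\partial\theta\,\partial x, and the final matmul performs the contraction over \theta.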

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
import tensorflow as tf

f = 256

model = Sequential([
    Dense(64, activation='relu', input_shape=(f,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),
])

# Compile the model.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
# Kernel of the first Dense layer, shape (f, 64)
w = model.weights[0]

# Inputs and labels
x_tr = tf.Variable(np.random.normal(size=(1,f)), shape=(1, f), dtype='float32')
y_tr = np.random.choice([0,1,2,3,4,5,6,7,8,9], size=(1,1))
y_tr_onehot = tf.keras.utils.to_categorical(y_tr, num_classes=10).astype('float32')
x_v = tf.Variable(np.random.normal(size=(1,f)), shape=(1, f), dtype='float32')
y_v = np.random.choice([0,1,2,3,4,5,6,7,8,9], size=(1,1))
y_v_onehot = tf.keras.utils.to_categorical(y_v, num_classes=10).astype('float32')

# In the context of GradientTape

with tf.GradientTape() as tape1:

  with tf.GradientTape() as tape2:
    y_tr_pred = model(x_tr)   
    tr_loss = tf.keras.losses.MeanSquaredError()(y_tr_onehot, y_tr_pred)

  tmp_g = tape2.gradient(tr_loss, w)  # dE_tr/dw, shape (f, 64)
  print(tmp_g.shape)

# d(dE_tr/d(theta))/dx
# Warning: this step consumes a lot of memory for large layers
lr = 0.001
grads_1 = -lr * tape1.jacobian(tmp_g, x_tr)  # shape (f, 64, 1, f)

with tf.GradientTape() as tape3:
  y_v_pred = model(x_v)   
  v_loss = tf.keras.losses.MeanSquaredError()(y_v_onehot, y_v_pred)

# dE_val/d(theta)
grads_2 = tape3.gradient(v_loss, w)[tf.newaxis, :]  # shape (1, f, 64)

# Contract dE_val/d(theta) with d(theta')/dx over the weight dimensions to get the
# final desired shape of (1, 256). Flattening both tensors in the same (f, 64) order
# keeps the weight indices aligned.
grad = tf.matmul(tf.reshape(grads_2, [1, -1]), tf.reshape(grads_1, [-1, f]))
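
As noted above, the code only handles the first layer's kernel. A minimal sketch of extending the same recipe to every trainable weight tensor (my own extension, reusing model, x_tr, y_tr_onehot, x_v, y_v_onehot, lr and f from the code above; it has the same memory caveats):

total_grad = tf.zeros((1, f))
for w_i in model.trainable_weights:
    with tf.GradientTape() as t1:
        with tf.GradientTape() as t2:
            tr_loss_i = tf.keras.losses.MeanSquaredError()(y_tr_onehot, model(x_tr))
        g_i = t2.gradient(tr_loss_i, w_i)        # dE_tr/dw_i
    jac_i = -lr * t1.jacobian(g_i, x_tr)         # d(dE_tr/dw_i)/dx, shape w_i.shape + (1, f)
    with tf.GradientTape() as t3:
        v_loss_i = tf.keras.losses.MeanSquaredError()(y_v_onehot, model(x_v))
    gv_i = t3.gradient(v_loss_i, w_i)            # dE_val/dw_i
    # Flatten the weight dimensions in the same order on both sides and contract them
    total_grad += tf.matmul(tf.reshape(gv_i, [1, -1]), tf.reshape(jac_i, [-1, f]))

print(total_grad.shape)  # (1, f)

This is still a first-order (single SGD step) approximation, and the jacobian cost grows with the number of weights, as the disclaimer warns.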