Neural network: How to efficiently compute per-example gradients over a large dataset in PyTorch?
Given a trained model M, I want to compute the utility of new (unseen) examples in a pool (for an active learning task). For this, I need to compute the magnitude of the gradient that would result from training M on each new example. In code, it looks something like this:
losses, grads = [], []
for i in range(X_pool.shape[0]):
    pred = model(X_pool[i:i+1])
    loss = loss_func(pred, y_pool[i:i+1])
    model.zero_grad()
    loss.backward()
    losses.append(loss)
    grads.append(layer.weight.grad.norm())
However, this is quite slow when there are many examples, especially since it will be the inner loop in my scenario. Is there a more efficient way to do this in PyTorch?

From your code, it looks like you only care about the gradients of a single layer in the model. You can split that layer into multiple copies, each of which receives only one element of the batch. That way the gradient is computed per sample in that layer, while everywhere else you still get the performance benefit of batching.

Below is a complete example comparing your approach (method1) with the one I propose (method2). It should be easy to extend to more complex networks.
import torch
import torch.nn as nn
import copy

batch_size = 50
num_classes = 10

class SimpleModel(nn.Module):
    def __init__(self, num_classes):
        super(SimpleModel, self).__init__()
        # input 3x10x10
        self.conv1 = nn.Conv2d(3, 10, kernel_size=3, padding=1, bias=False)
        # 10x10x10
        self.conv2 = nn.Conv2d(10, 20, kernel_size=3, stride=2, padding=1, bias=False)
        # 20x5x5
        self.fc = nn.Linear(20*5*5, num_classes, bias=False)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.shape[0], -1)
        return self.fc(x)
def method1(model, X_pool, y_pool):
    loss_func = nn.CrossEntropyLoss()
    layer = model.conv2
    losses, grads = [], []
    for i in range(X_pool.shape[0]):
        pred = model(X_pool[i:i+1])
        loss = loss_func(pred, y_pool[i:i+1])
        model.zero_grad()
        loss.backward()
        losses.append(loss)
        grads.append(layer.weight.grad.norm())
    return losses, grads
def method2(model, X_pool, y_pool):
    class Replicated(nn.Module):
        """ Instead of running a batch through one layer, run individuals through copies of the layer """
        def __init__(self, layer, batch_size):
            super(Replicated, self).__init__()
            self.batch_size = batch_size
            # ModuleList registers the copies as submodules so model.zero_grad() reaches their gradients
            self.layers = nn.ModuleList(copy.deepcopy(layer) for _ in range(batch_size))

        def forward(self, x):
            assert x.shape[0] <= self.batch_size
            return torch.cat([self.layers[idx](x[idx:idx+1, :]) for idx in range(x.shape[0])])

    # compute individual loss terms so we can return them
    loss_func = nn.CrossEntropyLoss(reduction='none')
    # replace the layer in the model with the replicated layer
    layer = model.conv2
    model.conv2 = Replicated(layer, batch_size)
    layers = model.conv2.layers
    # batch of predictions
    pred = model(X_pool)
    losses = loss_func(pred, y_pool)
    # reduce with sum so that the individual loss terms aren't scaled
    # (as they would be with mean, which would also scale the gradients)
    loss = torch.sum(losses)
    model.zero_grad()
    loss.backward()
    # each replica layer receives the gradient contribution of its own sample only
    grads = [layers[idx].weight.grad.norm() for idx in range(X_pool.shape[0])]
    # convert to a list of tensors to match method1's output
    losses = [l for l in losses]
    # put the original layer back
    model.conv2 = layer
    return losses, grads
model = SimpleModel(num_classes)
X_pool = torch.rand(batch_size, 3, 10, 10)
y_pool = torch.randint(0, num_classes, (batch_size,))

losses2, grads2 = method2(model, X_pool, y_pool)
losses1, grads1 = method1(model, X_pool, y_pool)
print("Losses Diff:", sum(abs(l1.item() - l2.item()) for l1, l2 in zip(losses1, losses2)))
print("Grads Diff:", sum(abs(g1.item() - g2.item()) for g1, g2 in zip(grads1, grads2)))
I haven't tested this on larger networks, but with a batch_size of 50 and multiple batches run through this simple model I saw a 2-3x speedup. It should matter even more for more complex models, since every layer except the replicated one keeps the performance benefit of batching.
Warning: this will probably not work with DataParallel.
Losses Diff: 3.337860107421875e-06
Grads Diff: 1.9431114196777344e-05
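As a side note (not part of the original answer): PyTorch 2.0+ ships `torch.func`, which can compute per-sample gradients in a single vectorized pass by composing `grad` with `vmap`. A minimal sketch, using a hypothetical tiny linear model in place of M:

```python
import torch
import torch.nn as nn
from torch.func import functional_call, vmap, grad

# hypothetical stand-in for the trained model M; any nn.Module works the same way
model = nn.Linear(4, 3, bias=False)
loss_fn = nn.CrossEntropyLoss()

# detached parameter dict: the inputs we differentiate with respect to
params = {k: v.detach() for k, v in model.named_parameters()}

def sample_loss(params, x, y):
    # functional_call runs the module with the given parameters
    pred = functional_call(model, params, (x.unsqueeze(0),))
    return loss_fn(pred, y.unsqueeze(0))

# grad differentiates w.r.t. params; vmap maps it over the batch dimension
per_sample_grads = vmap(grad(sample_loss), in_dims=(None, 0, 0))

X_pool = torch.rand(8, 4)
y_pool = torch.randint(0, 3, (8,))
grads = per_sample_grads(params, X_pool, y_pool)
# one gradient norm per example, computed in one batched pass
norms = grads['weight'].flatten(1).norm(dim=1)
```

This avoids both the Python loop over examples and the layer replication, and it gives gradients for every parameter rather than a single layer.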