Python: Numerical differences between manually computed gradients and PyTorch's chain rule

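I compute the gradients by hand, using the closed-form formula for this very simple linear network with MSE loss.

I then compare them with the gradients computed by PyTorch, using PyTorch's allclose function to check that the two agree (i.e. that the relative difference between the manually computed gradients and PyTorch's is small enough).
Since the formula is correct, every test should pass. For some seeds, however, the check fails.

PyTorch is clearly not doing anything wrong, and since the formula is correct, the mismatch must come from some numerical instability in the formula.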
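For reference, the gradients for this setup can be derived as follows. The notation is an assumption chosen to match the comments in the code: $X \in \mathbb{R}^{n \times d}$ holds the $n$ samples, $W \in \mathbb{R}^{d \times 1}$ is the weight as a column vector, $b$ is the scalar bias, and $\mathbf{1}$ is the all-ones vector of length $n$.

```latex
\begin{aligned}
L(W, b) &= \frac{1}{n}\,\lVert XW + b\,\mathbf{1} - y \rVert^2 \\[4pt]
\frac{\partial L}{\partial W}
  &= \frac{2}{n}\, X^\top \left( XW + b\,\mathbf{1} - y \right)
   = \frac{1}{n}\left( -2\,X^\top y + 2\,b\,X^\top \mathbf{1} + 2\,X^\top X W \right) \\[4pt]
\frac{\partial L}{\partial b}
  &= \frac{2}{n}\, \mathbf{1}^\top \left( XW + b\,\mathbf{1} - y \right)
   = \operatorname{mean}\left( 2b - 2y + 2\,XW \right)
\end{aligned}
```

The expanded right-hand side of the weight gradient is a sum of three separately computed terms, whereas the factored form subtracts the target from the prediction first. The two are algebraically equal but are not evaluated in the same order in floating point.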

import torch

class Network(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # A single linear layer: 10 inputs, 1 output.
        self.linear = torch.nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)

loss = torch.nn.MSELoss()

for i in range(0, 1000):
    torch.manual_seed(i)
    X = torch.randn(100, 10)
    y = torch.randn(100, 1)

    model = Network()
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=1.)
    optimizer.zero_grad()
    output = loss(model(X), y)
    output.backward()

    # Gradients computed by autograd, in parameter order (weight, bias).
    torch_grads = []
    for p in model.parameters():
        torch_grads.append(p.grad.detach())

    # df/dW = (-2*X.T*y + 2*X.T*b + 2*X.T*X*W) / nsamples
    # df/db = (2*b - 2*y + 2*W.T*X.T).mean()  (the mean comes from implicit broadcasting of b)
    W, b = list(model.parameters())  # weight of shape (1, 10), bias of shape (1,)

    theory_grad_w = (-2 * torch.matmul(X.t(), y)
                     + 2 * torch.matmul(X.t(), torch.ones((X.shape[0], 1))) * b
                     + 2 * torch.matmul(torch.matmul(X.t(), X), W.t())
                     ) / float(X.shape[0])
    theory_grad_w = theory_grad_w.t()

    # y is (100, 1) and W @ X.T is (1, 100), so the sum below broadcasts to (100, 100)
    # before the mean is taken; the value is mathematically the same, only the summation differs.
    theory_grad_b = torch.mean(2 * b - 2 * y + 2 * torch.matmul(W, X.t()))

    theory_grads = [theory_grad_w, theory_grad_b]

    passed = all(torch.allclose(u, d) for u, d in zip(torch_grads, theory_grads))
    if not passed:
        print("i=%s, pass=%s" % (i, passed))

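What is the source of the observed numerical instability, and how can I deal with it so that the test always passes? Is it just a matter of the operations being performed in a different order?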
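One way to probe this question is to repeat the comparison in both single and double precision. The sketch below is not the original test: `manual_grads` and `count_failures` are hypothetical helper names, and the bias gradient is computed without the (100, 100) broadcast. If the failures are rounding effects of float32 arithmetic (each of the three summed terms carries an error of roughly 1e-7 relative to its own size, while the default `allclose` tolerance for small gradient entries approaches `atol=1e-8`), they should disappear in float64.

```python
import torch


def manual_grads(X, y, W, b):
    """Expanded closed-form MSE gradients for a linear layer (W: (1, d), b: (1,))."""
    n = X.shape[0]
    ones = torch.ones((n, 1), dtype=X.dtype)
    grad_w = (-2 * X.t() @ y
              + 2 * (X.t() @ ones) * b
              + 2 * (X.t() @ X) @ W.t()) / n
    grad_b = torch.mean(2 * b - 2 * y + 2 * (X @ W.t()))
    # Return shapes that match weight.grad (1, d) and bias.grad (1,).
    return grad_w.t(), grad_b.reshape(1)


def count_failures(dtype, n_seeds=200):
    """Count seeds for which manual and autograd gradients fail torch.allclose."""
    failures = 0
    for seed in range(n_seeds):
        torch.manual_seed(seed)
        X = torch.randn(100, 10, dtype=dtype)
        y = torch.randn(100, 1, dtype=dtype)
        linear = torch.nn.Linear(10, 1).to(dtype)
        torch.nn.functional.mse_loss(linear(X), y).backward()
        W, b = linear.weight.detach(), linear.bias.detach()
        grad_w, grad_b = manual_grads(X, y, W, b)
        ok = (torch.allclose(linear.weight.grad, grad_w)
              and torch.allclose(linear.bias.grad, grad_b))
        failures += 0 if ok else 1
    return failures


if __name__ == "__main__":
    for dtype in (torch.float32, torch.float64):
        print(dtype, "failed seeds:", count_failures(dtype))
```

If the float64 count is zero while the float32 count is not, the formula itself is fine and the mismatch is a precision effect. In that case the options are to run the test in double precision or to choose tolerances appropriate for float32.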

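Try using np.testing instead of allclose(). Maybe the difference is not that large.

To make the test pass I would have to increase atol by a factor of 100, so that is not a solution for me. (By the way, np also has an allclose function, to which you can pass these tolerance parameters.)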
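To make the role of the tolerances concrete: `torch.allclose(input, other)` checks `|input - other| <= atol + rtol * |other|` element-wise, with defaults `rtol=1e-5` and `atol=1e-8`. The sketch below uses made-up values (not gradients from the question) to show that entries of small magnitude are governed almost entirely by `atol`. It assumes a PyTorch version recent enough to provide `torch.testing.assert_close`.

```python
import torch

# Two float64 vectors that differ by 1e-7 in every entry (illustrative values only).
expected = torch.tensor([1.0, 1e-3], dtype=torch.float64)
actual = expected + 1e-7

# Default tolerances: the entry near 1.0 passes (1e-7 <= 1e-8 + 1e-5 * 1.0),
# but the entry near 1e-3 fails (1e-7 > 1e-8 + 1e-5 * 1e-3 = 2e-8).
default_ok = torch.allclose(actual, expected)

# The same comparison with an absolute tolerance 100 times larger.
loose_ok = torch.allclose(actual, expected, rtol=1e-5, atol=1e-6)

# The largest absolute difference is a useful number to look at before picking atol.
max_abs_diff = (actual - expected).abs().max().item()

# assert_close raises an AssertionError with a mismatch report instead of returning a bool.
torch.testing.assert_close(actual, expected, rtol=1e-5, atol=1e-6)

print(default_ok, loose_ok, max_abs_diff)
```

Printing the maximum absolute difference alongside the pass/fail flag shows whether a failing seed misses the tolerance by a hair or by orders of magnitude, which helps decide whether a looser `atol` is defensible.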