.grad() returns None in pytorch
I am trying to write a simple script for parameter estimation (here the parameters are weights). I am facing a problem where .grad() returns None. I have gone through the concept and understood it both in theory and in practice. To me, the script below should work, but unfortunately it does not.
My first attempt: the script below is my first attempt.
gamma = torch.tensor(2.0, device=device, dtype=torch.float, requires_grad=True)
alpha_xy = torch.tensor(3.7, device=device, dtype=torch.float, requires_grad=True)
beta_y = torch.tensor(1.5, device=device, dtype=torch.float, requires_grad=True)
alpha0 = torch.tensor(1.1, device=device, dtype=torch.float, requires_grad=True)
alpha_y = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha1 = torch.tensor(0.1, device=device, dtype=torch.float, requires_grad=True)
alpha2 = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha3 = torch.tensor(0.001, device=device, dtype=torch.float, requires_grad=True)
learning_rate = 1e-4
total_loss = []
for epoch in tqdm(range(500)):
    loss_1 = 0
    for j in range(x_train.size(0)):
        input = x_train[j:j+1]
        target = y_train[j:j+1]
        input = input.to(device,non_blocking=True)
        target = target.to(device,non_blocking=True)
        x_dt = gamma*input[0][0] + \
               alpha_xy*input[0][0]*input[0][2] + \
               alpha1*input[0][0]
        y0_dt = beta_y*input[0][0] + \
                alpha2*input[0][1]
        y_dt = alpha0*input[0][1] + \
               alpha_y*input[0][2] + \
               alpha3*input[0][0]*input[0][2]
        pred = torch.tensor([[x_dt],
                             [y0_dt],
                             [y_dt]],device=device
                            )
        loss = (pred - target).pow(2).sum()
        loss_1 += loss
        loss.backward()
        print(pred.grad, x_dt.grad, gamma.grad)
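The code above throws the error message
element 0 of tensors does not require grad and does not have a grad_fn
at the line loss.backward().
My attempt 2: an improved version of the first attempt is as follows: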
gamma = torch.tensor(2.0, device=device, dtype=torch.float, requires_grad=True)
alpha_xy = torch.tensor(3.7, device=device, dtype=torch.float, requires_grad=True)
beta_y = torch.tensor(1.5, device=device, dtype=torch.float, requires_grad=True)
alpha0 = torch.tensor(1.1, device=device, dtype=torch.float, requires_grad=True)
alpha_y = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha1 = torch.tensor(0.1, device=device, dtype=torch.float, requires_grad=True)
alpha2 = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha3 = torch.tensor(0.001, device=device, dtype=torch.float, requires_grad=True)
learning_rate = 1e-4
total_loss = []
for epoch in tqdm(range(500)):
    loss_1 = 0
    for j in range(x_train.size(0)):
        input = x_train[j:j+1]
        target = y_train[j:j+1]
        input = input.to(device,non_blocking=True)
        target = target.to(device,non_blocking=True)
        x_dt = gamma*input[0][0] + \
               alpha_xy*input[0][0]*input[0][2] + \
               alpha1*input[0][0]
        y0_dt = beta_y*input[0][0] + \
                alpha2*input[0][1]
        y_dt = alpha0*input[0][1] + \
               alpha_y*input[0][2] + \
               alpha3*input[0][0]*input[0][2]
        pred = torch.tensor([[x_dt],
                             [y0_dt],
                             [y_dt]],device=device,
                            dtype=torch.float,
                            requires_grad=True)
        loss = (pred - target).pow(2).sum()
        loss_1 += loss
        loss.backward()
        print(pred.grad, x_dt.grad, gamma.grad)
        # with torch.no_grad():
        #     gamma -= learning_rate * gamma.grad
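Now the script runs, but apart from pred.grad, the other two print None.
I want to update all the parameters after calling loss.backward(), but since their gradients are None I cannot perform the update. Can anyone suggest how to improve this script? Thanks.

You declared a new tensor for pred, which breaks the computation graph. Instead, you can use torch.stack. Also, x_dt and pred are non-leaf tensors, so their gradients are not retained by default. You can override this behavior by using .retain_grad().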
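The fix described above can be sketched as follows. This is a minimal illustration, not the original answer's code: the sample values, the target, and the reduction to two parameters are assumptions made to keep the example short. It builds pred with torch.stack so that it stays connected to the parameters, and calls retain_grad() on the non-leaf tensors whose gradients should be inspected.

```python
import torch

# Hypothetical single sample and target; the values are illustrative only.
inp = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([0.5, 1.0])

# Two leaf parameters, named after those in the question.
gamma = torch.tensor(2.0, requires_grad=True)
beta_y = torch.tensor(1.5, requires_grad=True)

x_dt = gamma * inp[0]    # non-leaf tensor
y0_dt = beta_y * inp[0]  # non-leaf tensor

# torch.stack keeps the graph intact, whereas torch.tensor([...]) copies
# the values into a fresh tensor with no history.
pred = torch.stack([x_dt, y0_dt])

# Non-leaf tensors only keep .grad after backward() if asked to.
x_dt.retain_grad()
pred.retain_grad()

loss = (pred - target).pow(2).sum()
loss.backward()

print(pred.grad, x_dt.grad, gamma.grad, beta_y.grad)
```

With the graph intact, gamma.grad and beta_y.grad are populated after backward(), so the parameter updates inside torch.no_grad() become possible.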
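Closed-form solution
Assuming you want to optimize the parameters defined at the top of the script (gamma, alpha_xy, beta_y, etc.), what you have here is an instance of linear least squares: each component of pred is linear in the parameters. Looking at the components of pred, you will notice that x_dt, y0_dt, and y_dt are independent of each other with respect to the parameters (obvious in this case, since each uses an entirely different set of parameters). This makes the problem easier, because it means we can optimize the terms (x_dt-target[0])**2, (y0_dt-target[1])**2 and (y_dt-target[2])**2 separately.
Without going into the details, the solution (with no backpropagation or gradient descent) ends up being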
# supposing x_train is [N,3] and y_train is [N,3]
x1 = torch.stack((x_train[:, 0], x_train[:, 0] * x_train[:, 2]), dim=0)
y1 = y_train[:, 0].unsqueeze(1)
# avoid inverses using solve to get p1 = inv(x1 . x1^T) . x1 . y1
# (torch.linalg.solve(A, B) replaces the older torch.solve(B, A), which also returned the LU factorization)
p1 = torch.linalg.solve(x1 @ x1.transpose(1, 0), x1 @ y1)
# gamma and alpha1 are redundant. As long as gamma + alpha1 = p1[0] we get the same optimal value for loss
gamma = p1[0] / 2
alpha_xy = p1[1]
alpha1 = p1[0] / 2
x2 = torch.stack((x_train[:, 0], x_train[:, 1]), dim=0)
y2 = y_train[:, 1].unsqueeze(1)
p2 = torch.linalg.solve(x2 @ x2.transpose(1, 0), x2 @ y2)
beta_y = p2[0]
alpha2 = p2[1]
x3 = torch.stack((x_train[:, 1], x_train[:, 2], x_train[:, 0] * x_train[:, 2]), dim=0)
y3 = y_train[:, 2].unsqueeze(1)
p3 = torch.linalg.solve(x3 @ x3.transpose(1, 0), x3 @ y3)
alpha0 = p3[0]
alpha_y = p3[1]
alpha3 = p3[2]
loss_1 = torch.sum((x1.transpose(1, 0) @ p1 - y1)**2 + (x2.transpose(1, 0) @ p2 - y2)**2 + (x3.transpose(1, 0) @ p3 - y3)**2)
mse = loss_1 / x_train.size(0)
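As a sanity check on the normal-equation approach used above, the same kind of problem can be solved with a dedicated least-squares routine and the two results compared. The sketch below is an assumption-laden illustration rather than part of the answer: the data, the coefficients, and the names A, true_p, p_normal and p_lstsq are made up, and A plays the role of x1.transpose(1, 0), that is, one row per sample.

```python
import torch

torch.manual_seed(0)

# Hypothetical data: N samples, 2 features, known coefficients plus small noise.
N = 1000
A = torch.randn(N, 2)                      # design matrix, one row per sample
true_p = torch.tensor([[1.0], [2.0]])
y = A @ true_p + 0.01 * torch.randn(N, 1)

# Normal equations: solve (A^T A) p = A^T y without forming an inverse.
p_normal = torch.linalg.solve(A.T @ A, A.T @ y)

# Dedicated least-squares solver, which does not form A^T A explicitly.
p_lstsq = torch.linalg.lstsq(A, y).solution

print(p_normal.flatten(), p_lstsq.flatten())
```

For a well-conditioned problem like this one the two estimates should agree closely; torch.linalg.lstsq is generally the safer choice when the columns of the design matrix are nearly collinear.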
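To test that this code works, I generated some fake data for which I know the underlying model coefficients (with some noise added, so the final result will not match the expected values exactly); the generation script is listed further below.
This results in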
loss_1: 1491.731201171875
MSE: 0.029834624379873276
Expected 0.5, 2.0, 0.5, 3.0, 4.0, 5.0, 6.0, 7.0
Actual 0.50002 2.0011 0.50002 3.0009 3.9997 5.0000 6.0002 6.9994
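Thanks for your reply. The script is working. I also added the update rules for the other parameters, but now I find that the loss keeps increasing and then becomes nan. This is unexpected, since the update rule is meant to decrease the loss.

Are you only optimizing the 8 parameters defined at the top of the code? Also, what are x_train.shape and y_train.shape? If both are [500, 3], and if your goal is to minimize loss_1 w.r.t. the parameters alpha_xy, alpha1, etc., then (on initial inspection) there appears to be a simple closed-form solution, which I will post if you would like.

Yes. I want to train the 8 parameters of my model, and x_train.shape and y_train.shape are 50000. Could you post the closed-form solution for this problem? Thanks.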
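The thread does not establish what caused the growing loss. Two common causes in a hand-written update loop are gradients that accumulate because .grad is never reset, and a step size that is too large for a loss summed over many samples. The sketch below avoids both under stated assumptions: the data is synthetic, the model is reduced to a single output with two parameters (named after beta_y and alpha2 from the question), the loss is averaged rather than summed, and the learning rate is chosen for this toy problem only.

```python
import torch

torch.manual_seed(0)

# Hypothetical data for y = 3*x0 + 4*x1, standing in for one model component.
x = torch.randn(1000, 2)
y = 3.0 * x[:, 0] + 4.0 * x[:, 1]

beta_y = torch.tensor(1.5, requires_grad=True)
alpha2 = torch.tensor(0.9, requires_grad=True)
learning_rate = 0.1  # chosen for this toy problem, not taken from the question

losses = []
for step in range(200):
    pred = beta_y * x[:, 0] + alpha2 * x[:, 1]
    # The mean keeps the gradient scale independent of the number of samples.
    loss = (pred - y).pow(2).mean()
    loss.backward()
    # Update the leaf tensors in place without recording the update in the graph.
    with torch.no_grad():
        beta_y -= learning_rate * beta_y.grad
        alpha2 -= learning_rate * alpha2.grad
    # Reset the gradients; otherwise backward() adds to the previous values.
    beta_y.grad.zero_()
    alpha2.grad.zero_()
    losses.append(loss.item())

print(losses[0], losses[-1], beta_y.item(), alpha2.item())
```

The same loop can be written with torch.optim.SGD, using optimizer.step() and optimizer.zero_grad(), which performs the update and the gradient reset for every registered parameter.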
def gen_fake_data(samples=50000):
    x_train = torch.randn(samples, 3)
    # define fake data with known minimal solutions
    x1 = torch.stack((x_train[:, 0], x_train[:, 0] * x_train[:, 2]), dim=0)
    x2 = torch.stack((x_train[:, 0], x_train[:, 1]), dim=0)
    x3 = torch.stack((x_train[:, 1], x_train[:, 2], x_train[:, 0] * x_train[:, 2]), dim=0)
    y1 = x1.transpose(1, 0) @ torch.tensor([[1.0], [2.0]])  # gamma + alpha1 = 1.0
    y2 = x2.transpose(1, 0) @ torch.tensor([[3.0], [4.0]])
    y3 = x3.transpose(1, 0) @ torch.tensor([[5.0], [6.0], [7.0]])
    y_train = torch.cat((y1, y2, y3), dim=1) + 0.1 * torch.randn(samples, 3)
    return x_train, y_train

x_train, y_train = gen_fake_data()
# optimization code from above
...
print('loss_1:', loss_1.item())
print('MSE:', mse.item())
print('Expected 0.5, 2.0, 0.5, 3.0, 4.0, 5.0, 6.0, 7.0')
print('Actual', gamma.item(), alpha_xy.item(), alpha1.item(), beta_y.item(), alpha2.item(), alpha0.item(), alpha_y.item(), alpha3.item())