PyTorch loss decreases even though requires_grad=False for all variables
When I create a neural network in PyTorch, using the torch.nn.Sequential method to define the layers, the parameters seem to have requires_grad=False by default. However, when I train this network, the loss decreases. How is this possible if the layers are not being updated via gradients?

For example, this is the code that defines my network:
import numpy as np
import torch


class Network(torch.nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(10, 5),
            torch.nn.Linear(5, 2)
        )
        # Report requires_grad as seen through state_dict()
        print('Network Parameters:')
        model_dict = self.state_dict()
        for param_name in model_dict:
            param = model_dict[param_name]
            print('Name: ' + str(param_name))
            print('\tRequires Grad: ' + str(param.requires_grad))

    def forward(self, input):
        prediction = self.layers(input)
        return prediction

And this is the code that trains my network:

network = Network()
network.train()
optimiser = torch.optim.SGD(network.parameters(), lr=0.001)
criterion = torch.nn.MSELoss()

# Random inputs and targets, converted from NumPy to float32 tensors
inputs = np.random.random([100, 10]).astype(np.float32)
inputs = torch.from_numpy(inputs)
labels = np.random.random([100, 2]).astype(np.float32)
labels = torch.from_numpy(labels)

# Runs until interrupted, printing the loss at every step
while True:
    prediction = network.forward(inputs)
    loss = criterion(prediction, labels)
    print('loss = ' + str(loss.item()))
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
This prints:

Network Parameters:
Name: layers.0.weight
	Requires Grad: False
Name: layers.0.bias
	Requires Grad: False
Name: layers.1.weight
	Requires Grad: False
Name: layers.1.bias
	Requires Grad: False
loss = 0.284633219242
loss = 0.278225809336
loss = 0.271959483624
loss = 0.265835255384
loss = 0.259853869677
loss = 0.254015892744
loss = 0.248321473598
loss = 0.242770522833
loss = 0.237362638116
loss = 0.232097044587
loss = 0.226972639561
loss = 0.221987977624
loss = 0.217141270638
loss = 0.212430402637
loss = 0.207852959633
loss = 0.203406244516
loss = 0.199087426066
loss = 0.19489350915
loss = 0.190821439028
loss = 0.186868071556
loss = 0.183030322194
loss = 0.179305106401
loss = 0.175689414144
loss = 0.172180294991
loss = 0.168774917722
loss = 0.165470585227
loss = 0.162264674902
loss = 0.159154698253
If all the parameters have requires_grad=False, then why does the loss decrease?

This is interesting. There seems to be a difference between state_dict() and parameters():
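class Network(torch.nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(10, 5),
            torch.nn.Linear(5, 2)
        )
        print(self.layers[0].weight.requires_grad)  # True
        print(self.state_dict()['layers.0.weight'].requires_grad)  # False
        print(list(self.parameters())[0].requires_grad)  # True

    def forward(self, input):
        prediction = self.layers(input)
        return prediction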
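So it seems that your loss is decreasing because the network is in fact learning, since requires_grad is True. (In general, for debugging, I prefer to query the actual objects (self.layers[0]...).)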
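[Edit] Aha, found the problem: state_dict accepts a boolean keep_vars option. Among other things, it decides whether state_dict returns the actual parameters or only their data. So if you want the actual parameter, use keep_vars=True; if you only want the data, use the default keep_vars=False.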
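As a minimal sketch of the keep_vars behaviour, the snippet below compares what each way of reading the flag reports. The bare torch.nn.Sequential model here is a stand-in chosen for brevity (so the keys are 0.weight rather than layers.0.weight), not the asker's exact class.

```python
import torch

# Stand-in model; the layer sizes mirror the network in the question.
model = torch.nn.Sequential(
    torch.nn.Linear(10, 5),
    torch.nn.Linear(5, 2),
)

# Default (keep_vars=False): entries are detached tensors, so they report
# requires_grad=False no matter what the real parameter says.
detached = model.state_dict()

# keep_vars=True: entries are the actual Parameter objects.
kept = model.state_dict(keep_vars=True)

for name, param in model.named_parameters():
    print(name,
          "| parameter:", param.requires_grad,
          "| state_dict():", detached[name].requires_grad,
          "| state_dict(keep_vars=True):", kept[name].requires_grad)
```

With keep_vars=True the dictionary entry is the very same object as the layer's parameter, which is why its requires_grad flag matches.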
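Checking sum([x.requires_grad for x in model.parameters()]) before the while True loop gives 4. So it looks like the parameters do require gradients, even though state_dict() suggests otherwise.

To sum up: the OP's method of checking requires_grad (using state_dict()) is incorrect, and .requires_grad is in fact True for every parameter. To get the correct .requires_grad, use .parameters(), access the layers directly, or pass keep_vars=True to state_dict().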
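One way to confirm this empirically is to take a single optimiser step and see which parameters actually change. The sketch below is an illustration under assumptions: the helper names make_model and changed_after_step are hypothetical, the seed is arbitrary, and freezing the first layer with requires_grad_(False) is added only as a contrast case that does not appear in the question.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability


def make_model():
    # Hypothetical helper: same layer sizes as the network in the question.
    return torch.nn.Sequential(torch.nn.Linear(10, 5), torch.nn.Linear(5, 2))


def changed_after_step(model):
    """Run one SGD step on random data and report which parameters changed."""
    inputs = torch.rand(100, 10)
    labels = torch.rand(100, 2)
    # Hand the optimiser only the parameters that require gradients.
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimiser = torch.optim.SGD(trainable, lr=0.001)
    before = {n: p.detach().clone() for n, p in model.named_parameters()}
    loss = torch.nn.functional.mse_loss(model(inputs), labels)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return {n: not torch.equal(before[n], p.detach())
            for n, p in model.named_parameters()}


# Default model: every parameter has requires_grad=True.
default_changes = changed_after_step(make_model())
print(default_changes)

# Contrast case: explicitly freeze the first layer before training.
frozen_model = make_model()
for p in frozen_model[0].parameters():
    p.requires_grad_(False)
frozen_changes = changed_after_step(frozen_model)
print(frozen_changes)
```

In the default case every entry should be reported as changed, which matches the decreasing loss in the question; only the explicitly frozen layer stays fixed.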