Python PyTorch: how do I add an L1 regularizer to activations?

Tags: python, pytorch

I want to add an L1 regularizer to the output activations of a ReLU. More generally, how do I add a regularizer only to a specific layer of the network?


Related material:

  • [link] refers to adding L2 regularization, but it appears to apply the regularization penalty to all layers of the network.

  • nn.modules.loss.L1Loss() seems relevant, but I don't yet know how to use it.

  • The legacy module also seems relevant, but why was it deprecated?


Here is how to do it:

  • In your module's forward, return both the final output and the outputs of the layers you want to apply L1 regularization to.

  • The loss variable will then be the sum of the cross-entropy loss of the output w.r.t. the targets and the L1 penalty.

Here is some example code:

import torch
from torch.nn import functional as F


class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)

    def forward(self, x):
        layer1_out = F.relu(self.linear1(x))
        layer2_out = F.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out, layer1_out, layer2_out

batchsize = 4
lambda1, lambda2 = 0.5, 0.01

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# Usually the following code would be looped over all batches,
# but let's just do a single dummy batch for brevity.

inputs = torch.rand(batchsize, 128)
targets = torch.ones(batchsize).long()

optimizer.zero_grad()
outputs, layer1_out, layer2_out = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)

all_linear1_params = torch.cat([x.view(-1) for x in model.linear1.parameters()])
all_linear2_params = torch.cat([x.view(-1) for x in model.linear2.parameters()])
l1_regularization = lambda1 * torch.norm(all_linear1_params, 1)
l2_regularization = lambda2 * torch.norm(all_linear2_params, 2)

loss = cross_entropy_loss + l1_regularization + l2_regularization
loss.backward()
optimizer.step()
@SasankChilamkurthy Regularization should be applied to the model's weight parameters at each layer, not to each layer's outputs. See below:


I think the original poster wanted to regularize the output of the ReLU, so the regularizer should be applied to the outputs, not to the network's weights. They are not the same:

  • Regularizing the weights with the L1 norm trains a neural network with sparse weights.

  • Regularizing a layer's output with the L1 norm trains a network whose outputs are sparse at that particular layer.

The answers above (including the accepted one) either miss this point, or I am misunderstanding the original question.
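To make the distinction concrete, here is a minimal sketch that reuses the model, inputs, targets, lambda1, and lambda2 from the accepted answer's code, but builds the penalties from the returned activations rather than from the weights:

# Sketch: penalize the activations returned by forward, not the parameters.
outputs, layer1_out, layer2_out = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)

# L1 on layer1's activations encourages sparse activations at that layer;
# L2 on layer2's activations shrinks them without enforcing sparsity.
l1_activation_penalty = lambda1 * torch.norm(layer1_out, 1)
l2_activation_penalty = lambda2 * torch.norm(layer2_out, 2)

loss = cross_entropy_loss + l1_activation_penalty + l2_activation_penalty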

All of the (other current) answers are incorrect in some way. The closest is the accepted one, which suggests summing the norms of the outputs; that suggestion is correct, but its code sums the norms of the weights, which is wrong.

The right way is not to modify the network code at all, but to capture the outputs via a forward hook, as in the OutputHook class below. From there, summing the norms of the outputs is simple, but you do need to take care to clear the captured outputs on every iteration.

import torch


class OutputHook(list):
    """ Hook to capture module outputs.
    """
    def __call__(self, module, input, output):
        self.append(output)


class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)
        # Instantiate ReLU, so a hook can be registered to capture its output.
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        layer1_out = self.relu(self.linear1(x))
        layer2_out = self.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out


batch_size = 4
l1_lambda = 0.01

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
# Register hook to capture the ReLU outputs. Non-trivial networks will often
# require hooks to be applied more judiciously.
output_hook = OutputHook()
model.relu.register_forward_hook(output_hook)

inputs = torch.rand(batch_size, 128)
targets = torch.ones(batch_size).long()

optimizer.zero_grad()
outputs = model(inputs)
cross_entropy_loss = torch.nn.functional.cross_entropy(outputs, targets)

# Compute the L1 penalty over the ReLU outputs captured by the hook.
l1_penalty = 0.
for output in output_hook:
    l1_penalty += torch.norm(output, 1)
l1_penalty *= l1_lambda

loss = cross_entropy_loss + l1_penalty
loss.backward()
optimizer.step()
output_hook.clear()
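The in-code comment about applying hooks "more judiciously" can be made concrete. A hypothetical sketch: instead of hooking one hand-picked module, walk the model and register the same hook on every ReLU submodule:

# Hypothetical: attach the hook to all ReLU modules of an arbitrary model.
for name, module in model.named_modules():
    if isinstance(module, torch.nn.ReLU):
        module.register_forward_hook(output_hook)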

You can apply L1 regularization of the weights of a single layer my_layer of your model to the loss function with the following code:

def l1_penalty(params, l1_lambda=0.001):
    """Returns the L1 penalty of the params."""
    l1_norm = sum(p.abs().sum() for p in params)
    return l1_lambda * l1_norm

loss = loss_fn(outputs, labels) + l1_penalty(my_layer.parameters())
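A minimal usage sketch of l1_penalty inside a training step; model, optimizer, inputs, and labels are assumed from the earlier examples, loss_fn and my_layer from the snippet above (model.linear1 is used here as a stand-in for my_layer):

optimizer.zero_grad()
outputs = model(inputs)
# Penalize only the chosen layer's weights; passing model.parameters()
# instead would penalize the weights of every layer.
loss = loss_fn(outputs, labels) + l1_penalty(model.linear1.parameters())
loss.backward()
optimizer.step()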

  • For a relatively advanced solution, you can take a look at [link]. It gives you a Keras-like interface for doing many things easily in PyTorch, in particular adding various regularizers.

  • Thanks, I hadn't realized you could change the "signature" of core functions like forward().

  • Doesn't this regularize the weights of the layers? I think the original poster wanted to regularize the layers' outputs, not the weights. How can one regularize (i.e. sparsify) only the activations in PyTorch?

  • There seems to be an error in the answer. For norm(all_linear2_params, 2), torch returns the square root of the sum of squares; that is, the expression should be raised to the power of 2.

  • Why do you need to return layer1_out and layer2_out from forward if you never use those variables? As written, this regularizes the weights; you should be regularizing the returned layer outputs (i.e. the activations). That's why you returned them in the first place! The regularization terms should look like:
    l1_regularization = lambda1 * torch.norm(layer1_out, 1)
    l2_regularization = lambda2 * torch.norm(layer2_out, 2)

  • There seems to be an error in the answer. For norm(param, 2), torch returns the square root of the sum of squares; the expression should be raised to the power of 2.

  • Regularizing the weights is more standard, but there is work suggesting that (L1) regularizing the activations can be preferable. As the answer above points out, either can be regularized.

  • This should be the accepted answer.
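To make the squared-norm comments concrete, here is a standalone sketch (the tensor w is purely illustrative):

import torch

w = torch.randn(10)
# torch.norm(w, 2) is sqrt(sum(w ** 2)); the conventional L2 penalty
# (as used in weight decay) is the squared norm.
l2_penalty = torch.norm(w, 2) ** 2
assert torch.allclose(l2_penalty, (w ** 2).sum())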