Python: batch normalization makes training worse


I am trying to implement batch normalization with PyTorch, using a simple fully connected neural network to approximate a given function.

The code is shown below. The results show that the network without batch normalization performs better than the network with batch normalization; in other words, batch normalization makes training worse. Can anyone explain this result? Thanks.

import matplotlib.pyplot as plt
import numpy as np
import torch

class Net(torch.nn.Module):
    
    def __init__(self, num_inputs, num_outputs, hidden_size=256, is_bn=True):
        super(Net, self).__init__()
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        self.is_bn = is_bn
        
        # no bias is needed when a BatchNorm layer follows (BN adds its own learnable shift)
        if self.is_bn:
            self.linear1 = torch.nn.Linear(num_inputs, hidden_size, bias=False)
            self.linear2 = torch.nn.Linear(hidden_size, hidden_size, bias=False)
        else:            
            self.linear1 = torch.nn.Linear(num_inputs, hidden_size)
            self.linear2 = torch.nn.Linear(hidden_size, hidden_size)
                
        self.linear3 = torch.nn.Linear(hidden_size, num_outputs)
        
        if self.is_bn:
            self.bn1 = torch.nn.BatchNorm1d(hidden_size)
            self.bn2 = torch.nn.BatchNorm1d(hidden_size)

        self.activation = torch.nn.ReLU()
        
    def forward(self, inputs):
        x = inputs
        if self.is_bn:
            x = self.activation(self.bn1(self.linear1(x)))
            x = self.activation(self.bn2(self.linear2(x)))
        else:
            x = self.activation(self.linear1(x))
            x = self.activation(self.linear2(x))
        out = self.linear3(x)        
        return out


torch.manual_seed(0)    # reproducible

Nx = 100
x = torch.linspace(-1., 1., Nx)
x = torch.reshape(x, (Nx, 1))
y = torch.sin(3*x)

fcn_bn, fcn_no_bn = Net(num_inputs=1, num_outputs=1, is_bn=True), Net(num_inputs=1, num_outputs=1, is_bn=False)

criterion = torch.nn.MSELoss()
optimizer_bn = torch.optim.Adam(fcn_bn.parameters(), lr=0.001)
optimizer_no_bn = torch.optim.Adam(fcn_no_bn.parameters(), lr=0.001)

total_epoch = 5000

# record loss history    
loss_history_bn = np.zeros(total_epoch)
loss_history_no_bn = np.zeros(total_epoch)

fcn_bn.train()
fcn_no_bn.train()
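# note: the full data set (Nx = 100 samples) is fed as a single batch every epoch,
# so the BatchNorm statistics are computed over all 100 points at once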
for epoch in range(total_epoch):
        
    optimizer_bn.zero_grad()
    loss = criterion(fcn_bn(x), y)    
    loss_history_bn[epoch] = loss.item()
    loss.backward()
    optimizer_bn.step()

    optimizer_no_bn.zero_grad()
    loss = criterion(fcn_no_bn(x), y)    
    loss_history_no_bn[epoch] = loss.item()
    loss.backward()
    optimizer_no_bn.step()
    
    if epoch%1000 == 0:
        print("epoch: %d; MSE (with bn): %.2e; MSE (without bn): %.2e"%(epoch, loss_history_bn[epoch], loss_history_no_bn[epoch]))
        
# switch to eval mode so BatchNorm uses its running statistics instead of per-batch statistics
fcn_bn.eval()
fcn_no_bn.eval()

plt.figure()
plt.semilogy(np.arange(total_epoch), loss_history_bn, label='neural network (with bn)')
plt.semilogy(np.arange(total_epoch), loss_history_no_bn, label='neural network (without bn)')
plt.legend()

plt.figure()
plt.plot(x, y, '-', label='exact')
plt.plot(x, fcn_bn(x).detach(), 'o', markersize=2, label='neural network (with bn)')
plt.plot(x, fcn_no_bn(x).detach(), 'o', markersize=2, label='neural network (without bn)')
plt.legend()

plt.figure()
plt.plot(x, np.abs(fcn_bn(x).detach() - y), 'o', markersize=2, label='neural network (with bn)')
plt.plot(x, np.abs(fcn_no_bn(x).detach() - y), 'o', markersize=2, label='neural network (without bn)')
plt.legend()

plt.show()

The results are as follows:

epoch: 0; MSE (with bn): 3.99e-01; MSE (without bn): 4.84e-01
epoch: 1000; MSE (with bn): 4.70e-05; MSE (without bn): 1.27e-06
epoch: 2000; MSE (with bn): 1.81e-04; MSE (without bn): 7.93e-07
epoch: 3000; MSE (with bn): 2.73e-04; MSE (without bn): 7.45e-07
epoch: 4000; MSE (with bn): 4.04e-04; MSE (without bn): 5.68e-07

To offer an alternative view to Khalid's answer, one that focuses more on generalization performance than on training loss, consider the following:

Batch normalization is known to have a regularizing effect. Luo et al. decompose BN into population normalization and gamma decay and observe similar training loss curves when comparing BN against no BN (note, however, that they use vanilla SGD rather than Adam). Two things affect BN (as Khalid outlined): on the one hand, the batch size should be large enough for a robust estimate of the population parameters; on the other hand, generalization performance decreases as the batch size grows (see the paper by Luo et al.; the takeaway is that small batches lead to noisy estimates of the population parameters, which essentially perturbs the input).
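As a rough way to probe the batch-size effect described above, one could train the same BN network on mini-batches of several sizes and compare the resulting error. The sketch below is only an illustration, not part of the original post: it reuses the Net class and data from the question, and the batch sizes and epoch count are arbitrary assumptions.

import torch
from torch.utils.data import DataLoader, TensorDataset

# same data as in the question
torch.manual_seed(0)
Nx = 100
x = torch.linspace(-1., 1., Nx).reshape(Nx, 1)
y = torch.sin(3 * x)

criterion = torch.nn.MSELoss()

for batch_size in (10, 25, 100):  # assumed values, purely for illustration
    net = Net(num_inputs=1, num_outputs=1, is_bn=True)  # Net class from the question
    optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
    loader = DataLoader(TensorDataset(x, y), batch_size=batch_size, shuffle=True)

    net.train()
    for epoch in range(1000):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = criterion(net(xb), yb)
            loss.backward()
            optimizer.step()

    # evaluate on the full data set using the running (population) statistics
    net.eval()
    with torch.no_grad():
        full_mse = criterion(net(x), y).item()
    print("batch_size=%d: full-data MSE (eval mode): %.2e" % (batch_size, full_mse))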


In your case, I would not intuitively expect a big difference (given how your data is set up), but perhaps someone more deeply involved in the theoretical analysis of BN can still provide insight.

What is your question? Could you state it more clearly? It looks like you solved it yourself.
@Ivan The result shows that batch normalization makes training worse. Is that right?
@MLDev The result shows that batch normalization makes training worse. Is that correct?
I think this is an interesting observation, but I suggest you edit your question, including the title, so that it contains an actual question. That will help you get answers.