Neural network: FCN32 model does not converge, the loss fluctuates after some point. Why?


I am trying to train FCN32. I am training the model on my own data, which has imbalanced classes. This is the learning curve after 18,000 iterations:

As you can see, the training loss decreases for a while and then fluctuates. I read some suggestions online recommending either lowering the learning rate or changing the bias values in the fillers of the convolution layers. So what I did was make the following changes to these two layers:

....
layer {
  name: "score_fr"
  type: "Convolution"
  bottom: "fc7"
  top: "score_fr"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 5 # the number of classes
    pad: 0
    kernel_size: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.5 # changed
    }
  }
}
layer {
  name: "upscore"
  type: "Deconvolution"
  bottom: "score_fr"
  top: "upscore"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 5 # the number of classes
    bias_term: true # was false
    kernel_size: 64
    stride: 32
    group: 5 # was 2
    weight_filler {
      type: "bilinear"
      value: 0.5 # changed
    }
  }
}
....
This is the resulting trend of the model:

The behavior of the model does not seem to have changed much.

1) Am I adding these values to the weight_filler in the right way?

2) Should I change the learning policy in the solver from fixed to step, reducing the learning rate by a factor of 10 each time? Would that help with this problem? (A sketch of such a policy follows below.)
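
For reference, a step policy in solver.prototxt could look roughly like the minimal sketch below; the base_lr, gamma, and stepsize values are placeholders chosen for illustration, not values tuned for this model:

# solver.prototxt sketch (placeholder values): the learning rate is
# multiplied by gamma every stepsize iterations.
base_lr: 1e-10
lr_policy: "step"
gamma: 0.1        # reduce the learning rate by a factor of 10
stepsize: 5000    # every 5000 iterations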

I am worried that I am doing something wrong and that my model will not converge. Does anyone have any suggestions? What important things should I consider when training? What kind of changes can I make to the solver and train_val so that the model converges?

I would really appreciate your help.

More details after adding BatchNorm layers

Thanks to @Shai and @Jonathan for suggesting adding batchNorm layers. I added the Batch Normalization layers before the reLU layers; here is one example layer:

layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 100
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "bn1_1"
  type: "BatchNorm"
  bottom: "conv1_1"
  top: "bn1_1"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn1_1"
  type: "BatchNorm"
  bottom: "conv1_1"
  top: "bn1_1"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}
layer {
  name: "scale1_1"
  type: "Scale"
  bottom: "bn1_1"
  top: "bn1_1"
  scale_param {
     bias_term: true
  }
}
layer {
  name: "relu1_1"
  type: "ReLU"
  bottom: "bn1_1"
  top: "bn1_1"
}
layer {
  name: "conv1_2"
  type: "Convolution"
  bottom: "bn1_1"
  top: "conv1_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
As far as I understand from the documentation, I can add only one param to the batch normalization layer, not three, because I have single-channel images. Is my understanding correct? The details are below:

param {
  lr_mult: 0
}
Should I add more parameters to the Scale layer, as mentioned in the documentation? What do those parameters of the Scale layer mean? For example:

layer {
  name: "layerx-bn-scale"
  type: "Scale"
  bottom: "layerx-bn"
  top: "layerx-bn"
  scale_param {
    bias_term: true
    axis: 1      # scale separately for each channel
    num_axes: 1  # ... but not spatially (default)
    filler { type: "constant" value: 1 }           # initialize scaling to 1
    bias_filler { type: "constant" value: 0.001 }  # initialize bias
  }
}
This is part of the network. I am not sure how much of it I got wrong/right. Did I add the layers correctly? Another question is about the debug log: after activating debug_info, what do the log lines below mean? What do diff and data mean? Why are the values 0? Is my network working correctly?
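
For context, these per-layer statistics are printed because debug_info is enabled in the solver; a minimal sketch of that setting (everything else in the solver left unchanged):

# solver.prototxt: print per-layer data/diff statistics at every iteration
debug_info: true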

 I0123 23:17:49.498327 15230 solver.cpp:228] Iteration 50, loss = 105465
    I0123 23:17:49.498337 15230 solver.cpp:244]     Train net output #0: accuracy = 0.643982
    I0123 23:17:49.498349 15230 solver.cpp:244]     Train net output #1: loss = 105446 (* 1 = 105446 loss)
    I0123 23:17:49.498359 15230 sgd_solver.cpp:106] Iteration 50, lr = 1e-11
    I0123 23:19:12.680325 15230 net.cpp:608]     [Forward] Layer data, top blob data data: 34.8386
    I0123 23:19:12.680615 15230 net.cpp:608]     [Forward] Layer data_data_0_split, top blob data_data_0_split_0 data: 34.8386
    I0123 23:19:12.680670 15230 net.cpp:608]     [Forward] Layer data_data_0_split, top blob data_data_0_split_1 data: 34.8386
    I0123 23:19:12.680778 15230 net.cpp:608]     [Forward] Layer label, top blob label data: 0
    I0123 23:19:12.680829 15230 net.cpp:608]     [Forward] Layer label_label_0_split, top blob label_label_0_split_0 data: 0
    I0123 23:19:12.680896 15230 net.cpp:608]     [Forward] Layer label_label_0_split, top blob label_label_0_split_1 data: 0
    I0123 23:19:12.688591 15230 net.cpp:608]     [Forward] Layer conv1_1, top blob conv1_1 data: 0
    I0123 23:19:12.688695 15230 net.cpp:620]     [Forward] Layer conv1_1, param blob 0 data: 0
    I0123 23:19:12.688742 15230 net.cpp:620]     [Forward] Layer conv1_1, param blob 1 data: 0
    I0123 23:19:12.721791 15230 net.cpp:608]     [Forward] Layer bn1_1, top blob bn1_1 data: 0
    I0123 23:19:12.721853 15230 net.cpp:620]     [Forward] Layer bn1_1, param blob 0 data: 0
    I0123 23:19:12.721890 15230 net.cpp:620]     [Forward] Layer bn1_1, param blob 1 data: 0
    I0123 23:19:12.721901 15230 net.cpp:620]     [Forward] Layer bn1_1, param blob 2 data: 96.1127    
    I0123 23:19:12.996196 15230 net.cpp:620]     [Forward] Layer scale4_1, param blob 0 data: 1
    I0123 23:19:12.996237 15230 net.cpp:620]     [Forward] Layer scale4_1, param blob 1 data: 0
    I0123 23:19:12.996939 15230 net.cpp:608]     [Forward] Layer relu4_1, top blob bn4_1 data: 0
    I0123 23:19:13.012020 15230 net.cpp:608]     [Forward] Layer conv4_2, top blob conv4_2 data: 0
    I0123 23:19:13.012403 15230 net.cpp:620]     [Forward] Layer conv4_2, param blob 0 data: 0
    I0123 23:19:13.012446 15230 net.cpp:620]     [Forward] Layer conv4_2, param blob 1 data: 0
    I0123 23:19:13.015959 15230 net.cpp:608]     [Forward] Layer bn4_2, top blob bn4_2 data: 0
    I0123 23:19:13.016005 15230 net.cpp:620]     [Forward] Layer bn4_2, param blob 0 data: 0
    I0123 23:19:13.016046 15230 net.cpp:620]     [Forward] Layer bn4_2, param blob 1 data: 0
    I0123 23:19:13.016054 15230 net.cpp:620]     [Forward] Layer bn4_2, param blob 2 data: 96.1127
    I0123 23:19:13.017211 15230 net.cpp:608]     [Forward] Layer scale4_2, top blob bn4_2 data: 0
    I0123 23:19:13.017251 15230 net.cpp:620]     [Forward] Layer scale4_2, param blob 0 data: 1
    I0123 23:19:13.017292 15230 net.cpp:620]     [Forward] Layer scale4_2, param blob 1 data: 0
    I0123 23:19:13.017980 15230 net.cpp:608]     [Forward] Layer relu4_2, top blob bn4_2 data: 0
    I0123 23:19:13.032080 15230 net.cpp:608]     [Forward] Layer conv4_3, top blob conv4_3 data: 0
    I0123 23:19:13.032452 15230 net.cpp:620]     [Forward] Layer conv4_3, param blob 0 data: 0
    I0123 23:19:13.032493 15230 net.cpp:620]     [Forward] Layer conv4_3, param blob 1 data: 0
    I0123 23:19:13.036018 15230 net.cpp:608]     [Forward] Layer bn4_3, top blob bn4_3 data: 0
    I0123 23:19:13.036064 15230 net.cpp:620]     [Forward] Layer bn4_3, param blob 0 data: 0
    I0123 23:19:13.036105 15230 net.cpp:620]     [Forward] Layer bn4_3, param blob 1 data: 0
    I0123 23:19:13.036114 15230 net.cpp:620]     [Forward] Layer bn4_3, param blob 2 data: 96.1127
    I0123 23:19:13.038148 15230 net.cpp:608]     [Forward] Layer scale4_3, top blob bn4_3 data: 0
    I0123 23:19:13.038189 15230 net.cpp:620]     [Forward] Layer scale4_3, param blob 0 data: 1
    I0123 23:19:13.038230 15230 net.cpp:620]     [Forward] Layer scale4_3, param blob 1 data: 0
    I0123 23:19:13.038969 15230 net.cpp:608]     [Forward] Layer relu4_3, top blob bn4_3 data: 0
    I0123 23:19:13.039417 15230 net.cpp:608]     [Forward] Layer pool4, top blob pool4 data: 0
    I0123 23:19:13.043354 15230 net.cpp:608]     [Forward] Layer conv5_1, top blob conv5_1 data: 0

    I0123 23:19:13.128515 15230 net.cpp:608]     [Forward] Layer score_fr, top blob score_fr data: 0.000975524
    I0123 23:19:13.128569 15230 net.cpp:620]     [Forward] Layer score_fr, param blob 0 data: 0.0135222
    I0123 23:19:13.128607 15230 net.cpp:620]     [Forward] Layer score_fr, param blob 1 data: 0.000975524
    I0123 23:19:13.129696 15230 net.cpp:608]     [Forward] Layer upscore, top blob upscore data: 0.000790174
    I0123 23:19:13.129734 15230 net.cpp:620]     [Forward] Layer upscore, param blob 0 data: 0.25
    I0123 23:19:13.130656 15230 net.cpp:608]     [Forward] Layer score, top blob score data: 0.000955503
    I0123 23:19:13.130709 15230 net.cpp:608]     [Forward] Layer score_score_0_split, top blob score_score_0_split_0 data: 0.000955503
    I0123 23:19:13.130754 15230 net.cpp:608]     [Forward] Layer score_score_0_split, top blob score_score_0_split_1 data: 0.000955503
    I0123 23:19:13.146767 15230 net.cpp:608]     [Forward] Layer accuracy, top blob accuracy data: 1
    I0123 23:19:13.148967 15230 net.cpp:608]     [Forward] Layer loss, top blob loss data: 105320
    I0123 23:19:13.149173 15230 net.cpp:636]     [Backward] Layer loss, bottom blob score_score_0_split_1 diff: 0.319809
    I0123 23:19:13.149323 15230 net.cpp:636]     [Backward] Layer score_score_0_split, bottom blob score diff: 0.319809
    I0123 23:19:13.150310 15230 net.cpp:636]     [Backward] Layer score, bottom blob upscore diff: 0.204677
    I0123 23:19:13.152452 15230 net.cpp:636]     [Backward] Layer upscore, bottom blob score_fr diff: 253.442
    I0123 23:19:13.153218 15230 net.cpp:636]     [Backward] Layer score_fr, bottom blob bn7 diff: 9.20469
    I0123 23:19:13.153254 15230 net.cpp:647]     [Backward] Layer score_fr, param blob 0 diff: 0
    I0123 23:19:13.153291 15230 net.cpp:647]     [Backward] Layer score_fr, param blob 1 diff: 20528.8
    I0123 23:19:13.153420 15230 net.cpp:636]     [Backward] Layer drop7, bottom blob bn7 diff: 9.21666
    I0123 23:19:13.153554 15230 net.cpp:636]     [Backward] Layer relu7, bottom blob bn7 diff: 0
    I0123 23:19:13.153856 15230 net.cpp:636]     [Backward] Layer scale7, bottom blob bn7 diff: 0
   E0123 23:19:14.382714 15230 net.cpp:736]     [Backward] All net params (data, diff): L1 norm = (19254.6, 102644); L2 norm = (391.485, 57379.6)

I would really appreciate it if anyone who knows could share ideas/links/resources here. Thanks again.

I would not expect changing the bias values to help the training. The first thing I would do is lower the learning rate. You can do this manually by resuming training from the weights that have plateaued, using a solver with a lower base_lr. Or you can change solver.prototxt to use a different update policy: either set the lr_policy to step, or use an update rule such as Adam. See:
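
As a rough sketch only (the values are common defaults, not something tuned for this model), switching the solver to the Adam update rule could look like this:

# solver.prototxt sketch (placeholder values) for the Adam update rule
type: "Adam"
base_lr: 0.001     # Adam typically uses a larger base_lr than plain SGD
momentum: 0.9      # beta1
momentum2: 0.999   # beta2
delta: 1e-8        # epsilon
lr_policy: "fixed" # Adam adapts per-parameter step sizes on its own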

Likewise, adding "BatchNorm" layers should help. Batch normalization is similar to whitening/normalizing the input data, but applied to the intermediate layers. There is a published paper on batch normalization that describes it in detail.

You should also hold out some data for validation. Looking only at the training loss can be misleading.
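
A minimal sketch of how a held-out validation set is usually wired in (the numbers are placeholders; the TEST-phase data layer in train_val.prototxt has to point at the held-out data):

# solver.prototxt sketch: evaluate the TEST-phase net on held-out data
test_iter: 100       # how many validation batches to run per evaluation
test_interval: 1000  # run validation every 1000 training iterations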

Regarding the "BatchNorm" parameters:
the layer has three internal parameters: (0) the mean, (1) the variance and (2) the moving-average factor, irrespective of the number of channels or the blob shape. So if you wish to set lr_mult explicitly, you need to define it for all three (see the sketch below).
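
In prototxt this means repeating the param block three times, once per internal blob, roughly like this:

layer {
  name: "bn1_1"
  type: "BatchNorm"
  bottom: "conv1_1"
  top: "bn1_1"
  # one param block per internal blob: mean, variance, moving-average factor
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  param { lr_mult: 0 }
}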

Regarding the zeros in the log:
please see how to read Caffe's debug log.
It looks like you are training the model from scratch (rather than fine-tuning) and all the weights are set to zero. This is a very poor initialization strategy.

Please consider defining a weight_filler and a bias_filler to initialize the weights, as in the sketch below.
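
For example, a convolution layer with explicit fillers could look like this sketch (the filler types and values are just one common choice, not a prescription for this model):

layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  convolution_param {
    num_output: 64
    pad: 100
    kernel_size: 3
    weight_filler { type: "xavier" }           # random, scale-aware initialization
    bias_filler { type: "constant" value: 0 }  # start biases at zero
  }
}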

IMHO, I don't think adding a bias_term to the "Deconvolution" layer is a good idea. Do you have it in your model? What activation are you using after the linear units, "ReLU"? Have you tried enabling debug_info and checking the debug log?

@Shai Thank you very much for your edits and suggestions. No, I have not changed much in the FCN model, because I am not familiar enough with the layers and their architecture. You are right, changing bias_term did not change anything in the output. May I ask what exactly the BatchNorm layer does? Is it normalizing the data? I already normalized the images to the range 0-1 before creating the LMDB database; is it still necessary? Another question: how much freedom do we have to change the architecture of the model?

@S.EB I added some information about batch normalization to my answer. See also. If you have further questions, you may want to ask them separately.

@Shai Thank you very much for your comment.

Your question has become very large. Could you consider asking a few smaller, more focused questions?

Thanks for your help, I will try; I will edit and bring the results here.