How to write a solver.prototxt in Caffe that satisfies given conditions?


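I am writing a solver.prototxt that follows the rules given in a paper:

"In the training phase, the learning rate is set as 0.001 initially, and decreased by a factor of 10 when the loss stopped decreasing till 10^-7. The discount weight is set as 1 initially, and decreased by a factor of 10 every ten thousand iterations until a marginal value 10^-3."

Note that the discount weight corresponds to loss_weight in Caffe. Based on the information above, I wrote my solver as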

train_net: "train.prototxt"
lr_policy: "step"
gamma: 0.1
stepsize: 10000
base_lr: 0.001 #0.002

In train.prototxt, I also set

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "deconv"
  bottom: "label"
  top: "loss"
  loss_weight: 1
}

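However, I still do not know how to set up the solver to satisfy the rules "decreased by a factor of 10 when the loss stopped decreasing till 10^-7" and "decreased by a factor of 10 every ten thousand iterations until a marginal value 10^-3". I did not find any Caffe learning rate policy that matches; the available ones are: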

// The learning rate decay policy. The currently implemented learning rate
// policies are as follows:
//    - fixed: always return base_lr.
//    - step: return base_lr * gamma ^ (floor(iter / step))
//    - exp: return base_lr * gamma ^ iter
//    - inv: return base_lr * (1 + gamma * iter) ^ (- power)
//    - multistep: similar to step but it allows non uniform steps defined by
//      stepvalue
//    - poly: the effective learning rate follows a polynomial decay, to be
//      zero by the max_iter. return base_lr (1 - iter/max_iter) ^ (power)
//    - sigmoid: the effective learning rate follows a sigmod decay
//      return base_lr ( 1/(1 + exp(-gamma * (iter - stepsize))))
//
// where base_lr, max_iter, gamma, step, stepvalue and power are defined
// in the solver parameter protocol buffer, and iter is the current iteration.

If anyone knows how to do this, please give me some guidance on writing a solver.prototxt that satisfies the conditions above.

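Learning rate decay

Part of the problem is that the phrase "decreased by a factor of 10 when the loss stopped decreasing till 10^-7" does not quite make sense. I think that, perhaps, the authors are trying to say that they reduced the learning rate by a factor of 10 every time the loss stopped decreasing, until the learning rate reached 10^-7.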

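If so, this is a manual process, not something you can select through Caffe parameters. Above all, "when the loss stopped decreasing" is a non-trivial judgment call, although a long-base moving average of the loss will give you a good indication. I expect the authors did this by hand, stopping training and restarting it from a checkpoint.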
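The moving-average check mentioned above could be sketched roughly as follows. This is only an illustration, not something Caffe provides: the function name `has_plateaued`, the window length, and the relative-improvement threshold are all assumptions, and the loss values would have to be collected by you (for example, parsed from the training log).

```python
def has_plateaued(losses, window=1000, min_rel_improvement=1e-3):
    """Compare the mean loss of the latest `window` iterations with the mean
    of the `window` iterations before it (hypothetical helper, not Caffe API).

    Returns True when the recent mean has improved by less than
    `min_rel_improvement` relative to the previous mean.
    """
    if len(losses) < 2 * window:
        return False  # not enough history to compare two full windows
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return (previous - recent) < min_rel_improvement * abs(previous)
```

When the function returns True, that would be the point to stop, divide base_lr by 10, and resume from the latest snapshot. A longer window makes the check less sensitive to noise in the per-iteration loss, at the cost of reacting later.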

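You can get a similar effect with the "step" learning rate policy: set gamma to 0.1 and set stepsize high enough that you are sure training has plateaued before each rate reduction. This will waste some compute time, but it may save you hassle overall.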
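To see what such a "step" schedule does with the question's values, here is a small sketch that mirrors the formula from the policy comment, base_lr * gamma ^ floor(iter / stepsize). The helper names are made up for illustration. Since the step policy itself has no lower bound, the second helper computes a max_iter that ends training while the rate is still at the floor; that use of max_iter is an assumption about how one might approximate the paper's 10^-7 limit, not a documented Caffe feature.

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate under the "step" policy:
    base_lr * gamma ^ floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


def max_iter_for_floor(base_lr, gamma, stepsize, floor_lr):
    """max_iter such that the last stage trains at roughly floor_lr
    (hypothetical helper; small tolerance absorbs float rounding)."""
    reductions = 0
    lr = base_lr
    while lr * gamma >= floor_lr * (1 - 1e-9):
        lr *= gamma
        reductions += 1
    return (reductions + 1) * stepsize
```

With base_lr 0.001, gamma 0.1 and a floor of 10^-7, four reductions are needed, so max_iter would be five times whatever stepsize you settle on.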

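Discount weight

In Caffe, the loss weight is merely the relative weighting among the various losses in the model, a linear factor used to arrive at the final loss statistic. Caffe does not provide for changing this weight at run time. Perhaps this is another thing the authors adjusted by hand.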
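If the weight is adjusted by hand, the value to write into loss_weight at each restart follows directly from the rule quoted in the question (start at 1, divide by 10 every ten thousand iterations, stop at 10^-3). A minimal sketch of that arithmetic, with a hypothetical function name and with the defaults taken from the quoted rule:

```python
def discount_weight(iteration, initial=1.0, factor=10, interval=10000, floor=1e-3):
    """Loss weight to use at `iteration` under the paper's rule
    (illustrative helper; Caffe itself will not apply this automatically)."""
    steps = iteration // interval          # completed 10k-iteration stages
    return max(initial / factor ** steps, floor)
```

Under these assumptions the weight is 1 for iterations 0 to 9999, 0.1 from 10000, 0.01 from 20000, and stays at 0.001 from iteration 30000 onward.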

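I tried reading the paper's two references to the discount weight, but found it hard going. I will wait until someone proofreads and edits that paper for grammar and clarity. In the meantime, I hope this answer helps.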


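Thank you, this is helpful. You may also look at another paper by the same author on the discount weight, where he uses it as well.

I see; thanks for the reference. The English in that one is slightly better. You cannot adjust these weights dynamically from solver.prototxt. However, if you want to enhance the code and submit it to BVLC, I am sure they would give your added feature good consideration.

One more thing: you said we can adjust the weight manually during training. How do we do that? We can only save the solverstate and caffemodel files at each snapshot.

Right. Run 10000 iterations, take a snapshot, and stop training. Change solver.prototxt and restart from that snapshot. In short, it is a tedious fine-tuning process.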
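The stop-and-restart loop described in the last comment could be organized along these lines. This is a sketch under several assumptions: the snapshot prefix "snap" and the helper name are hypothetical, the solver is assumed to write a snapshot when it reaches max_iter (the default snapshot_after_train behavior), and you are assumed to edit max_iter and the loss weight in the prototxt files by hand before each stage. The only Caffe specifics it relies on are the `caffe train --solver=... --snapshot=...` command line and the `<prefix>_iter_<N>.solverstate` snapshot naming.

```python
def restart_commands(num_stages, iters_per_stage=10000,
                     solver="solver.prototxt", snapshot_prefix="snap"):
    """Return (max_iter, command) pairs, one per training stage.

    max_iter is the value to put in the solver before running the command;
    every stage after the first resumes from the previous stage's snapshot.
    """
    commands = []
    for stage in range(num_stages):
        done = stage * iters_per_stage  # iterations finished before this stage
        cmd = "caffe train --solver={}".format(solver)
        if done > 0:
            cmd += " --snapshot={}_iter_{}.solverstate".format(snapshot_prefix, done)
        commands.append((done + iters_per_stage, cmd))
    return commands
```

For example, `restart_commands(3)` lists three stages ending at iterations 10000, 20000 and 30000; between stages you would update the prototxt values before issuing the next command.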