Caffe: Adding Softmax temperature using a Scale layer

machine-learning neural-network computer-vision

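I am trying to implement a Caffe Softmax layer with a "temperature" parameter. I am implementing a network that uses the distillation technique.

Essentially, I would like my Softmax layer to use the Softmax-with-temperature function, as follows: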

F(X) = exp(zi(X)/T) / sum(exp(zl(X)/T))

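Using this, I want to be able to adjust the temperature T before training. I found a similar question, but that question is about implementing Softmax with temperature on the deploy network. I am struggling to implement the additional Scale layer described as "option 4" in the first answer there.

I am using the prototxt file from Caffe's examples directory. I have tried making the following changes: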

Original

...
...
...
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip1"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip1"
  bottom: "label"
  top: "loss"
}

Modified

...
...
...
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip1"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  type: "Scale"
  name: "temperature"
  top: "zi/T"
  bottom: "ip1"
  scale_param {
    filler: { type: 'constant' value: 0.025 } ### I wanted T = 40, so 1/40=.025
  }
  param { lr_mult: 0 decay_mult: 0 }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip1"
  bottom: "label"
  top: "loss"
}

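After a quick training run (5,000 iterations), I checked whether my classification probabilities had become more even, but they actually appeared to be less evenly distributed.

For example:

High temperature T: F(X) = [0.2, 0.5, 0.1, 0.2]

Low temperature T: F(X) = [0.02, 0.95, 0.01, 0.02]

My attempt: F(X) = [0, 1.0, 0, 0]

Is my implementation on the right track? Either way, what am I missing?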
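To make the expected behaviour concrete, here is a minimal Python sketch of the formula F(X) from the question. It is not Caffe code; the logits are hypothetical values chosen for illustration, and `softmax_with_temperature` is a helper name introduced here. It only shows that dividing the logits by a larger T flattens the resulting distribution, while a smaller T sharpens it.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 4-class problem (not taken from the question's network).
logits = [1.0, 4.0, 0.0, 1.0]

for t in (0.5, 1.0, 5.0, 40.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

The predicted class (the index of the largest probability) is the same for every T, because dividing by a positive constant does not change the ordering of the logits; only the spread of the probabilities changes.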

You are not using the "cooled" predictions "zi/T" that your "Scale" layer produces.

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "zi/T"  # Use the "cooled" predictions instead of the originals.
  bottom: "label"
  top: "loss"
}

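The accepted answer helped me understand my misconception about the Softmax temperature implementation.

As @Shai pointed out, in order to observe the "cooled" probability outputs I was expecting, the Scale layer must only be added to the "deploy" prototxt file. There is no need to include the Scale layer in the train/val prototxt at all. In other words, the temperature must be applied to the Softmax layer, not the SoftmaxWithLoss layer.

If you want to apply the "cooling" effect to your probability vector, just make sure your last two layers are as follows: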

deploy.prototxt

layer {
  type: "Scale"
  name: "temperature"
  top: "zi/T"
  bottom: "ip1"
  scale_param {
    filler: { type: 'constant' value: 1/T } ## Replace "1/T" with actual 1/T value
  }
  param { lr_mult: 0 decay_mult: 0 }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "zi/T"
  top: "prob"
}
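The `1/T` placeholder in the Scale layer above has to be replaced by a numeric literal before Caffe can parse the file. As a rough convenience sketch, the small Python helper below renders that layer with the value filled in. The function name `temperature_scale_layer` and its default blob names are assumptions made for this example; it only produces text and does not call Caffe.

```python
def temperature_scale_layer(temperature, bottom="ip1", top="zi/T"):
    """Render a Scale layer whose constant filler is the numeric value of 1/T."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    inv_t = 1.0 / temperature
    return (
        'layer {\n'
        '  type: "Scale"\n'
        '  name: "temperature"\n'
        f'  top: "{top}"\n'
        f'  bottom: "{bottom}"\n'
        '  scale_param {\n'
        f"    filler: {{ type: 'constant' value: {inv_t:.6g} }}\n"
        '  }\n'
        # lr_mult: 0 and decay_mult: 0 keep the scale fixed rather than learned.
        '  param { lr_mult: 0 decay_mult: 0 }\n'
        '}\n'
    )

# Example: T = 40, as in the question.
print(temperature_scale_layer(40))
```

The rendered text can be pasted into deploy.prototxt in place of the layer with the placeholder.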

My confusion was mainly caused by my misunderstanding of the difference between SoftmaxWithLoss and Softmax.

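Thanks for pointing this out! However, my output is still not what I was hoping for. When deploying the corrected network, the top-1 class prediction for every image in the test set has a probability of 1.0, and the other classes obviously have 0.0. Is my Scale layer doing what I think it is? I want my probability vector to be more "evenly distributed" across classes.

@Mink The prototxt you show here does not output class probabilities, but a scalar loss. You probably have a corresponding "Softmax" layer (rather than "SoftmaxWithLoss"). In that case, make sure the "bottom" of this "Softmax" layer is also "zi/T" and not "ip1".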
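The effect of the two possible wirings can be illustrated outside Caffe with a short Python sketch. The scores in `ip1` are hypothetical, chosen so that the unscaled Softmax saturates. The names mirror the blob names in the prototxt, but nothing here is Caffe API.

```python
import math

def softmax(values):
    """Plain softmax with the usual max-subtraction for numerical stability."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

T = 40.0
ip1 = [12.0, 60.0, 5.0, 10.0]      # hypothetical raw scores from the last inner-product layer
zi_over_t = [z / T for z in ip1]   # what a Scale layer with filler value 1/T would output

# Softmax wired to "ip1": the Scale layer is bypassed, so the output saturates.
prob_from_ip1 = softmax(ip1)

# Softmax wired to "zi/T": the cooled scores give a visibly flatter distribution.
prob_from_scaled = softmax(zi_over_t)

print([round(p, 3) for p in prob_from_ip1])
print([round(p, 3) for p in prob_from_scaled])
```

If the Softmax layer is wired to "zi/T" and the probabilities still look one-hot, one possible explanation (an assumption, not something established in the thread) is that the raw scores are so far apart that the chosen T is too small to soften them.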