Machine learning: Caffe num_output in prototxt gives strange behavior


I am running some experiments in which I split the CIFAR-10 dataset into two halves, each containing five random classes. I trained the bvlc_alexnet architecture on one of the halves. Accordingly, I changed num_output to 5 and made a few other small adjustments to the network. When I checked the log files, I saw that the loss climbed to around 80 and the test accuracy was 0.

However, when I changed num_output to 10, training appeared normal: the loss decreased steadily and the test accuracy reached about 70%.

How can this be explained?

train_val.prototxt

This split contains the classes 0, 4, 5, 6 and 8. I used a script to create the lmdb files.

Sample of train.txt:

Sample of val.txt:


As pointed out in the comments, Caffe expects labels to be integers between 0 and num_classes - 1. In your case, when you set num_output to 5, Caffe creates five output neurons in the last layer. When you then ask it to predict class 6 or class 8, you are asking it to maximize the output of a neuron that does not exist, which Caffe obviously cannot do.
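One way to fix this (a minimal sketch; the helper name and the mapping are assumptions based on the classes listed in the question) is to remap the sparse labels 0, 4, 5, 6, 8 to the contiguous range 0-4 before building the lmdb files:

```python
# Remap sparse CIFAR-10 labels (0, 4, 5, 6, 8) to the contiguous
# range 0..4 that Caffe expects when num_output is 5.
LABEL_MAP = {0: 0, 4: 1, 5: 2, 6: 3, 8: 4}

def remap_listing(lines):
    """Rewrite 'path label' listing lines so labels are contiguous."""
    out = []
    for line in lines:
        path, label = line.rsplit(" ", 1)
        out.append("%s %d" % (path, LABEL_MAP[int(label)]))
    return out

listing = [
    "4/fallow_deer_s_000021.png 4",
    "8/passenger_ship_s_000236.png 8",
]
print(remap_listing(listing))
```

Running the remapped listing through the usual lmdb-creation script then yields labels that all fall inside the five output neurons.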


Now, if you relabel the data and set num_output to 5, what you are doing is correct, so it works. When you set num_output to 10, the network still works because it now has ten output neurons, which is more than enough to classify five classes. It will learn that the classes 5 through 9 never occur and should never be predicted, and it will adjust its weights so that those output neurons always return very small values. Note, however, that neural networks are stochastic by nature, so the network may occasionally predict a class it has never been shown; I would therefore expect a network whose num_output is larger than the actual number of classes to perform slightly worse than one with the correct num_output.
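To see why the five extra neurons are mostly harmless, consider a softmax over ten logits in which the absent classes have been driven to large negative values, as training would do (an illustrative sketch with made-up numbers, not Caffe code):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Five "real" logits plus five logits the network has learned to suppress.
logits = [2.0, 1.0, 0.5, 0.0, -1.0] + [-20.0] * 5
probs = softmax(logits)

# The suppressed classes carry essentially zero probability mass.
print(sum(probs[5:]))  # a value below 1e-8
```

The unused outputs therefore contribute almost nothing to the predicted distribution, which is why training with num_output 10 on five classes still converges.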

Comments:

- From your description it is not clear, but I assume both the training and validation partitions contain the same 5 of the 10 classes, right? If your partitioning logic is wrong and your validation set contains samples from classes never seen during training, you will get a low test accuracy. If the classes are disjoint between the partitions, the test accuracy will be 0.
- Thanks for your reply. Correct, training and validation contain the same classes. The only thing I changed was num_output, from 5 to 10. Could this be a bug in Caffe?
- What is the accuracy when you use the training set as the test image set? For the split with labels 5-9, did you change the labels to 0-4 or leave them as they are?
- @AnoopK.Prabhu When I use the training data as test data, Caffe seems to freeze at this step: I0401 13:03:42.787312 24045 net.cpp:411] data -> label. I tried this with both num_output 5 and 10.
- This was very helpful, thanks! Is this also described in the documentation?
name: "AlexNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 25

  }
  data_param {
    source: "/home/apples/caffe/cifar/cifarA/cifar_A_train_lmdb"
    batch_size: 256
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 25

  }
  data_param {
    source: "/home/apples/caffe/cifar/cifarA/cifar_A_val_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8_mnist"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_mnist"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 5
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8_mnist"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8_mnist"
  bottom: "label"
  top: "loss"
}
0/attack_aircraft_s_001759.png 0
0/propeller_plane_s_001689.png 0
4/fallow_deer_s_000021.png 4
4/alces_alces_s_000686.png 4
5/toy_spaniel_s_000327.png 5
5/toy_spaniel_s_000511.png 5
6/bufo_viridis_s_000502.png 6
6/bufo_viridis_s_001005.png 6
8/passenger_ship_s_000236.png 8
8/passenger_ship_s_000853.png 8
0/attack_aircraft_s_000002.png 0
0/propeller_plane_s_000006.png 0
4/fallow_deer_s_000001.png 4
4/alces_alces_s_000012.png 4
5/toy_spaniel_s_000020.png 5
6/bufo_viridis_s_000016.png 6
8/passenger_ship_s_000060.png 8