Deep learning: loss suddenly increases while fine-tuning in Caffe
I have a classification problem with 280 classes and 278,000 images. I am fine-tuning GoogLeNet (bvlc_googlenet in Caffe) using quick_solver.txt. My solver is as follows:
test_iter: 1000
test_interval: 4000
test_initialization: false
display: 40
average_loss: 40
base_lr: 0.001
lr_policy: "poly"
power: 0.5
max_iter: 800000
momentum: 0.9
weight_decay: 0.0002
snapshot: 20000
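During training I use a batch size of 32, and the test batch size is also 32. I relearn only three layers, loss1/classifier, loss2/classifier and loss3/classifier, by renaming them. I set the global learning rate to 0.001, i.e. 10 times lower than the one used for training from scratch, but the last three layers still learn at 0.01.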
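The `lr = ...` values printed in the logs below can be reproduced from the solver settings. A minimal Python sketch, assuming Caffe's documented "poly" formula lr = base_lr × (1 − iter/max_iter)^power, and assuming (hypothetically) that the renamed classifier layers use lr_mult: 10, which is what would turn the global 0.001 into the 0.01 mentioned above:

```python
# Sketch of Caffe's "poly" learning-rate policy with the solver values above
# (base_lr: 0.001, power: 0.5, max_iter: 800000).
# LR_MULT_NEW = 10 is an assumption: the per-layer multiplier that would give
# the renamed classifier layers an effective rate of 0.01.

BASE_LR = 0.001
POWER = 0.5
MAX_ITER = 800_000
LR_MULT_NEW = 10  # assumed lr_mult on loss{1,2,3}/classifier


def poly_lr(iteration: int) -> float:
    """Global learning rate: base_lr * (1 - iter / max_iter) ** power."""
    return BASE_LR * (1.0 - iteration / MAX_ITER) ** POWER


if __name__ == "__main__":
    for it in (40, 119_000, 161_040):
        lr = poly_lr(it)
        # Effective rate of a layer = global rate * that layer's lr_mult.
        print(f"iter {it:>7}: global lr = {lr:.9f}, "
              f"classifier lr = {lr * LR_MULT_NEW:.8f}")
```

At iteration 119000 this gives 0.000922632, the same value as in the log, so by the time the loss jumps the learning rate has already decayed slightly below base_lr.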
Log from the first iterations:
I0515 08:44:41.838122 1279 solver.cpp:228] Iteration 40, loss = 9.72169
I0515 08:44:41.838163 1279 solver.cpp:244] Train net output #0: loss1/loss1 = 5.7261 (* 0.3 = 1.71783 loss)
I0515 08:44:41.838170 1279 solver.cpp:244] Train net output #1: loss2/loss1 = 5.65961 (* 0.3 = 1.69788 loss)
I0515 08:44:41.838173 1279 solver.cpp:244] Train net output #2: loss3/loss3 = 5.46685 (* 1 = 5.46685 loss)
I0515 08:44:41.838179 1279 sgd_solver.cpp:106] Iteration 40, lr = 0.000999975
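The numbers in this log do not add up directly: 0.3 × 5.7261 + 0.3 × 5.65961 + 5.46685 ≈ 8.88, not 9.72. A sketch of the bookkeeping, assuming Caffe's standard behaviour that each "Train net output" line shows the current iteration (raw loss times its loss_weight), while the "Iteration N, loss" line is the mean total loss over the last `average_loss` iterations (40 in this solver). The class and function names are illustrative only:

```python
from collections import deque

# Loss weights as printed in the log: 0.3 for the two auxiliary heads, 1 for loss3.
LOSS_WEIGHTS = {"loss1": 0.3, "loss2": 0.3, "loss3": 1.0}


def weighted_total(head_losses: dict) -> float:
    """Total objective for one iteration: sum of loss_weight * head loss."""
    return sum(LOSS_WEIGHTS[name] * value for name, value in head_losses.items())


class SmoothedLoss:
    """Mean of the last `window` total losses, like the displayed solver loss."""

    def __init__(self, window: int = 40):
        self.values = deque(maxlen=window)  # oldest value drops out automatically

    def update(self, loss: float) -> float:
        self.values.append(loss)
        return sum(self.values) / len(self.values)


# Per-head values from the "Iteration 40" log lines above.
iter40 = {"loss1": 5.7261, "loss2": 5.65961, "loss3": 5.46685}
print(f"{weighted_total(iter40):.5f}")  # prints 8.88256 (current iteration only)
```

The displayed 9.72169 is higher because it still averages in the larger losses from the earliest iterations in the 40-iteration window.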
By iteration 100,000, my network reaches about 50% top-1 accuracy and ~80% top-5 accuracy:
I0515 13:45:59.789113 1279 solver.cpp:337] Iteration 100000, Testing net (#0)
I0515 13:46:53.914217 1279 solver.cpp:404] Test net output #0: loss1/loss1 = 2.08631 (* 0.3 = 0.625893 loss)
I0515 13:46:53.914274 1279 solver.cpp:404] Test net output #1: loss1/top-1 = 0.458375
I0515 13:46:53.914279 1279 solver.cpp:404] Test net output #2: loss1/top-5 = 0.768781
I0515 13:46:53.914284 1279 solver.cpp:404] Test net output #3: loss2/loss1 = 1.88489 (* 0.3 = 0.565468 loss)
I0515 13:46:53.914288 1279 solver.cpp:404] Test net output #4: loss2/top-1 = 0.494906
I0515 13:46:53.914290 1279 solver.cpp:404] Test net output #5: loss2/top-5 = 0.805906
I0515 13:46:53.914294 1279 solver.cpp:404] Test net output #6: loss3/loss3 = 1.77118 (* 1 = 1.77118 loss)
I0515 13:46:53.914297 1279 solver.cpp:404] Test net output #7: loss3/top-1 = 0.517719
I0515 13:46:53.914299 1279 solver.cpp:404] Test net output #8: loss3/top-5 = 0.827125
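Each test value above is averaged over test_iter × test batch size = 1000 × 32 = 32,000 images (if the test set is smaller, Caffe's data layer wraps around and reuses images). Top-1 and top-5 count a prediction as correct when the true label is among the 1 or 5 highest-scoring classes. A plain-Python sketch of that metric, similar in spirit to Caffe's Accuracy layer with top_k (its tie handling may differ); the function name is illustrative:

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores.

    scores: list of per-class score lists, one per sample
    labels: list of integer class ids, same length as scores
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k  # bool counts as 0 or 1
    return hits / len(labels)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [2, 0, 1]
print(topk_accuracy(scores, labels, k=1))  # 0.3333333333333333
print(topk_accuracy(scores, labels, k=2))  # 1.0
```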
At iteration 119,000, everything is still fine:
I0515 14:43:38.669674 1279 solver.cpp:228] Iteration 119000, loss = 2.70265
I0515 14:43:38.669777 1279 solver.cpp:244] Train net output #0: loss1/loss1 = 2.41406 (* 0.3 = 0.724217 loss)
I0515 14:43:38.669783 1279 solver.cpp:244] Train net output #1: loss2/loss1 = 2.38374 (* 0.3 = 0.715123 loss)
I0515 14:43:38.669787 1279 solver.cpp:244] Train net output #2: loss3/loss3 = 1.92663 (* 1 = 1.92663 loss)
I0515 14:43:38.669798 1279 sgd_solver.cpp:106] Iteration 119000, lr = 0.000922632
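Right after that, the loss suddenly jumps back up to roughly the initial loss (8 to 9), and the network fails to reduce it for a long time after this sudden change: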
I0515 16:51:10.485610 1279 solver.cpp:228] Iteration 161040, loss = 9.01994
I0515 16:51:10.485649 1279 solver.cpp:244] Train net output #0: loss1/loss1 = 5.63485 (* 0.3 = 1.69046 loss)
I0515 16:51:10.485656 1279 solver.cpp:244] Train net output #1: loss2/loss1 = 5.63484 (* 0.3 = 1.69045 loss)
I0515 16:51:10.485661 1279 solver.cpp:244] Train net output #2: loss3/loss3 = 5.62972 (* 1 = 5.62972 loss)
I0515 16:51:10.485666 1279 sgd_solver.cpp:106] Iteration 161040, lr = 0.0008937
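The per-head losses after the jump (5.63485, 5.63484, 5.62972) are essentially ln(280) ≈ 5.6348, the softmax cross-entropy of a classifier that assigns probability 1/280 to every class. The log is therefore consistent with the network having collapsed to near-uniform, chance-level predictions, which is also roughly where training started. A quick check, assuming the loss weights shown in the logs:

```python
import math

NUM_CLASSES = 280
LOSS_WEIGHTS = (0.3, 0.3, 1.0)  # loss1, loss2, loss3 as printed in the logs

# Cross-entropy of a uniform prediction: -log(1/C) = log(C).
chance_head = math.log(NUM_CLASSES)
# Weighted sum over the three heads at chance level.
chance_total = sum(LOSS_WEIGHTS) * chance_head

print(f"per-head chance loss: {chance_head:.5f}")   # 5.63479
print(f"weighted total:       {chance_total:.5f}")  # 9.01566
```

The weighted total, 1.6 × ln(280) ≈ 9.016, is likewise close to the smoothed loss of 9.01994 reported at iteration 161040.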
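I re-ran the experiment twice, and the problem recurred at exactly iteration 119040. For more context: I shuffled the data when creating the LMDB database, and I used this same database to train VGG-16 (step learning-rate policy, 80k iterations max, 20k iterations per step) without any problems. With VGG, I get 55% top-1 accuracy.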
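Because the jump recurs at the same iteration and the LMDB order is fixed once the database is created, the iteration can be translated into a position in the training data. A sketch of that arithmetic, assuming a single uninterrupted run, one data layer reading the LMDB sequentially from the first record and wrapping around at the end, batch size 32 and iter_size 1 (Caffe's default); the function name is hypothetical:

```python
TRAIN_IMAGES = 278_000  # training set size from the question
BATCH_SIZE = 32         # train batch size from the question
ITER_SIZE = 1           # assumption: Caffe's default, no gradient accumulation


def data_position(iteration):
    """Map a solver iteration to (epochs elapsed, full epochs, record offset).

    Assumes sequential LMDB reads starting at the first record, with
    wrap-around at the end of the database.
    """
    images_seen = iteration * BATCH_SIZE * ITER_SIZE
    full_epochs, offset = divmod(images_seen, TRAIN_IMAGES)
    return images_seen / TRAIN_IMAGES, full_epochs, offset


epochs, full_epochs, offset = data_position(119_040)
print(f"{epochs:.2f} epochs, record offset {offset} in epoch {full_epochs + 1}")
# prints: 13.70 epochs, record offset 195280 in epoch 14
```

One epoch is 278,000 / 32 = 8,687.5 iterations, so the jump does not fall on an epoch boundary. The logs are only printed every 40 iterations, so the offset is approximate to within 40 × 32 = 1,280 records.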
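Has anyone run into a problem similar to mine? Maybe I should lower the learning rate. But during the first 100k iterations the network learns very well, with lr 0.01 for the fc layers and 0.001 for the other layers. If the lr were too high, it would not have learned anything from the start.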