TensorFlow: parameters do not update when fine-tuning the network


I want to implement my project in two steps: 1. train the network on some data; 2. fine-tune the trained network on other data.

For the first step, training the network, I got decent results. However, in the second step a problem appeared: the parameters do not update. Details follow.

My loss consists of two parts: 1. the normal cost of my project; 2. an L2 regularization term. As follows:

c1 = y_conv - y_
c2 = tf.square(c1)
c3 = tf.reduce_sum(c2,1)
c4 = tf.sqrt(c3)
cost = tf.reduce_mean(c4)
regular = 0.0001*( tf.nn.l2_loss(w_conv1) + tf.nn.l2_loss(b_conv1) +\
              tf.nn.l2_loss(w_conv2) + tf.nn.l2_loss(b_conv2) +\
              tf.nn.l2_loss(w_conv3) + tf.nn.l2_loss(b_conv3) +\
              tf.nn.l2_loss(w_conv4) + tf.nn.l2_loss(b_conv4) +\
              tf.nn.l2_loss(w_fc1)   + tf.nn.l2_loss(b_fc1) +\
              tf.nn.l2_loss(w_fc2)   + tf.nn.l2_loss(b_fc2) )
loss = regular + cost
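For reference, the cost above is the mean Euclidean distance between predictions and targets. A small NumPy check of the same chain of ops, with made-up values, shows what each intermediate tensor holds:

```python
import numpy as np

# Hypothetical predictions and targets, shape (batch, output_dim)
y_conv = np.array([[3.0, 4.0], [0.0, 0.0]])
y_     = np.array([[0.0, 0.0], [1.0, 0.0]])

diff = y_conv - y_              # c1 = y_conv - y_
sq = diff ** 2                  # c2 = tf.square(c1)
row_sums = sq.sum(axis=1)       # c3 = tf.reduce_sum(c2, 1)
dists = np.sqrt(row_sums)       # c4 = tf.sqrt(c3): per-sample Euclidean distance
cost = dists.mean()             # cost = tf.reduce_mean(c4)
print(cost)                     # (5.0 + 1.0) / 2 = 3.0
```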
While fine-tuning the network, I print the loss, the cost, and the L2 term:

Epoch:     1 || loss = 0.184248179 || cost = 0.181599200 || regular = 0.002648979
Epoch:     2 || loss = 0.184086733 || cost = 0.181437753 || regular = 0.002648979
Epoch:     3 || loss = 0.184602532 || cost = 0.181953552 || regular = 0.002648979
Epoch:     4 || loss = 0.184308948 || cost = 0.181659969 || regular = 0.002648979
Epoch:     5 || loss = 0.184251788 || cost = 0.181602808 || regular = 0.002648979
Epoch:     6 || loss = 0.184105504 || cost = 0.181456525 || regular = 0.002648979
Epoch:     7 || loss = 0.184241678 || cost = 0.181592699 || regular = 0.002648979
Epoch:     8 || loss = 0.184189570 || cost = 0.181540590 || regular = 0.002648979
Epoch:     9 || loss = 0.184390061 || cost = 0.181741081 || regular = 0.002648979
Epoch:    10 || loss = 0.184064055 || cost = 0.181415075 || regular = 0.002648979
Epoch:    11 || loss = 0.184323867 || cost = 0.181674888 || regular = 0.002648979
Epoch:    12 || loss = 0.184519534 || cost = 0.181870555 || regular = 0.002648979
Epoch:    13 || loss = 0.183869445 || cost = 0.181220466 || regular = 0.002648979
Epoch:    14 || loss = 0.184313927 || cost = 0.181664948 || regular = 0.002648979
Epoch:    15 || loss = 0.184198738 || cost = 0.181549759 || regular = 0.002648979
As we can see, the L2 term does not update, but the cost and the loss do. To check whether the network parameters are updating, I fetched their values:

gs, lr, solver, l, c, r, pY, bconv1 = sess.run([global_step, learning_rate, train, loss, cost, regular, y_conv, b_conv1], feed_dict={x: batch_X, y_: batch_Y, keep_prob:0.5})
So bconv1 is one of the parameters, and I confirmed that bconv1 does not change between epochs. I am confused: why do the cost and loss change while the network parameters do not?
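One plausible explanation for the cost moving while every parameter stays frozen: the feed dict passes `keep_prob: 0.5`, so dropout makes the forward pass stochastic even with identical weights and data. A small NumPy sketch (made-up data and a toy linear model, not the original network) of that effect:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([1.0, 2.0, 3.0, 4.0])     # frozen parameters: never updated
X = rng.normal(size=(8, 4))            # the SAME data every "epoch"
Y = rng.normal(size=8)

def epoch_cost(keep_prob=0.5):
    # Dropout keeps each activation with probability keep_prob, so the
    # forward pass is stochastic even though w and the data are fixed.
    mask = rng.random(X.shape) < keep_prob
    pred = (X * mask / keep_prob) @ w
    return float(np.mean(np.abs(pred - Y)))

c1, c2 = epoch_cost(), epoch_cost()
print(c1 != c2)   # True: dropout noise moves the cost, not the weights
```

Meanwhile the L2 term depends only on the (frozen) weights, which is consistent with `regular` being bit-identical across all fifteen epochs above.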

Apart from the CNN layers, the whole code is:

c1 = y_conv - y_
c2 = tf.square(c1)
c3 = tf.reduce_sum(c2,1)
c4 = tf.sqrt(c3)
cost = tf.reduce_mean(c4)

regular = 0.0001*( tf.nn.l2_loss(w_conv1) + tf.nn.l2_loss(b_conv1) +\
              tf.nn.l2_loss(w_conv2) + tf.nn.l2_loss(b_conv2) +\
              tf.nn.l2_loss(w_conv3) + tf.nn.l2_loss(b_conv3) +\
              tf.nn.l2_loss(w_conv4) + tf.nn.l2_loss(b_conv4) +\
              tf.nn.l2_loss(w_fc1)   + tf.nn.l2_loss(b_fc1) +\
              tf.nn.l2_loss(w_fc2)   + tf.nn.l2_loss(b_fc2) )
loss = regular + cost
global_step = tf.Variable(0, trainable=False)
initial_learning_rate = 0.001 

learning_rate = tf.train.exponential_decay(initial_learning_rate,
                                           global_step=global_step,
                                           decay_steps=int( X.shape[0]/1000 ),decay_rate=0.99, staircase=True)

train = tf.train.AdamOptimizer(learning_rate).minimize(loss,global_step=global_step)
batch_size = 1000
init = tf.initialize_all_variables()
saver = tf.train.Saver()
sess = tf.Session()
sess.run(init)
saver.restore(sess,'../TrainingData/convParameters.ckpt')
total_batch = int( X.shape[0]/batch_size )
for epoch in range(1000):
    L = Mcost = Reg = 0.0
    for i in range(total_batch):
        batch_X = X[i*batch_size:(i+1)*batch_size]
        batch_Y = Y[i*batch_size:(i+1)*batch_size]
        gs, lr, solver, l, c, r, pY, bconv1 = sess.run([global_step, learning_rate, train, loss, cost, regular, y_conv, b_conv1], feed_dict={x: batch_X, y_: batch_Y, keep_prob:0.5})
        # accumulate per-batch values for the epoch averages printed below
        L += l; Mcost += c; Reg += r

    print("Epoch: %5d || loss = %.9f || cost = %.9f || regular = %.9f"%(epoch+1,L/total_batch,Mcost/total_batch,Reg/total_batch))
Any suggestion would matter to me. Thanks in advance.


张强

In fact, I thought I could solve this problem, but I couldn't. I only know what causes the error. The reason the parameters do not update is that global_step is very large after pre-training, so the learning rate, at around 1e-24, is extremely small. So what I should do is set global_step to 0 after restoring the network parameters. The learning rate should also be reset.
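To see how a large restored global_step freezes training, the staircase decay can be worked through directly. A sketch of the decay formula in plain Python (the 500,000-step figure below is an assumed example, not from the post):

```python
import math

# tf.train.exponential_decay with staircase=True computes:
#   lr = initial_lr * decay_rate ** (global_step // decay_steps)
initial_lr, decay_rate = 0.001, 0.99

def decayed_lr(global_step, decay_steps):
    return initial_lr * decay_rate ** (global_step // decay_steps)

# Number of decay intervals needed to fall from 1e-3 to about 1e-24:
intervals = math.log(1e-24 / initial_lr) / math.log(decay_rate)
print(round(intervals))            # about 4811 intervals

# With small decay_steps, a restored global_step in the hundreds of
# thousands leaves the effective learning rate vanishingly small:
print(decayed_lr(500_000, 100))    # far too small for Adam to move anything
```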

The code should look like this:

saver.restore(sess,'../TrainingData/convParameters.ckpt')
global_step = tf.Variable(0, trainable=False) 
learning_rate = tf.train.exponential_decay(initial_learning_rate,
                                           global_step=global_step,
                                           decay_steps=int( X.shape[0]/1000 ),decay_rate=0.99, staircase=True)
Then you can fetch the values of global_step and learning_rate to check whether they are normal:

gafter,lrafter = sess.run([global_step,learning_rate])
This must be done after restoring the network parameters.
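Note that `global_step = tf.Variable(0, trainable=False)` after restore creates a brand-new (and uninitialized) variable, while the train op built earlier still holds a reference to the original one, so rebinding the Python name cannot reset it. An in-place alternative would be `sess.run(global_step.assign(0))` right after `saver.restore(...)`. A pure-Python sketch of the rebinding-vs-mutation distinction (illustrative names, no TensorFlow):

```python
# The optimizer captured a reference to the original global_step object when
# minimize() was called; rebinding the Python name later does not affect it.
step = [100000]           # stands in for the restored global_step variable
optimizer_ref = step      # reference captured at graph-construction time

step = [0]                # like: global_step = tf.Variable(0, ...) after restore
print(optimizer_ref[0])   # 100000 -- the optimizer still sees the old object

optimizer_ref[0] = 0      # like: sess.run(global_step.assign(0)) -- in place
print(optimizer_ref[0])   # 0 -- now the reset is visible to the optimizer
```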

I thought I had fixed the bug with the code above. However, global_step does not update during training.

What I did was:

Reset the optimizer, like this:

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(initial_learning_rate,
                                           global_step=global_step,
                                           decay_steps=int( X.shape[0]/1000 ), decay_rate=0.99, staircase=True)
train = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step)
global_step_init = tf.initialize_variables([global_step])
sess.run(global_step_init)

But it told me that I was using an uninitialized variable.

Initialize the optimizer:

global_step_init = tf.initialize_variables([global_step, train])

But it told me that train cannot be initialized (train is the Operation returned by minimize(), not a variable).

I was exhausted, and in the end I gave up. I simply made the learning rate a placeholder, like this:
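The actual snippet was omitted from the post; a hypothetical sketch of what the placeholder approach presumably looks like (all surrounding names such as `loss`, `x`, `y_`, and `keep_prob` are taken from the earlier code, and the fed value 0.001 is an assumption):

```python
# Hypothetical sketch -- the post omits the actual code. Feed the learning
# rate by hand instead of deriving it from global_step:
learning_rate = tf.placeholder(tf.float32, shape=[])
train = tf.train.AdamOptimizer(learning_rate).minimize(loss)

# ...inside the training loop:
sess.run(train, feed_dict={x: batch_X, y_: batch_Y, keep_prob: 0.5,
                           learning_rate: 0.001})
```

This sidesteps the decay schedule entirely, at the price of managing the learning-rate value in Python.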

If anyone has a solution, please tell me. Thanks a lot.