TensorFlow CNN MNIST example: training accuracy unexpectedly drops from 1 to 0.06 at a large number of iterations


After 26,700 iterations, the training accuracy unexpectedly drops from 1 to 0.06. The code comes from TensorFlow's online documentation; I only changed the filter size from 5x5 to 3x3, the number of iterations from 20,000 to 100,000, and the batch size from 50 to 100. Can anyone explain this? It may be related to AdamOptimizer, because after switching to GradientDescentOptimizer it does not happen even after 56,200 iterations, but I am not sure. However, GradientDescentOptimizer also turns out to have this problem.

step 26400, training accuracy 1, loss 0.00202696
step 26500, training accuracy 1, loss 0.0750173
step 26600, training accuracy 1, loss 0.0790716
step 26700, training accuracy 1, loss 0.0136688
step 26800, training accuracy 0.06, loss nan
step 26900, training accuracy 0.03, loss nan
step 27000, training accuracy 0.12, loss nan
step 27100, training accuracy 0.08, loss nan
Python code:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

def weight_varible(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print("Download Done!")

sess = tf.InteractiveSession()

# paras
W_conv1 = weight_varible([3, 3, 1, 32])
b_conv1 = bias_variable([32])

# conv layer-1
x = tf.placeholder(tf.float32, [None, 784])
x_image = tf.reshape(x, [-1, 28, 28, 1])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# conv layer-2
W_conv2 = weight_varible([3, 3, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# full connection
W_fc1 = weight_varible([7 * 7 * 64, 1204])
b_fc1 = bias_variable([1204])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# dropout
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# output layer: softmax
W_fc2 = weight_varible([1204, 10])
b_fc2 = bias_variable([10])

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
y_ = tf.placeholder(tf.float32, [None, 10])

# model training
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

correct_prediction = tf.equal(tf.arg_max(y_conv, 1), tf.arg_max(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

saver = tf.train.Saver()
sess.run(tf.initialize_all_variables())
for i in range(100000):
    batch = mnist.train.next_batch(100)

    if i % 10 == 0:
        train_accuacy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        train_cross_entropy = cross_entropy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g, loss %g"%(i, train_accuacy, train_cross_entropy))
    train_step.run(feed_dict = {x: batch[0], y_: batch[1], keep_prob: 0.5})

# accuracy on test
save_path = saver.save(sess, "./mnist.model")
#saver.restore(sess,"./mnist.model")
print("Model saved in file: %s" % save_path)
print("test accuracy %g"%(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})))

Actually, I just ran into this exact problem while training a CNN myself, and after a small change everything was fine. I think what is happening is a numerical-stability problem with the log in the cost function. When the network starts making predictions with very high confidence (which becomes more likely as the network trains and the cost drops), the y_conv vector will look like

y_conv = [1, 0]

(ignoring batching). That means

log(y_conv) = log([1, 0]) = [0, -inf]

Suppose [1, 0] is also the correct label; then when you compute y_ * tf.log(y_conv) you are effectively computing

[1, 0] * [0, -inf] = [0, nan]

because 0 times infinity is undefined. Summing these terms gives a NaN cost. I think this can be fixed by adding a small epsilon inside the log, e.g. y_ * tf.log(y_conv + 1e-5).
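As a concrete illustration, here is a minimal sketch of that epsilon workaround plugged into the question's loss (variable names follow the code above; the tf.clip_by_value variant shown in the comment is a common alternative I am adding, not something from the original answer):

# Keep the argument of the log strictly positive so log() never returns -inf
# and 0 * -inf never produces NaN.
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv + 1e-5))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# A commonly used alternative clamps the probabilities instead:
# cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y_conv, 1e-10, 1.0)))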
I seem to have solved my own problem by using tf.nn.sparse_softmax_cross_entropy_with_logits(...), which appears to take care of the numerical issue.
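Since the labels in the question's code are one-hot (y_ has shape [None, 10]), the dense variant tf.nn.softmax_cross_entropy_with_logits is the matching form. A hedged sketch of how the output layer and loss could be rewritten under that assumption, keeping the raw logits and letting TensorFlow fuse softmax and log in a numerically stable way:

# Output layer: keep the raw logits, do NOT apply softmax by hand here.
logits = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
y_ = tf.placeholder(tf.float32, [None, 10])

# The fused op computes softmax + cross entropy together and avoids log(0).
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# argmax of the logits equals argmax of the softmax, so accuracy can be
# computed directly from the logits.
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))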

Thanks. 1. softmax_cross_entropy_with_logits() works well: after 50,000 iterations I get 99.3% test accuracy. 2. Note that memory consumption is very high (around 5 GB) when computing the accuracy on the full test set.
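On point 2, one way to keep memory bounded is to evaluate the test accuracy in chunks rather than feeding all test images at once. A sketch, assuming the chunk size divides the 10,000-image test set evenly (the chunk size of 500 is my own choice):

import numpy as np

# Evaluate the test set in slices so the conv activations for only one
# chunk live in memory at a time; averaging per-chunk accuracies is exact
# here because every chunk has the same size.
chunk = 500
accs = []
for start in range(0, len(mnist.test.images), chunk):
    accs.append(accuracy.eval(feed_dict={
        x: mnist.test.images[start:start + chunk],
        y_: mnist.test.labels[start:start + chunk],
        keep_prob: 1.0}))
print("test accuracy %g" % np.mean(accs))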