TensorFlow exponential decay learning rate doesn't seem to work

I built a neural network with TensorFlow; it looks like this:

import tensorflow as tf  # TF 1.x API (placeholders, tf.train, sessions)

# X (feature matrix) and y (integer targets) are assumed to be defined already.
n_hidden = 32
steps = 10**5*4
decay_rate = 1e-4
initial_lr = 1e-3

tf.reset_default_graph()
g = tf.Graph()  # note: g is never used below; all ops go into the default graph

dropout_rate = tf.placeholder_with_default(0.2, (), name='dropout')
curr_step = tf.placeholder_with_default(1, (), name='current_step')
learning_rate = tf.train.exponential_decay(initial_lr, global_step=curr_step, decay_steps=steps,
                                           decay_rate=decay_rate, name='learning_rate')

X_tensor = tf.placeholder(tf.float32, shape=[None, X.shape[1]], name='X_input')
y_tensor = tf.placeholder(tf.int64, shape=[None], name='y_input')

w = tf.Variable(tf.random_normal([X.shape[1], n_hidden]), name='w_0')
b = tf.Variable(tf.random_normal([n_hidden]), name='b_0')
product = tf.nn.leaky_relu(tf.matmul(X_tensor, tf.nn.dropout(w, rate=dropout_rate, name='w_0_dropout'),
                                     name='matmul_0') + tf.nn.dropout(b, rate=dropout_rate, name='b_0_dropout'),
                           name='activation_0')

w_1 = tf.Variable(tf.random_normal([n_hidden, n_hidden]), name='w_1')
b_1 = tf.Variable(tf.random_normal([n_hidden]), name='b_1')
product_1 = tf.nn.leaky_relu(tf.matmul(product, tf.nn.dropout(w_1, rate=dropout_rate, name='w_1_dropout'),
                                       name='matmul_1') + tf.nn.dropout(b_1, rate=dropout_rate, name='b_1_dropout'),
                             name='activation_1')

w_2 = tf.Variable(tf.random_normal([n_hidden, 1]), name='w_2')
b_2 = tf.Variable(tf.random_normal([1]), name='b_2')
product_2 = tf.reshape(tf.nn.leaky_relu(tf.matmul(product_1, tf.nn.dropout(w_2, rate=dropout_rate,
                                                                           name='w_2_dropout'),
                                                  name='matmul_2') + b_2, name='activation_2'), [-1],
                       name='reshape')
cost = tf.losses.mean_squared_error(labels=y_tensor, predictions=product_2)

#correct_predictions = tf.equal(tf.argmax(product, axis=1), y_tensor)
#accuracy = tf.reduce_mean(tf.cast(correct_predictions, tf.float64))
mae = tf.losses.absolute_difference(y_tensor, product_2)
correct_predictions = tf.equal(tf.cast(tf.round(product_2), tf.int64), y_tensor, name='correct')
accuracy = tf.reduce_mean(tf.cast(correct_predictions, tf.float64), name='accuracy')
optimizer = tf.train.GradientDescentOptimizer(learning_rate, name='optimizer').minimize(cost)
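
For reference, tf.train.exponential_decay with the default staircase=False computes initial_lr * decay_rate ** (global_step / decay_steps). Note that curr_step above is a placeholder defaulting to 1 and minimize() is not given a global_step, so the schedule only advances if curr_step is fed explicitly. A quick plain-Python sanity check of the schedule (using the same constants as above):

# Plain-Python replica of the exponential_decay formula defined above.
def decayed_lr(step, initial_lr=1e-3, decay_rate=1e-4, decay_steps=10**5 * 4):
    return initial_lr * decay_rate ** (step / decay_steps)

print(decayed_lr(1))          # ~0.001, essentially undecayed
print(decayed_lr(10**5))      # 1e-4 after 100k steps
print(decayed_lr(10**5 * 4))  # 1e-7, fully decayed
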
Even if I reduce the learning rate to a worthless value (1e-100), the loss still fluctuates:

Step 2500, Minibatch Loss= 2.8308, Training Accuracy= 0.2525, Training MAE= 1.3107, lr= 0.00000000000000
Step 5000, Minibatch Loss= 2.7827, Training Accuracy= 0.2664, Training MAE= 1.2948, lr= 0.00000000000000
Step 7500, Minibatch Loss= 2.6718, Training Accuracy= 0.2481, Training MAE= 1.2784, lr= 0.00000000000000
Step 10000, Minibatch Loss= 2.6464, Training Accuracy= 0.2603, Training MAE= 1.2718, lr= 0.00000000000000
Step 12500, Minibatch Loss= 2.8204, Training Accuracy= 0.2614, Training MAE= 1.3014, lr= 0.00000000000000
Maybe I'm getting something wrong? All the data is scaled, so lr=1e-100 shouldn't make any difference, and yet it apparently does.


I would greatly appreciate any help.

Are you sure the parameters are actually fluctuating? You don't show the execution code, but it is quite possible that all the displayed metrics are averages over all batches in the current epoch. That would mean the first line is an average over 2500 batches, the second over 5000 batches, and so on.
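
As a toy illustration of that effect (synthetic numbers, unrelated to your model): even when the per-batch losses come from a fixed distribution with no trend at all, the running mean printed at different checkpoints still drifts.

import numpy as np

# Per-batch "losses" drawn from a fixed distribution; nothing is learned,
# yet the running average moves between print points.
rng = np.random.RandomState(0)
batch_losses = rng.normal(loc=2.7, scale=0.5, size=12500)
for step in (2500, 5000, 7500, 10000, 12500):
    print('Step %d, running mean loss = %.4f' % (step, batch_losses[:step].mean()))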

That alone could explain the fluctuation. So try printing your parameters after each epoch; if they really are changing as well, you can rule out this hypothesis.
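
A minimal sketch of that check, assuming a standard TF 1.x training loop (num_epochs and batches are placeholder names, since the execution code isn't shown):

import numpy as np

with tf.Session() as sess:  # the ops above live in the default graph
    sess.run(tf.global_variables_initializer())
    prev_w = sess.run(w)
    for epoch in range(num_epochs):           # num_epochs: assumed
        for X_batch, y_batch in batches:      # batches: assumed minibatch iterator
            sess.run(optimizer, feed_dict={X_tensor: X_batch, y_tensor: y_batch})
        curr_w = sess.run(w)
        # With an effectively zero learning rate this delta should be ~0;
        # if so, the printed loss "fluctuation" is not coming from weight updates.
        print('epoch %d, max |delta w_0| = %.3e' % (epoch, np.abs(curr_w - prev_w).max()))
        prev_w = curr_w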