Python: Siamese network cost stays constant at 0.6932

I am trying to implement a Siamese network, as shown here.

In that paper they use cross entropy as the loss function.

I am training on the STL-10 dataset, and instead of the three-layer network used in the paper, I replaced it with a VGG-13 CNN, except for the final logit layer.

Here is my loss function code:

def loss(pred, true_pred):
    # binary cross-entropy computed by hand: -mean(y*log(p) + (1-y)*log(1-p))
    cross_entropy_loss = tf.multiply(-1.0, tf.reduce_mean(
        tf.add(tf.multiply(true_pred, tf.log(pred)),
               tf.multiply(1 - true_pred, tf.log(tf.subtract(1.0, pred))))))
    # add the collected L2 regularization terms to form the total loss
    total_loss = tf.add(tf.reduce_sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)),
                        cross_entropy_loss, name='total_loss')
    return cross_entropy_loss, total_loss

with tf.device('/gpu:0'):
    h1 = siamese(feed_image1)
    h2 = siamese(feed_image2)
    l1_dist = tf.abs(tf.subtract(h1,h2))

    with tf.variable_scope('pred') as scope:
        predictions = tf.contrib.layers.fully_connected(
            l1_dist, 1, activation_fn=tf.sigmoid,
            weights_initializer=tf.contrib.layers.xavier_initializer(uniform=False),
            weights_regularizer=tf.contrib.layers.l2_regularizer(tf.constant(0.001, dtype=tf.float32)))

    celoss,cost = loss(predictions,feed_labels)

    with tf.variable_scope('adam_optimizer') as scope:
        optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
        opt = optimizer.minimize(cost)
However, when I train, the cost stays almost constant at 0.6932.

I am using the Adam optimizer here.

Before that I used a momentum optimizer. I have tried changing the learning rate, but the cost still does not change.

After a few iterations, all of the predicted values converge to 0.5.

After getting the outputs for the two batches of images (input1 and input2), I take their L1 distance and feed it to a fully connected layer with a single output and a sigmoid activation.

[h1 and h2 hold the output of the last fully connected layer (not the logit layer) of the VGG-13 network.]

Since the output activation function is a sigmoid and the predicted values are around 0.5, we can conclude that the weighted sum of the L1 distances between the two networks' outputs is close to zero.
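That reasoning also matches the exact value the cost is stuck at: with the logit near zero, the prediction is 0.5 and the binary cross-entropy is -log(0.5) = ln 2 ≈ 0.6931 for every example, whatever the label. A quick sanity check in plain Python (independent of the graph above):

import math

p = 1.0 / (1.0 + math.exp(-0.0))   # sigmoid(0) = 0.5
loss_pos = -math.log(p)            # cross-entropy when label = 1
loss_neg = -math.log(1.0 - p)      # cross-entropy when label = 0
print(p, loss_pos, loss_neg)       # 0.5 0.6931... 0.6931...  -> the observed plateau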

I don't understand where I am going wrong.
Any help is much appreciated.

I think the lack of convergence may be caused by vanishing gradients. You can trace the gradients with TensorBoard. See this for more details.
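For example, per-variable gradient histograms can be logged like this (a sketch in the same TF 1.x style as the question; `cost` is the total loss from your code, and the session/writer bookkeeping is assumed to exist elsewhere):

# Sketch: record gradient histograms so vanishing gradients show up in TensorBoard.
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
grads_and_vars = optimizer.compute_gradients(cost)
for grad, var in grads_and_vars:
    if grad is not None:
        tf.summary.histogram(var.op.name + '/gradient', grad)
opt = optimizer.apply_gradients(grads_and_vars)
merged = tf.summary.merge_all()
# in the training loop:
#   summary, _ = sess.run([merged, opt], feed_dict=...)
#   writer.add_summary(summary, step)   # writer = tf.summary.FileWriter(log_dir, sess.graph)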

A few possible optimizations:

1) Don't write the cross entropy yourself. You can use the sigmoid-cross-entropy-with-logits API (tf.nn.sigmoid_cross_entropy_with_logits), since it guarantees numerical stability:

2) Doing some normalization may help you (the modified code below applies batch normalization to the L1 distance).

3) Keep the regularization loss small. You can read more about this.

4) I don't think tf.abs is necessary for measuring the L1 distance.

Here is my modified code. Hope it helps.

mode = "training"
rl_rate = .1

with tf.device('/gpu:0'):
    h1 = siamese(feed_image1)
    h2 = siamese(feed_image2)
    l1_dist = tf.subtract(h1, h2)
    # is it necessary to use abs?
    l1_dist_norm = tf.layers.batch_normalization(l1_dist, training=(mode=="training"))

    with tf.variable_scope('logits') as scope:
        w = tf.get_variable('fully_connected_weights', [tf.shape(l1_dist)[-1], 1], 
                weights_initializer = tf.contrib.layers.xavier_initializer(uniform=False), weights_regularizer = tf.contrib.layers.l2_regularizer(tf.constant(0.001, dtype=tf.float32))
                )
        logits = tf.tensordot(l1_dist_norm, w, axis=1)

        xent_loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=feed_labels)
        total_loss = tf.add(tf.reduce_sum(rl_rate * tf.abs(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))), (1-rl_rate) * xent_loss, name='total_loss')
        # or:
        # weights = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
        # l1_regularizer = tf.contrib.layers.l1_regularizer()
        # regularization_loss = tf.contrib.layers.apply_regularization(l1_regularizer, weights)
        # total_loss = xent_loss + regularization_loss

    with tf.variable_scope('adam_optimizer') as scope:
        optimizer = tf.train.AdamOptimizer(learning_rate=0.0005)
        opt = tf.contrib.layers.optimize_loss(total_loss, global_step, learning_rate=learning_rate, optimizer="Adam", clip_gradients=max_grad_norm, summaries=["gradients"]) 

I have the same problem; my error is stuck at 0.6931. On inspection, I noticed that the weights of the last fully connected layer after the L1 distance become very small, which necessarily drives the weighted sum to zero, as you described. Not sure what to do about it.
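One way to confirm that diagnosis is to watch the norm of those weights during training. A small sketch, assuming the 'pred' variable scope from the question and an existing session `sess`:

# Sketch: monitor the magnitude of the final fully connected layer's weights.
pred_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='pred')
weight_norms = {v.op.name: tf.norm(v) for v in pred_vars}
# inside the training loop:
#   norms = sess.run(weight_norms)
#   print(norms)   # if these keep shrinking toward 0, the logits collapse and predictions stay at 0.5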
mode = "training"
rl_rate = .1

with tf.device('/gpu:0'):
    h1 = siamese(feed_image1)
    h2 = siamese(feed_image2)
    l1_dist = tf.subtract(h1, h2)
    # is it necessary to use abs?
    l1_dist_norm = tf.layers.batch_normalization(l1_dist, training=(mode=="training"))

    with tf.variable_scope('logits') as scope:
        w = tf.get_variable('fully_connected_weights', [tf.shape(l1_dist)[-1], 1], 
                weights_initializer = tf.contrib.layers.xavier_initializer(uniform=False), weights_regularizer = tf.contrib.layers.l2_regularizer(tf.constant(0.001, dtype=tf.float32))
                )
        logits = tf.tensordot(l1_dist_norm, w, axis=1)

        xent_loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=feed_labels)
        total_loss = tf.add(tf.reduce_sum(rl_rate * tf.abs(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))), (1-rl_rate) * xent_loss, name='total_loss')
        # or:
        # weights = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
        # l1_regularizer = tf.contrib.layers.l1_regularizer()
        # regularization_loss = tf.contrib.layers.apply_regularization(l1_regularizer, weights)
        # total_loss = xent_loss + regularization_loss

    with tf.variable_scope('adam_optimizer') as scope:
        optimizer = tf.train.AdamOptimizer(learning_rate=0.0005)
        opt = tf.contrib.layers.optimize_loss(total_loss, global_step, learning_rate=learning_rate, optimizer="Adam", clip_gradients=max_grad_norm, summaries=["gradients"])