Python MLP (ReLU) stops learning after a few iterations in TensorFlow


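A 2-layer MLP (ReLU) + softmax.

After 20 iterations, TensorFlow gives up and stops updating any weights or biases.

I initially thought my ReLUs were dying, so I plotted histograms to make sure none of them were 0. None of them are.

They just stop changing after a few iterations, and the cross-entropy remains high. ReLU, sigmoid and tanh give the same results. Changing the GradientDescentOptimizer learning rate anywhere from 0.01 to 0.5 doesn't change much either.

There has to be a bug somewhere. Like, an actual bug in my code. I can't even overfit a small sample!

Here are my histograms and here is my code. If someone could check it, that would be a huge help.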

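We have 3000 samples, each with 6 values between 0 and 255, to classify into two classes: [1,0] or [0,1] (I made sure to randomize the order).

Hidden layer 1. The output isn't zero, so this isn't a dying ReLU problem. But the weights are constant! TF didn't even try to modify them.

Same for hidden layer 2. TF tried tweaking them a little and gave up pretty quickly.

The cross-entropy does decrease, but it stays staggeringly high.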

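EDIT: There were lots of mistakes in my code. The first one is that 1/255 = 0 in Python (integer division, as in Python 2). After changing it to 1.0/255.0, my code started working.

So basically, my input was multiplied by 0 and the neural network was completely blind. It tried to get the best result it could while blind, and then gave up. That fully explains its behaviour.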
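The effect of that bug can be reproduced without TensorFlow. The sketch below is only an illustration, not the original code: the sample values are made up, and `//` is used to get Python 2's integer-division behaviour on any Python version (in Python 3, `1/255` is already a float).

```python
# Hypothetical sample: one input row with 6 values in [0, 255].
sample = [0, 17, 64, 128, 200, 255]

# Python 2 evaluates 1/255 with integer division; `//` reproduces that
# behaviour on both Python 2 and Python 3.
broken_scale = 1 // 255
fixed_scale = 1.0 / 255.0

broken = [broken_scale * v for v in sample]
fixed = [fixed_scale * v for v in sample]

print(broken_scale)  # the scale factor collapses to 0
print(broken)        # every feature is zeroed out
print(fixed)         # features are rescaled to roughly the 0..1 range
```

With an all-zero input, the gradient of the loss with respect to the first layer's weights is also zero, because that gradient is the product of the input and the backpropagated error. That matches the observation that the weights of hidden layer 1 never moved.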

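I was also applying softmax twice... Fixing that helped as well. By trying different learning rates and different numbers of epochs, I finally found something that works well.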
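To see why a double softmax hurts, here is a small pure-Python sketch. The logits and label are hypothetical values chosen for illustration, and `softmax` and `cross_entropy` are helper functions written for this example rather than TensorFlow calls.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, one_hot):
    """Cross-entropy between predicted probabilities and a one-hot label."""
    return -sum(t * math.log(p) for p, t in zip(probs, one_hot))


logits = [20.0, -20.0]  # hypothetical, extremely confident and correct output
label = [1.0, 0.0]

once = softmax(logits)
twice = softmax(once)   # softmax output mistakenly fed in again as "logits"

print(once, cross_entropy(once, label))
print(twice, cross_entropy(twice, label))
```

Because the first softmax squeezes its output into [0, 1], the second softmax of a two-class problem can never give the correct class more than e/(1+e), about 0.731. The loss therefore cannot fall below roughly 0.313 however good the logits are, which may be part of why the cross-entropy stayed high.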

Here is the final working code:

    # Targets the TensorFlow 1.x graph API (tf.placeholder, tf.Session).
    import math

    import tensorflow as tf


    # Method of my class; the class definition itself is not shown here.
    def runModel(self):

        def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
            with tf.name_scope(layer_name):

                # This is a standard weight initialisation for networks with ReLU.
                # I divide by math.sqrt(float(6)) because my input has 6 values.
                weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=1.0 / math.sqrt(float(6))))
                tf.summary.histogram('weights', weights)

                # I chose this bias myself. It works. Not sure why.
                biases = tf.Variable(tf.constant(0.4, shape=[output_dim]))
                tf.summary.histogram('biases', biases)

                preactivate = tf.matmul(input_tensor, weights) + biases
                tf.summary.histogram('pre_activations', preactivate)

                # Some layers use ReLU as their activation function,
                # some have no activation function at all (act=None).
                if act is None:
                    activations = preactivate
                else:
                    activations = act(preactivate, name='activation')
                    tf.summary.histogram('activations', activations)

                return activations

        # We have 3000 samples with 6 values between 0 and 255 to classify in two classes
        x = tf.placeholder(tf.float32, [None, 6])
        y = tf.placeholder(tf.float32, [None, 2])

        # After normalisation, the input is between 0 and 1.
        # Normalising the input really helps. Nothing is doable without it.
        # But my ERROR was to write 1/255, because in Python 2
        # 1/255 = 0 (integer division),
        # whereas 1.0/255.0 = 0.003921568 (float division).
        normalised = tf.scalar_mul(1.0 / 255.0, x)

        # Three layers total. The first one is just a matrix multiplication.
        input_layer = nn_layer(normalised, 6, 4, "input", act=None)
        # The second one has a ReLU after a matrix multiplication.
        hidden1 = nn_layer(input_layer, 4, 4, "hidden", act=tf.nn.relu)
        # The last one is also just a matrix multiplication.
        # WARNING! No softmax here, because later we call a function
        # that implicitly applies a softmax,
        # and applying two softmaxes one after the other is a bug.
        output = nn_layer(hidden1, 4, 2, "output", act=None)

        # Tried different learning rates.
        # A higher learning rate finds a result faster,
        # but it could be a local minimum.
        # A lower learning rate needs many more epochs.
        learning_rate = 0.03

        with tf.name_scope('learning_rate_' + str(learning_rate)):
            # Defining loss, accuracy etc.
            cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output))
            tf.summary.scalar('cross_entropy', cross_entropy)

            correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))

            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
            tf.summary.scalar('accuracy', accuracy)

        # Build the training op before the variables are initialised
        train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)

        # Init session, writers and misc.
        session = tf.Session()

        # Passing the graph here already writes it to the log directory
        train_writer = tf.summary.FileWriter('log', session.graph)

        init = tf.global_variables_initializer()
        session.run(init)

        merged = tf.summary.merge_all()

        # Train (full-batch gradient descent on the whole training set)
        batch_x, batch_y = self.trainData
        for step in range(1000):
            session.run(train_step, {x: batch_x, y: batch_y})
            # Every 10 steps, add to the summary
            if step % 10 == 0:
                s = session.run(merged, {x: batch_x, y: batch_y})
                train_writer.add_summary(s, step)

        # Evaluate: accuracy on the training data, then on the evaluation data
        evaluate_x, evaluate_y = self.evaluateData
        print(session.run(accuracy, {x: batch_x, y: batch_y}))
        print(session.run(accuracy, {x: evaluate_x, y: evaluate_y}))

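I'm afraid you have to lower your learning rate. It's too high. A high learning rate usually leads you to a local minimum rather than the global one.

Try 0.001, 0.0001 or even 0.00001. Or make your learning rate flexible, so that it changes during training.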
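One way to read "flexible" is a schedule that shrinks the learning rate as training progresses. The sketch below is an assumption about what such a schedule could look like, not something taken from the question: `exponential_decay` is a hypothetical helper, and the starting rate, `decay_steps` and `decay_rate` values are made up for illustration.

```python
def exponential_decay(initial_rate, step, decay_steps, decay_rate):
    """Learning rate after `step` updates:
    initial_rate * decay_rate ** (step / decay_steps)."""
    return initial_rate * decay_rate ** (step / float(decay_steps))


# Hypothetical settings: start at 0.03 and halve the rate every 200 steps.
for step in (0, 200, 400, 1000):
    print(step, exponential_decay(0.03, step, decay_steps=200, decay_rate=0.5))
```

TensorFlow 1.x provides `tf.train.exponential_decay`, which applies the same formula inside the graph, so the resulting tensor can be passed to the optimizer in place of a constant.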


I haven't checked the code, so try tuning the LR first.

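Just in case someone needs it in the future:

I initialized the layers of my two-layer network with `np.random.randn`, but the network refused to learn. Using He (for ReLU) and Xavier (for softmax) initialization worked perfectly.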
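For reference, here is a minimal sketch of those two schemes using only the standard library. The helper names `he_normal` and `xavier_uniform` and the 6-4-2 layer sizes are assumptions made for this example. The answer does not say which Xavier variant was used, so the uniform form is shown.

```python
import math
import random


def he_normal(fan_in, fan_out, rng):
    """He initialisation: zero-mean Gaussian with std = sqrt(2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]


def xavier_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot initialisation: uniform in [-limit, limit],
    where limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]


rng = random.Random(0)                      # seeded for reproducibility
hidden_weights = he_normal(6, 4, rng)       # layer followed by ReLU
output_weights = xavier_uniform(4, 2, rng)  # layer feeding the softmax

print(len(hidden_weights), len(hidden_weights[0]))
print(len(output_weights), len(output_weights[0]))
```

Unlike plain `np.random.randn`, which always draws with standard deviation 1, both schemes scale the spread of the weights by the layer size so that activations neither blow up nor vanish as they pass through the network.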

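I tried those 3 values. Same problem. I'll edit my question and post the histograms.

What is the class balance in y?

Please make the first layer's activation linear. ReLU is only used in hidden layers.

There seems to be a mistake here: `cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=softmax))`. Try not to use a softmax activation layer at the end, as the documentation for that op warns: WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.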