Loss of a fully connected neural network doesn't fall when training with TensorFlow

tensorflow neural-network deep-learning

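I'm using TensorFlow to train my own fully connected network, but the loss drops significantly during the first few iterations and then stops changing, hovering around 4.3. I don't know what is wrong. Changing the learning rate doesn't seem to help.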

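The sample input in my dataset (named "feat" in the code) is a sparse vector of length 13294 in which only about five positions are valid; the rest are assigned 1. A batch of train_x looks like: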

[[1 1 1 1 1 1 ... -96 ... 1 1 1 1 ... -84 ... 1 1 1 1 ... -56 ... 1 1 1 1]
 [1 1 1 1 1 ... -47 ... 1 1 1 1 1 ... -52 ... 1 1 1 1 1 ....... 1 1 1 1 1]
 ...
]

The label of a sample is a single value between 0 and 137. A batch of train_y looks like:

[
28
28
110
34
...
]

I have 26816 training samples.

The code I use is shown below:

"""Neural network applied with tensroflow.

"""

from __future__ import print_function
import tensorflow as tf
import numpy as np
from scipy.sparse import coo_matrix

file_wifi_feat = 'wifi_feat.npy'
file_shop_label = 'shop_label.npy'
num_shops = 137
num_wifis = 13294
num_hidden_1 = 8192
num_hidden_2 = 2048
num_hidden_3 = 512
num_hidden_4 = 128
num_hidden_5 = 64


class BatchReader:
    def __init__(self, feat, label):
        self.shuffle = True
        self.feat = []
        self.label = []
        self.batch_offset = 0
        self._load_data(feat, label)

    def _load_data(self, feat, label):
        self.feat = np.load(feat)
        self.label = np.load(label)

    def next_batch(self, batch_size):
        start = self.batch_offset
        self.batch_offset += batch_size
        if self.batch_offset > self.feat.shape[0]:
            perm = np.arange(self.feat.shape[0])
            np.random.shuffle(perm)
            self.feat = self.feat[perm]
            self.label = self.label[perm]
            start = 0
            self.batch_offset = batch_size
        end = self.batch_offset
        batch_feat = np.array([m.toarray()[0] for m in self.feat[start:end]])
        batch_feat[np.where(batch_feat == 0)] = 1
        batch_label = self.label[start:end]
        return batch_feat, batch_label


def weight_variable(shape):
    """weight_variable generates a weight variable of a given shape."""
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.get_variable(name='weights', initializer=initial)


def bias_variable(shape):
    """bias_variable generates a bias variable of a given shape."""
    initial = tf.constant(0.1, shape=shape)
    return tf.get_variable(name='bias', initializer=initial)


def main(argv=None):
    batch_reader = BatchReader(file_wifi_feat, file_shop_label)

    feat_ph = tf.placeholder(tf.float32, [None, num_wifis])
    label_ph = tf.placeholder(tf.int32, [None])

    with tf.variable_scope('h1'):
        weight = weight_variable([num_wifis, num_hidden_1])
        bias = bias_variable([num_hidden_1])
        L1 = tf.nn.relu(tf.matmul(feat_ph, weight) + bias)

    with tf.variable_scope('h2'):
        weight = weight_variable([num_hidden_1, num_hidden_2])
        bias = bias_variable([num_hidden_2])
        L2 = tf.nn.relu(tf.matmul(L1, weight) + bias)

    with tf.variable_scope('h3'):
        weight = weight_variable([num_hidden_2, num_hidden_3])
        bias = bias_variable([num_hidden_3])
        L3 = tf.nn.relu(tf.matmul(L2, weight) + bias)

    with tf.variable_scope('h4'):
        weight = weight_variable([num_hidden_3, num_hidden_4])
        bias = bias_variable([num_hidden_4])
        L4 = tf.nn.relu(tf.matmul(L3, weight) + bias)

    with tf.variable_scope('h5'):
        weight = weight_variable([num_hidden_4, num_hidden_5])
        bias = bias_variable([num_hidden_5])
        L5 = tf.nn.relu(tf.matmul(L4, weight) + bias)

    with tf.variable_scope('hypo'):
        weight = weight_variable([num_hidden_5, num_shops])
        bias = bias_variable([num_shops])
        hypothesis = tf.matmul(L5, weight) + bias

    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=hypothesis, labels=label_ph))
    correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.cast(label_ph, tf.int64))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    global_step = tf.Variable(0, trainable=False)
    starter_learning_rate = 0.1
    learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step, 20, 0.96, staircase=True)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss, global_step=global_step)

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    sess.run(tf.initialize_all_variables())

    for itr in xrange(100001):
        feat, label = batch_reader.next_batch(256)
        feed_dict = {feat_ph: feat, label_ph: label}
        sess.run(optimizer, feed_dict=feed_dict)
        # hypothesis_val = sess.run(hypothesis, feed_dict=feed_dict)
        if itr % 10 == 0:
            loss_val, accuracy_val, learning_rate_val = sess.run([loss, accuracy, learning_rate], feed_dict=feed_dict)
            print('Step %d, loss %g, accuracy %g, learning_rate %g' % (itr, loss_val, accuracy_val, learning_rate_val))


if __name__ == '__main__':
    tf.app.run()

The output looks like this (the loss doesn't change much even when run to step 10000):

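I think your global step is wrong.

Usually a tf.Variable in a network has its value updated automatically during the training phase. Moreover, you initialize global_step as a tf.Variable, but you set it to non-trainable and give it the value 0.

You can find more information about variables in the documentation:

Hope this helps.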

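First of all, the loss you get at iteration 1 shows that your network is horribly initialized. The initial loss should not be bigger than 10, yet in your case it is 1e9. Reduce the std in the network's initialization by at least an order of magnitude. In general, you should not initialize variables by hand; use a well-known heuristic such as the Xavier initialiser (ready to use in TF).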
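As a rough sanity check of this point: a 137-way softmax that predicts uniformly has a cross-entropy of ln(137), about 4.92, so a first-iteration loss far above that suggests oversized logits. The sketch below uses NumPy (Python 3) rather than TensorFlow to compare the scale of the first layer's pre-activations under a fixed stddev of 0.1 with Glorot (Xavier) uniform initialization. The helper `glorot_uniform`, the reduced layer width of 256, the plain (non-truncated) normal, and the all-ones input are assumptions made for illustration; they are not taken from the question's code.

```python
import numpy as np

NUM_CLASSES = 137   # num_shops in the question
FAN_IN = 13294      # num_wifis in the question
FAN_OUT = 256       # assumption: far smaller than the question's 8192, to keep the demo light

# Cross-entropy of a uniform prediction over NUM_CLASSES classes.
uniform_loss = float(np.log(NUM_CLASSES))


def glorot_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform init: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


rng = np.random.default_rng(0)
x = np.ones((4, FAN_IN))  # mimics an input that is 1 almost everywhere

w_fixed = rng.normal(0.0, 0.1, size=(FAN_IN, FAN_OUT))  # plain normal with stddev 0.1
w_glorot = glorot_uniform(FAN_IN, FAN_OUT, rng)

# Spread of the first layer's pre-activations under each initialization.
std_fixed = float((x @ w_fixed).std())
std_glorot = float((x @ w_glorot).std())

print("uniform-guess loss: %.2f" % uniform_loss)
print("pre-activation std with stddev=0.1: %.2f" % std_fixed)
print("pre-activation std with Glorot:     %.2f" % std_glorot)
```

In TensorFlow 1.x a comparable initializer is available as `tf.contrib.layers.xavier_initializer()`, which can be passed as the `initializer` argument of `tf.get_variable` together with an explicit `shape`.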

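The second thing is data normalization: judging from the snippet you provided, your data has huge magnitudes. Make sure each feature dimension has mean 0 and standard deviation 1. This really matters, especially with ReLU activations, where too large a signal causes units to "die".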
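A minimal sketch of per-feature standardization in NumPy (Python 3), assuming the features have already been converted to a dense array. The function names `fit_standardizer` and `standardize`, the toy data, and the handling of constant columns (left unscaled so they become all zeros) are illustrative assumptions, not part of the original answer.

```python
import numpy as np


def fit_standardizer(train_x, eps=1e-8):
    """Compute per-feature mean and std on the training set only."""
    mean = train_x.mean(axis=0)
    std = train_x.std(axis=0)
    # Constant columns have std 0; divide by 1 instead to avoid division by zero.
    std = np.where(std < eps, 1.0, std)
    return mean, std


def standardize(x, mean, std):
    """Apply the training-set statistics to any batch."""
    return (x - mean) / std


rng = np.random.default_rng(0)

# Toy stand-in for the data: mostly 1s, with a few signal-strength-like columns.
train_x = np.ones((1000, 20))
train_x[:, :5] = rng.integers(-100, -40, size=(1000, 5))

mean, std = fit_standardizer(train_x)
train_x_norm = standardize(train_x, mean, std)

print(train_x_norm[:, :5].mean(axis=0))
print(train_x_norm[:, :5].std(axis=0))
```

The same `mean` and `std` would then be reused for validation and test batches, so that all splits are scaled consistently.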

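Finally, one should not start with a complex architecture. Why begin with 5 hidden layers and a complicated learning rate schedule? These things should be added when needed, not used as defaults. Simply starting with a fixed learning rate (even the default one) and a small network (say 1-2 hidden layers) avoids many of the problems above. Once that is no longer enough, going deeper or using more advanced methods is a good idea, but starting from them makes it harder to understand why things go wrong.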
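To make the learning rate schedule in the question concrete, here is a plain-Python re-implementation of the formula documented for `tf.train.exponential_decay`, namely `learning_rate * decay_rate ** (global_step / decay_steps)` with integer division when `staircase=True`. It is a sketch of that formula, not TensorFlow's own code. With the question's settings (start 0.1, decay 0.96 every 20 steps) the rate shrinks to the order of 1e-10 by step 10000, so later updates become vanishingly small.

```python
def exponential_decay(initial_lr, step, decay_steps, decay_rate, staircase=True):
    """Exponentially decayed learning rate, following the documented formula."""
    if staircase:
        exponent = step // decay_steps        # decay in discrete jumps
    else:
        exponent = step / float(decay_steps)  # decay continuously
    return initial_lr * decay_rate ** exponent


# Settings taken from the question: 0.1 start, decay 0.96 every 20 steps.
for step in (0, 20, 1000, 10000):
    print("step %5d -> learning rate %.3g" % (step, exponential_decay(0.1, step, 20, 0.96)))
```

For comparison, the default learning rate of `tf.train.AdamOptimizer` is 0.001, so the starting value of 0.1 is also on the high side for Adam.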

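I would also suggest starting with a fixed learning rate passed to the optimizer. Then, if your loss decreases, try an adaptive learning rate.

What the OP does is exactly how global_step should be used.

Thanks for your suggestion. The issue you pointed out (I mean the use of global_step) may be one of the causes, but it is not the most critical one. I solved the problem with the method @lejlot mentioned. Now I'm facing a new problem with overfitting; would you mind taking a look?

Thanks for your advice. I changed the number of layers; the current network has only one hidden layer. I compute the mean of the training set and feed train_x - mean to the network instead of the raw train_x. I switched the initialization method to Xavier. The loss now decreases as expected. However, I have run into a new problem with overfitting. I have posted a new question and hope to get your help.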
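The combination described in the last comment (one hidden layer, mean-subtracted input, Xavier initialization, fixed learning rate) can be sketched in NumPy (Python 3) as follows. This is not the asker's code: the synthetic three-class data, the layer sizes, the learning rate of 0.05, plain full-batch gradient descent instead of Adam, and all function names are assumptions chosen to keep the example small and self-contained.

```python
import numpy as np


def glorot_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform initialization."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


def train(train_x, labels, num_hidden, num_classes, learning_rate, steps, rng):
    """Full-batch gradient descent on a one-hidden-layer ReLU network."""
    x = train_x - train_x.mean(axis=0)  # subtract the training-set mean
    n, num_feats = x.shape
    w1, b1 = glorot_uniform(num_feats, num_hidden, rng), np.zeros(num_hidden)
    w2, b2 = glorot_uniform(num_hidden, num_classes, rng), np.zeros(num_classes)
    rows = np.arange(n)
    losses = []
    for _ in range(steps):
        # Forward pass: ReLU hidden layer, then a numerically stable softmax.
        hidden = np.maximum(x @ w1 + b1, 0.0)
        logits = hidden @ w2 + b2
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        losses.append(float(-np.log(probs[rows, labels] + 1e-12).mean()))

        # Backward pass for mean softmax cross-entropy.
        grad_logits = probs
        grad_logits[rows, labels] -= 1.0
        grad_logits /= n
        grad_w2 = hidden.T @ grad_logits
        grad_b2 = grad_logits.sum(axis=0)
        grad_hidden = grad_logits @ w2.T
        grad_hidden[hidden <= 0.0] = 0.0  # ReLU passes no gradient where it is inactive
        grad_w1 = x.T @ grad_hidden
        grad_b1 = grad_hidden.sum(axis=0)

        # Fixed learning rate, no decay schedule.
        w1 -= learning_rate * grad_w1
        b1 -= learning_rate * grad_b1
        w2 -= learning_rate * grad_w2
        b2 -= learning_rate * grad_b2
    return losses


rng = np.random.default_rng(0)

# Synthetic stand-in data: three clusters with a large constant offset.
num_samples, num_feats, num_classes = 300, 10, 3
labels = rng.integers(0, num_classes, size=num_samples)
centers = rng.normal(0.0, 2.0, size=(num_classes, num_feats))
train_x = centers[labels] + rng.normal(0.0, 1.0, size=(num_samples, num_feats)) + 50.0

losses = train(train_x, labels, num_hidden=16, num_classes=num_classes,
               learning_rate=0.05, steps=200, rng=rng)
print("first loss %.3f, last loss %.3f" % (losses[0], losses[-1]))
```

Starting from a small setup like this makes it easier to check that the loss falls at all before adding layers, a decay schedule, or regularization against overfitting.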