Trying to adapt TensorFlow's MNIST example gives NaN predictions


I'm working with TensorFlow, using the "MNIST for Beginners" example. I've made some slight adjustments:

mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)

sess = tf.InteractiveSession()

# Create the model
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

fake_images = mnist.train.images.tolist() 

# Train
tf.initialize_all_variables().run()
for i in range(10):
  batch_xs, batch_ys = fake_images, mnist.train.labels
  train_step.run({x: batch_xs, y_: batch_ys})

# Test trained model
print(y.eval({x: mnist.test.images}))

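Specifically, I only run the training step 10 times (I'm not concerned about accuracy, more about speed). For simplicity, I also run it on all the data at once. At the end, I output the predictions TF is making instead of the accuracy percentage. Here is (some of) the output of the code above: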

 [  1.08577311e-02   7.29394853e-01   5.02395593e-02 ...,   2.74689011e-02
    4.43389975e-02   2.32385024e-02]
 ..., 
 [  2.95746652e-03   1.30554764e-02   1.39354384e-02 ...,   9.16484520e-02
    9.70732421e-02   2.57733971e-01]
 [  5.94450533e-02   1.36338845e-01   5.22132218e-02 ...,   6.91468120e-02
    1.95634082e-01   4.83607128e-02]
 [  4.46179360e-02   6.66685810e-04   3.84704918e-02 ...,   6.51754031e-04
    2.46591796e-03   3.10819712e-03]]

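These appear to be the probabilities TF assigns to each of the possibilities (0-9). All is well with the world.

My main goal is to adapt this to another use, but first I'd like to make sure I can feed it other data. This is what I tried: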

fake_images = np.random.rand(55000, 784).astype('float32').tolist()

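As I understand it, this should generate an array of random junk that is structurally the same as the MNIST data. But with the change above, this is what I get: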

[[ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]
 ..., 
 [ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]
 [ nan  nan  nan ...,  nan  nan  nan]]

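This is obviously much less useful. Looking at each option (mnist.train.images and the np.random.rand option), both appear to be a list of lists of floats.

Why won't TensorFlow accept this array? Is it simply complaining because it recognizes that there is no way to learn from a bunch of random data? I would expect not, but I've been wrong before.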
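
The real MNIST data is very sparse: most of the values are zero. Your synthetic data is uniformly distributed over [0, 1). The trained W and b assume sparse input. The model you trained may have overfit badly, with very large W weights attached to specific input pixels in order to produce confident output probabilities (a large post-softmax value requires a large pre-softmax activation). When you feed in the synthetic data, all the input magnitudes are suddenly much larger than before, which produces very large activations everywhere and possibly causes overflow.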
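
To make the magnitude argument concrete, here is a minimal NumPy sketch. It does not use the real MNIST data: sparse_like is a hypothetical stand-in in which roughly 80% of the pixels are zeroed (that fraction is an assumption for illustration), and W holds arbitrary, deliberately large weights rather than trained values. It only shows how dense input can push a float32 softmax into NaN while sparse input with the same weights does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense uniform input in [0, 1), like np.random.rand in the question.
uniform = rng.random((5, 784), dtype=np.float32)

# Hypothetical sparse stand-in: keep ~20% of the pixels, zero the rest.
# The 20% figure is an assumption, not a measurement of MNIST.
keep = rng.random((5, 784)) < 0.2
sparse_like = uniform * keep  # float32 * bool stays float32

# Arbitrary, deliberately large weights (not trained values).
W = np.full((784, 10), 0.25, dtype=np.float32)
W[:, 0] = 0.5


def naive_softmax(logits):
    """Softmax as written mathematically, with no stabilisation."""
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


report = {}
for name, data in (("sparse_like", sparse_like), ("uniform", uniform)):
    logits = data @ W
    # Silence the overflow / invalid-value warnings we expect here.
    with np.errstate(over="ignore", invalid="ignore"):
        probs = naive_softmax(logits)
    report[name] = {
        "max_logit": float(logits.max()),
        "has_nan": bool(np.isnan(probs).any()),
    }
    print(name, report[name])
```

In float32, exp overflows to inf once its argument exceeds roughly 88.7, and inf / inf evaluates to NaN, so any row whose largest logit crosses that threshold yields NaN probabilities.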
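
What is tripping you up is that log(softmax) is not numerically stable.
tf.nn.softmax_cross_entropy_with_logits, which computes the loss directly from the pre-softmax activations, is numerically stable.
So, what you can do is: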
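activations = tf.matmul(x, W) + b
# y_ is the labels placeholder from the question; the op returns one loss value per example.
loss = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=activations)

# Only needed to get predictions, e.g. for accuracy or actual forward use of the model.
predictions = tf.nn.softmax(activations)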
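
Why computing the loss from the logits helps can be illustrated outside TensorFlow. The NumPy sketch below shows the general log-sum-exp technique; it is an assumption about the idea involved, not a description of TensorFlow's internal implementation, and the function names naive_cross_entropy and stable_cross_entropy are hypothetical.

```python
import numpy as np


def naive_cross_entropy(logits, labels):
    """-sum(labels * log(softmax(logits))), computed the unstable way."""
    e = np.exp(logits)
    probs = e / e.sum(axis=1, keepdims=True)
    return -(labels * np.log(probs)).sum(axis=1)


def stable_cross_entropy(logits, labels):
    """Same quantity via log-softmax, subtracting the row maximum first."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)


# First row is well behaved; second row has logits large enough to
# overflow exp in float32.
logits = np.array([[1.0, 2.0, 3.0],
                   [200.0, 0.0, -200.0]], dtype=np.float32)
labels = np.array([[0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]], dtype=np.float32)

with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
    naive = naive_cross_entropy(logits, labels)
stable = stable_cross_entropy(logits, labels)

print("naive: ", naive)
print("stable:", stable)
```

On the well-behaved row both versions agree. On the second row the naive version hits inf / inf and returns NaN, while the shifted version never exponentiates anything larger than zero and stays finite.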

I'm too lazy to dig up the machine learning Stack Exchange posts on log-softmax numerical stability, but I'm sure you can find them quickly.
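
np.random.rand generates numbers in the range [0, 1). What is the range of the numbers you get from mnist.train.images? Most likely some intermediate values are overflowing or underflowing. I would try printing intermediate values, say, the result of tf.matmul(x, W), to see whether that is the problem.

A learning rate of 0.5 is also rather high; try 0.01 or less.

@keveman mnist.train.images[0] is an array of floats, all in the range 0-1. While trying to split out the matmul as an intermediate calculation, I changed the number of training steps to 2... and suddenly it started working, giving the expected 10% accuracy. I have isolated the fix to that one change, which seems to confirm an overflow/underflow. Any thoughts on where that might be happening, or how I can solve it?

This helped with accuracy, but in my tests it did not solve the nan problem.

See the revised answer.

That makes sense, although this is happening on a newly trained model (trained with the random data).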
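
The suggestion in the comments to print intermediate values can be sketched as a small helper that reports where NaN or inf first appears. This is a hypothetical debugging aid, not part of TensorFlow; the helper name describe and the stand-in array are assumptions. In the question's setup, the input would presumably be the NumPy array obtained by evaluating the intermediate tensor (for example tf.matmul(x, W)) with the same feed used for training.

```python
import numpy as np


def describe(name, values):
    """Print and return basic statistics and non-finite counts for an array-like."""
    arr = np.asarray(values, dtype=np.float32)
    finite = arr[np.isfinite(arr)]
    summary = {
        "shape": arr.shape,
        "nan": int(np.isnan(arr).sum()),
        "inf": int(np.isinf(arr).sum()),
        "min": float(finite.min()) if finite.size else None,
        "max": float(finite.max()) if finite.size else None,
    }
    print(name, summary)
    return summary


# Stand-in values for illustration only; replace with an evaluated
# intermediate result from the model.
example = np.array([[0.5, np.inf], [np.nan, -3.0]])
report = describe("matmul(x, W)", example)
```

Calling the helper on the inputs, the pre-softmax activations, and the weights after each training step narrows down which step first produces non-finite values.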