Python Tensorflow: NaN for custom softmax

Simply swapping out the nn.softmax function for a combination built from tf.exp, keeping everything else the same, causes not only the gradients but also the intermediate variable s to contain NaN. I have no idea why this happens.

tempX = x
tempW = W
tempMult = tf.matmul(tempX, W)
s = tempMult + b

#! ----------------------------
#p = tf.nn.softmax(s)
p = tf.exp(s) / tf.reduce_sum(tf.exp(s), axis=1)
#!------------------------------


# cross-entropy loss plus L2 regularization on W
myTemp = y*tf.log(p)
cost = tf.reduce_mean(-tf.reduce_sum(myTemp, reduction_indices=1)) + mylambda*tf.reduce_sum(tf.multiply(W,W))

# manual gradient-descent updates for W and b
grad_W, grad_b = tf.gradients(xs=[W, b], ys=cost)

new_W = W.assign(W - tf.multiply(learning_rate, grad_W))
new_b = b.assign(b - tf.multiply(learning_rate, grad_b))
Answer
tf.exp(s) easily overflows for large s. That is why tf.nn.softmax does not actually use that equation directly, but does something equivalent to it (according to the docs).
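For reference, the usual numerically stable formulation subtracts each row's maximum before exponentiating; the result is mathematically identical, but exp never sees a huge argument. A minimal NumPy sketch of that trick (the helper names are illustrative, not TensorFlow's internals):

import numpy as np

def softmax_naive(s):
    # direct formula exp(s) / sum(exp(s)) -- exp overflows for large s
    e = np.exp(s)
    return e / e.sum(axis=1, keepdims=True)

def softmax_stable(s):
    # exp(s - max) / sum(exp(s - max)) is the same value mathematically,
    # but the largest argument handed to exp is 0, so nothing overflows
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

s = np.float32([[1000., 1000.], [1000., 0.]])
print(softmax_naive(s))   # [[nan nan] [nan  0.]] -- exp(1000) -> inf, inf/inf -> nan
print(softmax_stable(s))  # [[0.5 0.5] [1.  0. ]]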

Discussion

When I rewrote the softmax function as

p = tf.exp(s) / tf.reshape( tf.reduce_sum(tf.exp(s), axis=1), [-1,1] )
it worked without any problem.
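To make the shapes concrete: without the reshape, tf.reduce_sum(tf.exp(s), axis=1) is a flat vector with one entry per row, and dividing by it broadcasts along the wrong axis (or fails outright when the batch size differs from the number of classes); reshaping to [-1,1] turns the sums into a column so each row is divided by its own sum. A small NumPy sketch of the same broadcasting behaviour:

import numpy as np

s = np.array([[1., 2.], [3., 4.]])
e = np.exp(s)

row_sums = e.sum(axis=1)            # shape (2,)   one sum per row
col_sums = row_sums.reshape(-1, 1)  # shape (2, 1) the same numbers as a column

# (2,2) / (2,)  lines the sums up with the last axis, so each COLUMN of e is
# divided by the wrong sum (and it would raise when batch != classes)
wrong = e / row_sums
# (2,2) / (2,1) spreads each sum across its own row, which is what we want
right = e / col_sums

print(right.sum(axis=1))  # [1. 1.]  every row is a proper distribution
print(wrong.sum(axis=1))  # not all ones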

Here is a fully working Python 2.7 implementation that uses the hand-rolled softmax and works (using the reshape function).

Maybe your initial values for W and b are too large. I tried re-running my code above with the weights initialized to large numbers, and I was able to reproduce your NaN problem (see the sketch after the listing).

# -- imports --
import tensorflow as tf
import numpy as np

# reduce np print precision to 2 digits and suppress scientific notation
np.set_printoptions(precision=2, suppress=True)

# -- constant data --
x = [[0., 0.], [1., 1.], [1., 0.], [0., 1.]]
y_ = [[1., 0.], [1., 0.], [0., 1.], [0., 1.]]

# -- induction --
# 1x2 input -> 2x3 hidden sigmoid -> 3x2 softmax output

# Layer 0 = the x2 inputs
x0 = tf.constant(x, dtype=tf.float32)
y0 = tf.constant(y_, dtype=tf.float32)

# Layer 1 = the 2x3 hidden sigmoid
m1 = tf.Variable(tf.random_uniform([2, 3], minval=0.1, maxval=0.9, dtype=tf.float32))
b1 = tf.Variable(tf.random_uniform([3], minval=0.1, maxval=0.9, dtype=tf.float32))
h1 = tf.sigmoid(tf.matmul(x0, m1) + b1)

# Layer 2 = the 3x2 softmax output
m2 = tf.Variable(tf.random_uniform([3, 2], minval=0.1, maxval=0.9, dtype=tf.float32))
b2 = tf.Variable(tf.random_uniform([2], minval=0.1, maxval=0.9, dtype=tf.float32))
h2 = tf.matmul(h1, m2) + b2
y_out = tf.exp(h2) / tf.reshape( tf.reduce_sum(tf.exp(h2), axis=1) , [-1,1] )


# -- loss --

# loss : sum of the squares of y0 - y_out
loss = tf.reduce_sum(tf.square(y0 - y_out))

# training step : gradient descent (learning rate 1.0) to minimize loss
train = tf.train.GradientDescentOptimizer(1.0).minimize(loss)


# -- training --
# run 500 times using all the X and Y
# print out the loss and any other interesting info
#with tf.Session() as sess:
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print "\nloss"
for step in range(500):
    sess.run(train)
    if (step + 1) % 100 == 0:
        print sess.run(loss)

results = sess.run([m1, b1, m2, b2, y_out, loss])
labels = "m1,b1,m2,b2,y_out,loss".split(",")
for label, result in zip(labels, results):
    print ""
    print label
    print result

print ""