Python: simple TensorFlow code, weights not updating

I wrote a very simple TensorFlow program that learns a function

G(x, x') = x' * sigmoid(Wx') + x * (1 - sigmoid(Wx'))

where x' is an augmented (perturbed) version of x. I want to learn a G that always returns x'. G did not converge, and then I found that my weights are not updating at all.

from __future__ import division, print_function, absolute_import
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

feature_dim = 5
instance_dim = 50
x = np.random.normal(loc=0.0, scale=1.0, size=[instance_dim, feature_dim])
W = np.random.normal(loc=0.0, scale=1.0, size=feature_dim)
W[0] = W[1]
y = np.dot(x, W)  # generate y using latent features
m = int(instance_dim / 3)

test_x = x[:m, :]
test_y = y[:m]
train_x = x[m:, :]
train_y = y[m:]

train_num = len(train_x)
test_num = len(test_x)
batch_size = 1
# batchnum = 2
batchnum = int(train_num / batch_size)
feature_dim = train_x.shape[1]
x = tf.placeholder(tf.float32, [None, feature_dim])
x1 = tf.placeholder(tf.float32, [None, feature_dim])
WW = tf.Variable(tf.random_normal([feature_dim, feature_dim], stddev=1 / np.sqrt(feature_dim)))
G_ = tf.nn.sigmoid(tf.matmul(x1, WW))  # gate: sigmoid(W x1)
G = tf.multiply(x, 1 - G_, name=None) + tf.multiply(x1, G_, name=None)  # G(x, x1)
loss = tf.reduce_mean(tf.squared_difference(G, x1))  # want G == x1
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)

tf.summary.scalar("cost", loss)
summary_op = tf.summary.merge_all()

#optimizer = tf.train.AdadeltaOptimizer(0.1).minimize(loss)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
writer = tf.summary.FileWriter('visualization', graph=tf.get_default_graph())
for i in range(1000):
    # reshuffle the training set each epoch
    c = np.random.permutation(train_num)
    train_x = train_x[c, :]
    train_y = train_y[c]

    for j in range(batchnum):
        batch_x = train_x[(j * batch_size): (j + 1) * batch_size, :]
        indexes = np.random.randint(feature_dim, size=[batch_size, 2])
        # indexes = np.random.randint(2, size=[batch_size, 2])
        total = 200
        # batch_x1: copy of batch_x with one feature shifted down by `total`
        # and another shifted up by `total`
        batch_x1 = np.copy(batch_x)
        target_1 = indexes[:, 0]
        target_2 = indexes[:, 1]
        for k in range(batch_size):
            batch_x1[k, target_1[k]] = batch_x[k, target_1[k]] - total
            batch_x1[k, target_2[k]] = batch_x1[k, target_2[k]] + total
            """
        batch_x1[:, target_1] = batch_x[:, target_1] - total
        batch_x1[:, target_2] = batch_x1[:, target_2] + total
         """
        _, summary = sess.run([train_step, summary_op], feed_dict={x: batch_x, x1: batch_x1})
        print(sess.run(loss, feed_dict={x: batch_x, x1: batch_x1}))
        print(sess.run(WW, feed_dict={x: batch_x, x1: batch_x1}))
        # print(sess.run(G_, feed_dict={x: batch_x, x1: batch_x1}))
        # print(sess.run(G, feed_dict={x: batch_x, x1: batch_x1}))
        # print(sess.run(x1, feed_dict={x: batch_x, x1: batch_x1}))
        # print(sess.run(x, feed_dict={x: batch_x, x1: batch_x1}))
        # print(target_1)
        # print(target_2)
        writer.add_summary(summary, i)
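
As an editorial aside (not part of the original question): a quick way to check whether WW is actually moving is to snapshot it before and after a single training step and look at the largest change. The sketch below is appended to the end of the script above and reuses its sess, WW, train_step, batch_x and batch_x1.

# Diagnostic sketch (assumes the script above has already run and defined
# sess, WW, train_step, batch_x and batch_x1)
w_before = sess.run(WW)                                      # snapshot weights
sess.run(train_step, feed_dict={x: batch_x, x1: batch_x1})   # one more update
w_after = sess.run(WW)
print("max |delta WW| =", np.abs(w_after - w_before).max())  # ~0 means no update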

If you write out the loss function, you will find

L = [(x - x_1)(1 - sigmoid(W x_1))]^2

since G - x_1 = x(1 - sigmoid(W x_1)) + x_1 sigmoid(W x_1) - x_1 = (x - x_1)(1 - sigmoid(W x_1)).

I don't fully understand how you construct batch_x1 and batch_x, but for most training examples something odd seems to happen: either x = x_1, or |x_1| is very large. When x = x_1, L = 0 and the gradient is zero, so the weights do not change; when |x_1| is very large, the gradient vanishes because the sigmoid saturates. In fact, for the few examples that fall into neither case, the weights did change. So the training code itself is correct (I think); the problem is in how batch_x1 is built.
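
To make the saturation point concrete, here is a small standalone numpy sketch (an editorial illustration, not from the original answer). It rebuilds x_1 the same way batch_x1 is built, once with the shift of 200 used in the question and once with a shift of 2, and prints the sigmoid gate together with its derivative s*(1 - s):

# Standalone numpy illustration (assumes the same W scale and the same x1
# construction as in the question); shows the gate saturating for total = 200.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
feature_dim = 5
W = rng.normal(size=(feature_dim, feature_dim)) / np.sqrt(feature_dim)
x = rng.normal(size=(1, feature_dim))

for total in (200.0, 2.0):      # the question's shift vs. a small one
    x1 = x.copy()
    x1[0, 0] -= total           # same perturbation as batch_x1
    x1[0, 1] += total
    s = sigmoid(x1 @ W)         # the gate G_ from the question's code
    print("total =", total)
    print("  gate sigmoid(x1 W):", np.round(s, 4))
    print("  gate derivative s*(1-s):", np.round(s * (1 - s), 6))

With the ±200 shift the gate is pinned at 0 or 1 and its derivative is essentially zero, so the gradient with respect to WW (which carries this s*(1 - s) factor) vanishes; with a small shift the derivative is far from zero and WW moves.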
