Why is this TensorFlow implementation vastly less successful than Matlab's NN?

python matlab neural-network tensorflow

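As a toy example, I'm trying to fit the function f(x) = 1/x from 100 noise-free data points. The Matlab default implementation is phenomenally successful, with a mean squared difference of ~10^-10, and it interpolates perfectly.

I implemented a neural network with one hidden layer of 10 sigmoid neurons. I'm a beginner at neural networks, so be on your guard against dumb code.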

import tensorflow as tf
import numpy as np

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

#Can't make tensorflow consume ordinary lists unless they're parsed to ndarray
def toNd(lst):
    lgt = len(lst)
    x = np.zeros((1, lgt), dtype='float32')
    for i in range(0, lgt):
        x[0,i] = lst[i]
    return x

xBasic = np.linspace(0.2, 0.8, 101)
xTrain = toNd(xBasic)
yTrain = toNd(map(lambda x: 1/x, xBasic))

x = tf.placeholder("float", [1,None])
hiddenDim = 10

b = bias_variable([hiddenDim,1])
W = weight_variable([hiddenDim, 1])

b2 = bias_variable([1])
W2 = weight_variable([1, hiddenDim])

hidden = tf.nn.sigmoid(tf.matmul(W, x) + b)
y = tf.matmul(W2, hidden) + b2

# Minimize the squared errors.
loss = tf.reduce_mean(tf.square(y - yTrain))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# For initializing the variables.
init = tf.initialize_all_variables()

# Launch the graph
sess = tf.Session()
sess.run(init)

for step in xrange(0, 4001):
    train.run({x: xTrain}, sess)
    if step % 500 == 0:
        print loss.eval({x: xTrain}, sess)

The mean squared difference ends up at ~2*10^-3, so about 7 orders of magnitude worse than Matlab. Visualising with

xTest = np.linspace(0.2, 0.8, 1001)
yTest = y.eval({x:toNd(xTest)}, sess)  
import matplotlib.pyplot as plt
plt.plot(xTest,yTest.transpose().tolist())
plt.plot(xTest,map(lambda x: 1/x, xTest))
plt.show()

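we can see that the fit is systematically imperfect, while the Matlab one looks perfect to the naked eye, with differences uniformly < 10^-5. I have tried to replicate the diagram of the Matlab network in TensorFlow:

Incidentally, the diagram seems to imply a tanh rather than a sigmoid activation function. I cannot find it anywhere in the documentation to be sure. However, when I try to use tanh neurons in TensorFlow, the fitting quickly fails, with nan for the variables. I do not know why.

Matlab uses the Levenberg–Marquardt training algorithm. Bayesian regularization is even more successful, with mean squares at 10^-12 (we are probably in the territory of floating-point arithmetic).

Why is the TensorFlow implementation so much worse, and what can I do to make it better?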

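I tried training for 50000 iterations and it got to an error of 0.00012. That takes about 180 seconds on a Tesla K40.

It seems that for this kind of problem, first-order gradient descent is not a good fit (pun intended), and you need Levenberg–Marquardt or L-BFGS. I don't think anyone has implemented them in TensorFlow yet.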
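For illustration, here is a minimal NumPy sketch of a Levenberg–Marquardt-style loop (damped Gauss-Newton) on the same 1-10-1 sigmoid network. It is not Matlab's implementation and not a TensorFlow feature. The function names, the damping schedule (multiply or divide `lam` by 10), the starting `lam`, the seed and the step count are all assumptions chosen for the example, and it is written for Python 3.

```python
import numpy as np

H = 10  # hidden units, matching hiddenDim in the question


def unpack(theta):
    """Split the flat parameter vector into W, b, W2, b2 (hypothetical layout)."""
    return theta[:H], theta[H:2 * H], theta[2 * H:3 * H], theta[3 * H]


def residuals_and_jacobian(theta, x, y):
    """Residuals r = prediction - y and their Jacobian w.r.t. theta."""
    W, b, W2, b2 = unpack(theta)
    h = 1.0 / (1.0 + np.exp(-(np.outer(x, W) + b)))  # (N, H) sigmoid activations
    r = h @ W2 + b2 - y                               # (N,) residuals
    dh = h * (1.0 - h)                                # derivative of the sigmoid
    J = np.hstack([
        dh * W2 * x[:, None],    # d r / d W
        dh * W2,                 # d r / d b
        h,                       # d r / d W2
        np.ones((x.size, 1)),    # d r / d b2
    ])
    return r, J


def levenberg_marquardt(theta, x, y, steps=200, lam=1e-2):
    """Assumed damping schedule: shrink lam after a good step, grow it after a bad one."""
    n = theta.size
    r, J = residuals_and_jacobian(theta, x, y)
    cost = r @ r
    for _ in range(steps):
        # Solve (J^T J + lam I) delta = -J^T r as an augmented least-squares problem.
        A = np.vstack([J, np.sqrt(lam) * np.eye(n)])
        rhs = np.concatenate([-r, np.zeros(n)])
        delta = np.linalg.lstsq(A, rhs, rcond=None)[0]
        r_new, J_new = residuals_and_jacobian(theta + delta, x, y)
        cost_new = r_new @ r_new
        if cost_new < cost:   # accept the step, behave more like Gauss-Newton
            theta, r, J, cost = theta + delta, r_new, J_new, cost_new
            lam /= 10.0
        else:                 # reject the step, behave more like gradient descent
            lam *= 10.0
    return theta, cost / x.size


rng = np.random.default_rng(0)
x_train = np.linspace(0.2, 0.8, 101)
y_train = 1.0 / x_train
theta0 = rng.normal(scale=0.1, size=3 * H + 1)

r0, _ = residuals_and_jacobian(theta0, x_train, y_train)
initial_mse = float(r0 @ r0) / x_train.size
theta_fit, final_mse = levenberg_marquardt(theta0, x_train, y_train)
print(initial_mse, final_mse)
```

The loop only accepts steps that lower the squared error, so `final_mse` never exceeds `initial_mse`. How close it gets to Matlab's ~10^-10 depends on the initialisation and the number of steps, and is not claimed here.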

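Edit

Using tf.train.AdamOptimizer(0.1) fixes this. It gets to 3.13729e-05 after 4000 iterations. Also, the GPU with the default strategy seems like a bad idea for this problem: there are many small operations, and the overhead makes the GPU version run 3x slower than the CPU on my machine.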
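For readers wondering what Adam does differently from plain gradient descent, the update rule can be sketched in a few lines of pure Python. This is the textbook Adam update with its commonly cited default constants, not TensorFlow's actual implementation. `adam_minimize` and the toy objective are made up for the example.

```python
import math


def adam_minimize(grad, w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimise a scalar function from its gradient using the Adam update rule."""
    m = 0.0  # running mean of the gradients
    v = 0.0  # running mean of the squared gradients
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Toy objective (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_fit = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
print(w_fit)
```

Each step is scaled by the recent gradient magnitude, which is why a single learning rate such as 0.1 tends to be less fragile than it is with plain gradient descent.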

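By the way, here's a slightly cleaned-up version that fixes some of the shape issues and the unnecessary bouncing between tf and np. It reaches 3e-08 after 40k steps, or about 1.5e-5 after 4000: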

import tensorflow as tf
import numpy as np

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

xTrain = np.linspace(0.2, 0.8, 101).reshape([1, -1])
yTrain = (1/xTrain)

x = tf.placeholder(tf.float32, [1,None])
hiddenDim = 10

b = bias_variable([hiddenDim,1])
W = weight_variable([hiddenDim, 1])

b2 = bias_variable([1])
W2 = weight_variable([1, hiddenDim])

hidden = tf.nn.sigmoid(tf.matmul(W, x) + b)
y = tf.matmul(W2, hidden) + b2

# Minimize the squared errors.                                                                
loss = tf.reduce_mean(tf.square(y - yTrain))
step = tf.Variable(0, trainable=False)
rate = tf.train.exponential_decay(0.15, step, 1, 0.9999)
optimizer = tf.train.AdamOptimizer(rate)
train = optimizer.minimize(loss, global_step=step)
init = tf.initialize_all_variables()

# Launch the graph                                                                            
sess = tf.Session()
sess.run(init)

for step in xrange(0, 40001):
    train.run({x: xTrain}, sess)
    if step % 500 == 0:
        print loss.eval({x: xTrain}, sess)
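
To make the learning-rate schedule in the listing above concrete: with the default (non-staircase) setting, tf.train.exponential_decay computes initial_rate * decay_rate ** (step / decay_steps). The helper below is a plain-Python restatement of that formula for illustration (Python 3), not the TensorFlow function itself.

```python
def exponential_decay(initial_rate, step, decay_steps, decay_rate):
    """Non-staircase schedule: initial_rate * decay_rate ** (step / decay_steps)."""
    return initial_rate * decay_rate ** (float(step) / decay_steps)


# Same arguments as in the listing: start at 0.15, decay by 0.9999 every step.
rate_at_0 = exponential_decay(0.15, 0, 1, 0.9999)
rate_at_4000 = exponential_decay(0.15, 4000, 1, 0.9999)
rate_at_40000 = exponential_decay(0.15, 40000, 1, 0.9999)
print(rate_at_0, rate_at_4000, rate_at_40000)
```

So the rate is still roughly 0.10 after 4000 steps and has dropped to about 0.0027 by step 40000.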

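That said, it's probably not too surprising that LMA does better than a more general DNN-style optimizer at fitting a 2D curve. Adam and the rest are aimed at very high-dimensional problems, and … (see 12-15).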

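I haven't looked into TensorFlow, sorry about that, but you're doing some strange things with numpy in that toNd function. np.linspace already returns an ndarray, not a list. If you want to convert a list to an ndarray, all you need is np.array(my_list), and if you just need the extra axis you can do new_array = my_array[np.newaxis, :]. It may simply be stopping short of zero error because that is what it's supposed to do. Most data has noise, and you don't necessarily want zero training error. Judging from 'reduce_mean', it may be using cross-validation.

@AdamAcosta toNd is definitely a stopgap for my lack of experience. I tried np.array before, and the problem seemed to be that np.array([5,7]).shape is (2,) and not (2,1). my_array[np.newaxis, :] seems to fix this, thank you! I don't use Python day to day.

@AdamAcosta I don't think reduce_mean does cross-validation. From the docs: "Computes the mean of elements across dimensions of a tensor." Matlab does do cross-validation, which to my mind should reduce the fit on the training sample compared with no cross-validation. Is that right?

Yes, cross-validation will generally prevent a perfect fit. Sorry not to have a real answer. Knowledge about TensorFlow is still very sparse. I've seen a lot of questions about it pop up recently, but not many answers. Udacity is developing a course on it as part of their new machine learning engineer degree. I swear I don't work for Udacity, but it might be worth a look!

Thanks for checking this. Do you mean 5000 laps of my loop, i.e. 20M basic training runs? Can you confirm that it fails when the hidden layer is changed to tanh neurons? If so, do you know why that happens?

I just changed your xrange(4001) to xrange(5000). For tanh, training seems to diverge with a learning rate of 0.5. In general, with gradient descent you need to tune the learning rate for each problem; it seems to work if I use tf.train.GradientDescentOptimizer(0.1).

I see about the gradient parameter. It's very strange that xrange(0, 5000) gets you an order of magnitude more precision than the 4k range, and takes 180 seconds on the GPU. I ran the same range on the CPU with unchanged precision, and it took less than 10 seconds.

Oops, typo: 50000, not 5000.

Also: changing your data type from float32 to float64 and tuning AdamOptimizer to use an exponentially decaying learning rate, starting at 0.2 and stepping down with exp decay 0.9999, gets 1.44e-05 after 4000 training steps.

step = tf.Variable(0, trainable=False)
rate = tf.train.exponential_decay(0.2, step, 1, 0.9999)
optimizer = tf.train.AdamOptimizer(rate)
train = optimizer.minimize(loss, global_step=step)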
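
Coming back to the toNd discussion earlier in the comments, the shape behaviour is easy to check directly. Note that the placeholder in the question has shape [1, None], so what the code actually needs is a row of shape (1, N) rather than (N, 1). The variable names below are just for illustration.

```python
import numpy as np

a = np.array([5, 7])
print(a.shape)                  # (2,)   plain 1-D array
print(a[np.newaxis, :].shape)   # (1, 2) new leading axis: one row
print(a[:, np.newaxis].shape)   # (2, 1) new trailing axis: one column
print(a.reshape(1, -1).shape)   # (1, 2) same result as a[np.newaxis, :]

# Training data in the (1, N) layout, without a hand-written toNd helper.
x_train = np.linspace(0.2, 0.8, 101)[np.newaxis, :].astype(np.float32)
y_train = 1.0 / x_train
print(x_train.shape, y_train.shape)
```

Since np.linspace already returns an ndarray, adding the axis and casting to float32 covers everything toNd was doing.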