
Python: Training a neural network with SciPy optimizers and TensorFlow 2.0


With the introduction of TensorFlow 2.0, the SciPy interface (tf.contrib.opt.ScipyOptimizerInterface) has been removed. However, I would still like to use the SciPy optimizer scipy.optimize.minimize(method='L-BFGS-B') to train a neural network (a keras Sequential model). For the optimizer to work, it needs as input a function fun(x0), where x0 is an array of shape (n,). Therefore, the first step is to "flatten" the weight matrices to obtain a vector of the required shape. To this end, I modified the code provided. It supplies a function factory meant to create such a function fun(x0). However, the code does not seem to work and the loss function is not decreasing. I would be very grateful if someone could help me work this out.

Here is the piece of code I am using:

import numpy as np
import scipy.optimize
import tensorflow as tf

func = function_factory(model, loss_function, x_u_train, u_train)

# convert initial model parameters to a 1D tf.Tensor
init_params = tf.dynamic_stitch(func.idx, model.trainable_variables)
init_params = tf.cast(init_params, dtype=tf.float32)

# train the model with L-BFGS solver
results = scipy.optimize.minimize(fun=func, x0=init_params, method='L-BFGS-B')


def loss_function(x_u_train, u_train, network):
    u_pred = tf.cast(network(x_u_train), dtype=tf.float32)
    loss_value = tf.reduce_mean(tf.square(u_train - u_pred))
    return tf.cast(loss_value, dtype=tf.float32)


def function_factory(model, loss_f, x_u_train, u_train):
    """A factory to create a function required by tfp.optimizer.lbfgs_minimize.

    Args:
        model [in]: an instance of `tf.keras.Model` or its subclasses.
        loss [in]: a function with signature loss_value = loss(pred_y, true_y).
        train_x [in]: the input part of training data.
        train_y [in]: the output part of training data.

    Returns:
        A function that has a signature of:
            loss_value, gradients = f(model_parameters).
    """

    # obtain the shapes of all trainable parameters in the model
    shapes = tf.shape_n(model.trainable_variables)
    n_tensors = len(shapes)

    # we'll use tf.dynamic_stitch and tf.dynamic_partition later, so we need to
    # prepare required information first
    count = 0
    idx = [] # stitch indices
    part = [] # partition indices

    for i, shape in enumerate(shapes):
        n = np.product(shape)
        idx.append(tf.reshape(tf.range(count, count+n, dtype=tf.int32), shape))
        part.extend([i]*n)
        count += n

    part = tf.constant(part)


    def assign_new_model_parameters(params_1d):
        """A function updating the model's parameters with a 1D tf.Tensor.

        Args:
            params_1d [in]: a 1D tf.Tensor representing the model's trainable parameters.
        """

        params = tf.dynamic_partition(params_1d, part, n_tensors)
        for i, (shape, param) in enumerate(zip(shapes, params)):

            model.trainable_variables[i].assign(tf.cast(tf.reshape(param, shape), dtype=tf.float32))

    # now create a function that will be returned by this factory

    def f(params_1d):
        """
        This function is created by function_factory.
        Args:
            params_1d [in]: a 1D tf.Tensor.

        Returns:
            A scalar loss.
        """

        # update the parameters in the model
        assign_new_model_parameters(params_1d)
        # calculate the loss
        loss_value = loss_f(x_u_train, u_train, model)

        # print out iteration & loss
        f.iter.assign_add(1)
        tf.print("Iter:", f.iter, "loss:", loss_value)

        return loss_value

    # store these information as members so we can use them outside the scope
    f.iter = tf.Variable(0)
    f.idx = idx
    f.part = part
    f.shapes = shapes
    f.assign_new_model_parameters = assign_new_model_parameters

    return f
Here, model is a tf.keras.Sequential object.
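For context, the snippets above assume a setup roughly like the following sketch (the layer sizes, data shapes and target function here are made-up placeholders, not the actual problem):

import numpy as np
import tensorflow as tf

# hypothetical network: a small fully-connected regression model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation='tanh', input_shape=(1,)),
    tf.keras.layers.Dense(20, activation='tanh'),
    tf.keras.layers.Dense(1),
])

# hypothetical training data with shapes matching the network
x_u_train = np.random.uniform(-1.0, 1.0, size=(100, 1)).astype(np.float32)
u_train = np.sin(np.pi * x_u_train).astype(np.float32)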


Thank you in advance for your help.

I guess SciPy does not know how to compute gradients of TensorFlow objects. Try using the original function factory (i.e., one that also returns the gradients together with the loss), and set jac=True in scipy.optimize.minimize.

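For reference, here is a minimal sketch of how f inside function_factory could be changed to also return the gradient, following the pattern of the original Gist (the names reuse the question's code; this is illustrative, not a tested drop-in fix):

def f(params_1d):
    # update the model with the flattened parameter vector
    assign_new_model_parameters(params_1d)
    # compute the loss under a gradient tape so it can be differentiated
    with tf.GradientTape() as tape:
        loss_value = loss_f(x_u_train, u_train, model)
    # gradients w.r.t. all trainable variables, stitched into one 1D tensor
    grads = tape.gradient(loss_value, model.trainable_variables)
    grads_1d = tf.dynamic_stitch(idx, grads)
    return loss_value, grads_1d
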
I tested the Python code from the original Gist, replacing tfp.optimizer.lbfgs_minimize with the SciPy optimizer. It works with the BFGS method:

results = scipy.optimize.minimize(fun=func, x0=init_params, jac=True, method='BFGS')

Here jac=True tells SciPy that func also returns the gradients.


However, L-BFGS-B was trickier. After some effort I finally got it working. I had to comment out the @tf.function line and let func return grads.numpy() instead of the raw tf.Tensor. I suspect this is because the underlying implementation of L-BFGS-B is a Fortran function, so there may be an issue converting the data from tf.Tensor -> numpy array -> Fortran array. Forcing func to return the ndarray version of the gradients resolves the problem, but it then becomes impossible to use @tf.function.

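A minimal sketch of that conversion, assuming func already returns (loss, gradients) as tf.Tensors (the wrapper name func_numpy and the explicit float64 casts are illustrative, not the answer's exact code):

def func_numpy(params_1d):
    # evaluate eagerly (no @tf.function), then hand float64 NumPy arrays
    # back to the Fortran L-BFGS-B routine
    loss_value, grads_1d = func(tf.constant(params_1d, dtype=tf.float32))
    return loss_value.numpy().astype(np.float64), grads_1d.numpy().astype(np.float64)

results = scipy.optimize.minimize(fun=func_numpy, x0=init_params.numpy().astype(np.float64),
                                  jac=True, method='L-BFGS-B')
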
Switching from TF1 to TF2, I ran into the same problem, and after a bit of experimenting I found the solution below, which shows how to build an interface between a function decorated with tf.function and a SciPy optimizer. The important changes compared to the question are:

  • As mentioned by Ives, SciPy's lbfgs needs to obtain both the function value and the gradient, so you have to provide a function that delivers both and then set jac=True
  • SciPy's lbfgs is a Fortran function that expects the interface to provide np.float64 arrays, while a tensorflow tf.function uses tf.float32, so inputs and outputs have to be converted accordingly
  • Below I provide an example of how this can be solved for a toy problem

    import tensorflow as tf
    import numpy as np
    import scipy.optimize as sopt
    
    def model(x):
        return tf.reduce_sum(tf.square(x-tf.constant(2, dtype=tf.float32)))
    
    @tf.function
    def val_and_grad(x):
        with tf.GradientTape() as tape:
            tape.watch(x)
            loss = model(x)
        grad = tape.gradient(loss, x)
        return loss, grad
    
    def func(x):
        return [vv.numpy().astype(np.float64)  for vv in val_and_grad(tf.constant(x, dtype=tf.float32))]
    
    resdd = sopt.minimize(fun=func, x0=np.ones(5),
                          jac=True, method='L-BFGS-B')
    
    print("info:\n",resdd)
    
    which displays:

    info:
           fun: 7.105427357601002e-14
     hess_inv: <5x5 LbfgsInvHessProduct with dtype=float64>
          jac: array([-2.38418579e-07, -2.38418579e-07, -2.38418579e-07, -2.38418579e-07,
           -2.38418579e-07])
      message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
         nfev: 3
          nit: 2
       status: 0
      success: True
            x: array([1.99999988, 1.99999988, 1.99999988, 1.99999988, 1.99999988])
    
    TF2.0 eager mode (TF2.0(E)) works correctly but is about 20% slower than the TF1.12 baseline. TF2.0(G) with tf.function works fine and is marginally faster than TF1.12, which is good to know.

    The optimizer in tensorflow_probability (TF2.0/TFP) is slightly faster than TF2.0(G) with SciPy's lbfgs, but it does not achieve the same error reduction. In fact, the decrease of the loss over time is not monotonic, which seems a bad sign. Comparing the two implementations of lbfgs (SciPy and tensorflow_probability = TFP), it is clear that the Fortran code in SciPy is considerably more complex, so the simplification of the algorithm in TFP may be harmful here, and even the fact that TFP performs all computations in float32 may be a problem.

    Here is a simple solution using the autograd_minimize library, which I wrote building on Roebel's answer:

    import numpy as np
    import tensorflow as tf
    from autograd_minimize import minimize
    
    def rosen_tf(x):
        return tf.reduce_sum(100.0*(x[1:] - x[:-1]**2.0)**2.0 + (1 - x[:-1])**2.0)
    
    res = minimize(rosen_tf, np.array([0.,0.]))
    print(res.x)
    >>> array([0.99999912, 0.99999824])
    
    It also works for keras models, as shown in this simple example of linear regression:

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers
    from autograd_minimize.tf_wrapper import tf_function_factory
    from autograd_minimize import minimize 
    import tensorflow as tf
    
    #### Prepares data
    X = np.random.random((200, 2))
    y = X[:,:1]*2+X[:,1:]*0.4-1
    
    #### Creates model
    model = keras.Sequential([keras.Input(shape=2),
                              layers.Dense(1)])
    
    # Transforms model into a function of its parameter
    func, params = tf_function_factory(model, tf.keras.losses.MSE, X, y)
    
    # Minimization
    res = minimize(func, params, method='L-BFGS-B')
    
    print(res.x)
    >>> [array([[2.0000016 ],
     [0.40000062]]), array([-1.00000164])]
    
    