Python: How to prevent Scipy's optimize function from changing the shape of the initial guess x0?

Tags: python, numpy, optimization, scipy, minimize

I am trying to use an optimization algorithm from Scipy. It works fine when I run it without supplying the Jacobian (gradient) function. I believe that the problem I run into when I do supply the gradient is that the minimize function itself is changing the shape of the initial guess x0. You can see this from the output of the code below.

Input:

import numpy as np
from costFunction import *
import scipy.optimize as op

def sigmoid(z):

    epsilon = np.finfo(z.dtype).eps

    g = 1/(1+np.exp(-z))
    g = np.clip(g,epsilon,1-epsilon)
    return g

def costFunction(theta,X,y):
    m = y.size
    h = sigmoid(X@theta)
    J = 1/(m)*(-y.T@np.log(h)-(1-y).T@np.log(1-h))
    grad = 1/m*X.T@(h-y)
    print ('Shape of theta is',np.shape(theta),'\n')
    print ('Shape of gradient is',np.shape(grad),'\n')
    return J, grad

X = np.array([[1, 3],[5,7]])
y = np.array([[1],[0]])

m,n = np.shape(X)
one_vec = np.ones((m,1))
X = np.hstack((one_vec,X))
initial_theta = np.zeros((n+1,1))

print ('Running costFunction before executing minimize function...\n')
cost, grad = costFunction(initial_theta,X,y) #To test the shape of gradient before calling minimize

print ('Executing minimize function...\n')
Result = op.minimize(costFunction,initial_theta,args=(X,y),method='TNC',jac=True,options={'maxiter':400})
Output:

Running costFunction before executing minimize function...

Shape of theta is (3, 1) 

Shape of gradient is (3, 1) 

Executing minimize function...

Shape of theta is (3,) 

Shape of gradient is (3, 2) 

Traceback (most recent call last):
  File "C:/Users/#####/minimizeshapechange.py", line 34, in <module>
    Result = op.minimize(costFunction,initial_theta,args=(X,y),method='TNC',jac=True,options={'maxiter':400})
  File "C:\Users\#####\anaconda3\lib\site-packages\scipy\optimize\_minimize.py", line 453, in minimize
    **options)
  File "C:\Users\#####\anaconda3\lib\site-packages\scipy\optimize\tnc.py", line 409, in _minimize_tnc
    xtol, pgtol, rescale, callback)
ValueError: tnc: invalid gradient vector from minimized function.

Process finished with exit code 1

I will not analyze your exact computations, but here are some remarks:

  • (1) Your gradient is broken!
    • scipy expects a partial-derivative array of the same shape as your
      x0
    • your gradient has shape
      (3,2)
      , while
      (n+1,1)
      is expected
    • compare with the example given in the tutorial, which uses
      scipy.optimize.rosen_der
      (der = derivative)
  • (2) Your scipy version seems a bit old, because mine (0.19.0) tells me:
    • ValueError: tnc: invalid gradient vector from minimized function.
Some supporting source code from scipy's source:

Remark: the above code was changed/touched/introduced 5 years ago. If you really don't get this error while using the code listed (with the costFunction import removed), it seems you are using an older scipy.
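For concreteness, here is a minimal sketch (not part of the original answer) of the question's code with every array kept flat, which satisfies the shape expectation described above:

import numpy as np
import scipy.optimize as op

def sigmoid(z):
    eps = np.finfo(float).eps
    return np.clip(1 / (1 + np.exp(-z)), eps, 1 - eps)  # clip keeps the logs finite

def costFunction(theta, X, y):
    # theta arrives as a flat (n+1,) array from minimize; keep everything 1-D
    m = y.size
    h = sigmoid(X @ theta)                      # (m,)
    J = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m
    grad = X.T @ (h - y) / m                    # (n+1,), same shape as theta
    return J, grad

X = np.hstack((np.ones((2, 1)), np.array([[1., 3.], [5., 7.]])))
y = np.array([1., 0.])                          # flat, not (2, 1)
initial_theta = np.zeros(X.shape[1])            # (3,), matches the gradient

Result = op.minimize(costFunction, initial_theta, args=(X, y),
                     method='TNC', jac=True, options={'maxiter': 400})
print(Result.x, Result.x.shape)                 # optimized theta, still (3,)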

I had the same problem with Scipy while trying to do the same thing as you. I don't understand exactly why this solves the problem, but playing with the array shapes until it worked gave me the following.

What works with Scipy:

The initial theta is a simple 1-D numpy array rather than a column vector:

initial_theta = np.zeros((n+1))
initial_theta.shape    # (3,)

The gradient function returns shape (3,1) or (3,), depending on whether it returns

grad

or

grad.ravel()

scipy.optimize is called as:

import scipy.optimize as opt
model = opt.minimize(fun = CostFunc, x0 = initial_theta, args = (X, y), method = 'TNC', jac = Gradient)

What does NOT work with Scipy:

Initializing a theta of shape (3,1) using

initial_theta = np.zeros((n+1))[:,np.newaxis]

makes the scipy.minimize function call crash with

ValueError: tnc: invalid gradient vector from minimized function.

If someone could clarify these points, that would be great! Thanks.
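For reference, a quick shape check (a sketch, not from the answer) illustrating the (n+1,) versus (n+1,1) distinction and what ravel() does:

import numpy as np

n = 2
flat = np.zeros(n + 1)          # shape (3,)  -- a plain 1-D array
col  = np.zeros((n + 1, 1))     # shape (3, 1) -- a 2-D column vector
print(flat.shape, col.shape)    # (3,) (3, 1)
print(col.ravel().shape)        # (3,) -- ravel() flattens the column back to 1-D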

Your cost function code is wrong; maybe you should take a look at this:

def costFunction(theta, X, y):
    h_theta = sigmoid(X @ theta)
    J = (-y) * np.log(h_theta) - (1 - y) * np.log(1 - h_theta)
    return np.mean(J)
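A sketch of how this cost-only version could be wired up (the surrounding data here is illustrative, not from the answer). Since it returns no gradient, do not pass jac=True; TNC will then approximate the gradient numerically:

import numpy as np
import scipy.optimize as op

def sigmoid(z):
    eps = np.finfo(float).eps
    return np.clip(1 / (1 + np.exp(-z)), eps, 1 - eps)  # clip keeps the logs finite

def costFunction(theta, X, y):
    h_theta = sigmoid(X @ theta)
    J = (-y) * np.log(h_theta) - (1 - y) * np.log(1 - h_theta)
    return np.mean(J)

X = np.array([[1., 1., 3.], [1., 5., 7.]])   # bias column included
y = np.array([1., 0.])                        # flat y keeps J elementwise
res = op.minimize(costFunction, np.zeros(3), args=(X, y), method='TNC')
print(res.x)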

Please copy and paste each block below (In 1, In 2, and so on) into a separate Jupyter notebook cell.

# In 1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

filepath = 'C:/Pythontry/MachineLearning/dataset/couresra/ex2data1.txt'
data = pd.read_csv(filepath, sep=',', header=None)
#print(data)
X = data.values[:,:2]  #(100,2)
y = data.values[:,2:3] #(100,1)
#print(np.shape(y))

# In 2
#%% ==================== Part 1: Plotting ====================
postive_value = data.loc[data[2] == 1]
#print(postive_value.values[:,2:3])
negative_value = data.loc[data[2] == 0]
#print(len(postive_value))
#print(len(negative_value))
ax1 = postive_value.plot(kind='scatter', x=0, y=1, s=50, color='b', marker="+", label="Admitted") # s is marker size # https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.scatter.html#matplotlib.axes.Axes.scatter
ax2 = negative_value.plot(kind='scatter', x=0, y=1, s=50, color='y', ax=ax1, label="Not Admitted")
ax1.set_xlabel("Exam 1 score")
ax2.set_ylabel("Exam 2 score")
plt.show()
#print(ax1 == ax2)
#print(np.shape(X))

# In 3
# ============ Part 2: Compute Cost and Gradient ============
[m, n] = np.shape(X) #(100,2)
print(m, n)
additional_coulmn = np.ones((m,1))
X = np.append(additional_coulmn, X, axis=1)
initial_theta = np.zeros((n+1), dtype=int)
print(initial_theta)

# In 4
# Sigmoid and cost function
def sigmoid(z):
    g = np.zeros(np.shape(z))
    g = 1/(1+np.exp(-z))
    return g

def costFunction(theta, X, y):
    J = 0
    #print(theta)
    receive_theta = np.array(theta)[np.newaxis] # This turns theta into a 2-D row array
    #print(receive_theta)
    theta = np.transpose(receive_theta)         # column vector (n+1,1)
    #print(np.shape(theta))
    #grad = np.zeros(np.shape(theta))
    z = np.dot(X, theta) # where z = X*theta
    #print(z)
    h = sigmoid(z) # formula h(x) = g(z) where g = 1/(1+e^(-z)) #(100,1)
    #print(np.shape(h))
    #J = np.sum(((-y)*np.log(h)-(1-y)*np.log(1-h))/m)
    J = np.sum(np.dot((-y.T), np.log(h)) - np.dot((1-y).T, np.log(1-h)))/m
    #J = (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
    #error = h-y
    #print(np.shape(error))
    #print(np.shape(X))
    grad = np.dot(X.T, (h-y))/m
    #print(grad)
    return J, grad

# In 5
[cost, grad] = costFunction(initial_theta, X, y)
print('Cost at initial theta (zeros):', cost)
print('Expected cost (approx): 0.693\n')
print('Gradient at initial theta (zeros): \n', grad)
print('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628\n')

# In 6: Compute and display cost and gradient with non-zero theta
test_theta = [-24, 0.2, 0.2]
#test_theta_value = np.array([-24, 0.2, 0.2])[np.newaxis]  # This creates a 2-D row array
#test_theta = np.transpose(test_theta_value) # Transpose
#test_theta = test_theta_value.transpose()
[cost, grad] = costFunction(test_theta, X, y)

print('\nCost at test theta: \n', cost)
print('Expected cost (approx): 0.218\n')
print('Gradient at test theta: \n', grad)
print('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n')

# In 7
# ============= Part 3: Optimizing using fmin_tnc =============
import scipy.optimize as opt
#initial_theta_initialize = np.array([0, 0, 0])[np.newaxis]
#initial_theta = np.transpose(initial_theta_initialize)
print('Executing minimize function...\n')
# Working model
#result = opt.minimize(costFunction, initial_theta, args=(X,y), method='TNC', jac=True, options={'maxiter':400})
result = opt.fmin_tnc(func=costFunction, x0=initial_theta, args=(X, y))
# Not working model
#costFunction(initial_theta, X, y)
#model = opt.minimize(fun=costFunction, x0=initial_theta, args=(X, y), method='TNC', jac=costFunction)
print('Thetas found by fmin_tnc function: ', result)
print('Cost at theta found : \n', cost)
print('Expected cost (approx): 0.203\n')
print('theta: \n', result[0])
print('Expected theta (approx):\n')
print(' -25.161\n 0.206\n 0.201\n')
Result:

Executing minimize function...

Thetas found by fmin_tnc function:  (array([-25.16131854,   0.20623159,   0.20147149]), 36, 0)
Cost at theta found : 
 0.218330193827
Expected cost (approx): 0.203

theta: 
 [-25.16131854   0.20623159   0.20147149]
Expected theta (approx):

 -25.161
 0.206
 0.201
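Note one inconsistency in the printout above: "Cost at theta found" shows 0.2183 because the cost variable still holds the value computed at test_theta in the previous cell; the cost at the optimized theta is never recomputed. A small fix, using the names defined in the cells above:

cost, grad = costFunction(result[0], X, y)   # recompute at the theta returned by fmin_tnc
print('Cost at theta found : \n', cost)      # now prints ~0.203, matching the expected value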

scipy's fmin_tnc doesn't work well with column or row vectors. It expects the parameters to be in a flat array format, for example:

opt.fmin_tnc(func = costFunction, x0 = theta.flatten(), fprime = gradient, args = (X, y.flatten()))

What worked for me was to reshape y into a 1-D vector instead of a 2-D matrix. I simply used the following code and then reran SciPy's minimize function, and it worked.


y = np.reshape(y, 100)  # For example, if your y variable has 100 data points.
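A quick demonstration of what that reshape does (a sketch, not from the answer):

import numpy as np

y = np.zeros((100, 1))     # a 2-D column vector, shape (100, 1)
y = np.reshape(y, 100)     # now a 1-D vector, shape (100,)
print(y.shape)             # (100,)
# y.ravel() or y.flatten() achieve the same thing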

A bit late, but I also started working through the Andrew Ng assignment in Python and put a lot of effort into solving the problems above. Finally, here is what worked for me.

The answer above helped me, but with one change in the function call; see the following:


result = op.fmin_tnc(func=costFunction, x0=initial_theta, fprime=None, approx_grad=True, args=(X, y))
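A self-contained sketch of this pattern (the data and helper names here are illustrative, not from the answer): with approx_grad=True, fmin_tnc estimates the gradient numerically, so costFunction only has to return the scalar cost.

import numpy as np
import scipy.optimize as op

def sigmoid(z):
    eps = np.finfo(float).eps
    return np.clip(1 / (1 + np.exp(-z)), eps, 1 - eps)

def costFunction(theta, X, y):
    h = sigmoid(X @ theta)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

X = np.array([[1., 1., 3.], [1., 5., 7.]])
y = np.array([1., 0.])
initial_theta = np.zeros(3)

# fprime=None plus approx_grad=True: no analytic gradient is needed
result = op.fmin_tnc(func=costFunction, x0=initial_theta, fprime=None,
                     approx_grad=True, args=(X, y))
print(result[0])   # the optimized theta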

Comments:

Thanks a lot for your reply. 1. The reason the gradient is (3,2) is that the minimize function is changing it. I have edited the code to include a print statement before the call to minimize to demonstrate this. 2. I'm not sure what is wrong with the version, since mine is also 0.19.0 and I get the same output as you.

You should mention that error. OK... then be even more careful with those shapes. Usually the initial vector has shape
(n,)
(which is not exactly the same as
(n,1)
); in that case minimize doesn't change anything, and the gradient is still broken. (
Shape of theta before executing minimize function: (3,) ... shape of theta inside costFunction is (3,) ... gradient: [[0. 0.] [1. 1.] [1. 1.]]
)

Sorry, I wasn't clear. I have updated the code again. I put print statements for the shapes of theta and the gradient before and after the minimize call. As you can see, they both change from (3,1) to (3,).

I already told you: start from
initial_theta = np.zeros(n+1)
and observe (I think that is the correct shape, although I can't guarantee it is the only approach that works; at least minimize won't change anything, as expected!).

So you mean I should feed theta in as a 1-D vector, and inside costFunction reshape theta into a 2-D column vector for the linear algebra?
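That is essentially the pattern that works. A small sketch of it (hypothetical code, not from the thread): accept a flat theta, reshape it internally for the linear algebra, and hand a flat gradient back to scipy.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def costFunction(theta, X, y):
    theta = theta.reshape(-1, 1)     # flat (n+1,) in -> column (n+1, 1) for the algebra
    m = y.shape[0]
    h = sigmoid(X @ theta)           # (m, 1)
    J = ((-y.T @ np.log(h) - (1 - y).T @ np.log(1 - h)) / m).item()
    grad = X.T @ (h - y) / m         # (n+1, 1)
    return J, grad.ravel()           # flat (n+1,) gradient back out

X = np.hstack((np.ones((2, 1)), np.array([[1., 3.], [5., 7.]])))
y = np.array([[1.], [0.]])           # y can stay a column inside this version
J, g = costFunction(np.zeros(3), X, y)
print(J, g.shape)                    # 0.693..., (3,)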