Python: logistic gradient descent not converging in fmin_tnc


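I have been following a tutorial to implement logistic gradient descent in Python.
Here is the link:

His IPython notebook on GitHub for this particular exercise:

Here is my code for this problem: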

import pandas as pd
import matplotlib.pylab as plt
import numpy as np
import scipy.optimize as opt  


def sigmoid(Z):
    '''Compute the sigmoid function '''
    return 1.0 / (1.0 + np.exp( -1.0 * Z))

###########################################


def compute_cost(theta, X, y, learningRate):
    '''Compute the regularized logistic regression cost.'''

    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    m = y.size
    theta0 = np.zeros((1,X.shape[1]))
    theta0[0,1:] = theta[0,1:]    

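    # Note: learningRate/2*m evaluates as (learningRate/2)*m, and
    # theta0.T.dot(theta0) is an n x n outer product, not a sum of squares
    reg = np.dot((learningRate/2*m),(theta0.T.dot(theta0)))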

    Z = X.dot(theta.T)

    hypothesis = sigmoid(Z)  
    exp1 = (-y.T.dot(np.log(hypothesis)))
    exp2 = ((1.0 - y).T.dot(np.log(1.0 - hypothesis)))    
    J = (exp1  - exp2).dot(1/m) 

    return J.sum() + reg.sum() 



def grad(theta,X,y,learningRate):    

    theta = theta.T          
    X = np.matrix(X)
    y = np.matrix(y)
    m = y.shape[0]
    theta0 = np.zeros(X.shape[1])      
    theta0[1:] = theta[1:]    
    theta = np.matrix(theta)    
    theta0 = np.matrix(theta0)

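    # Note: this penalizes every entry of theta, including the intercept;
    # theta0 built above is never used
    reg = np.dot(learningRate / m, theta)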

    Z = X.dot(theta.T)    
    hypothesis = sigmoid(Z)      
    error = hypothesis - y        
    grad = np.dot((X.T.dot(error).flatten()), 1/m) + reg
    # Return a flat 1-D array for the optimizer
    return np.asarray(grad).ravel()

##
def predict(theta, X):    
    probability = sigmoid(X * theta.T)
    return [1 if (x >= 0.5) else 0 for x in probability]

Here is how the code is called:
data2 = pd.read_csv('ex2data2.txt', header=None, names=['Test 1', 'Test 2', 'Accepted'])

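With a single variable everything works fine, but with more features (exercise 2) it does not work well. Everything is exactly the same up to the point where the optimized gradient descent function (fmin_tnc) is used.
Somehow, even his code does not converge to the expected value. Here is the example from his blog showing the result of fmin_tnc: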

However, if you follow every step of his code, you get the following result:

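Well, as you can see, it is a bit different. But I found one difference in his code. He dropped the two columns 'Test 1' and 'Test 2' and kept only the higher-order parameters. This feels odd, because in Andrew Ng's solution no columns are dropped from the table and 28 features are used. This one uses only 11 features. I also found other code, but I would like my own cost and gradient functions to work. I believe they are getting stuck in a local minimum and do not converge.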
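For reference, the figure of 28 comes from expanding two inputs into every monomial x1^(i-j) * x2^j with total degree up to 6, including the constant term. A minimal sketch in plain Python that enumerates them; the helper names `monomial_exponents` and `map_features` are made up for this illustration and are not taken from the blog or the course code:

```python
def monomial_exponents(degree):
    """Return (power of x1, power of x2) for every monomial of total degree <= degree."""
    pairs = []
    for total in range(degree + 1):      # total degree 0, 1, ..., degree
        for j in range(total + 1):       # power of x2
            pairs.append((total - j, j))
    return pairs


def map_features(x1, x2, degree=6):
    """Expand a single (x1, x2) sample into its polynomial feature vector."""
    return [(x1 ** a) * (x2 ** b) for a, b in monomial_exponents(degree)]


features = map_features(0.5, -2.0)
print(len(features))  # 28 columns; the first one is the constant term
```

For two input columns, `PolynomialFeatures(degree=6)` from scikit-learn yields the same number of columns.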
My last attempt was to use all 28 features, just like Andrew's dataFrame. Unfortunately, I got a different result, as you can see below:

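As you can see, my accuracy is higher, but the cost is still higher than the expected value of 0.52900.
My intent is not to disparage the quality of the blog's code. I am still following his steps in other tutorials, and it seems to be a good source.
Below is a link to my code, where I use fmin_tnc just as he does. I only created a more vectorized gradient function. The file is named Logistic regulated.py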

GitHub:

The problem was that I was using Python 3.6, while the author was using Python 2.7.x. Switching to Python 2.7.13 solved the problem.
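One plausible reason the Python version matters, offered here as an assumption rather than something confirmed in the thread: `/` between two integers truncates in Python 2 but performs true division in Python 3. With an integer `learningRate = 1`, an expression such as `learningRate / m` would be 0 under Python 2, which silently switches the regularization term off. The sketch below uses `//` to imitate the Python 2 behaviour; `m = 118` is only an illustrative sample count.

```python
learning_rate = 1   # integer, as in the calling code
m = 118             # illustrative sample count (an assumption)

# Python 3: "/" is true division
py3_grad_scale = learning_rate / m          # a small positive float
py3_cost_scale = learning_rate / 2 * m      # parsed as (1 / 2) * 118

# Python 2 semantics for ints, imitated here with floor division
py2_grad_scale = learning_rate // m         # truncates to 0
py2_cost_scale = learning_rate // 2 * m     # (1 // 2) * 118, also 0

# The scale usually intended for the cost term
intended_cost_scale = learning_rate / (2 * m)

print(py3_grad_scale, py3_cost_scale)
print(py2_grad_scale, py2_cost_scale)
print(intended_cost_scale)
```

If this is indeed the cause, writing the scale as `learningRate / (2.0 * m)` gives the same value on both Python versions.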

The rest of the calling code:

# .values replaces as_matrix(), which was removed in pandas 1.0
y = data2[data2.columns[-1]].values
m = len(y)
y = y.reshape(m, 1)
X = data2[data2.columns[:-1]]
X = X.values
_lambda = 1

from sklearn.preprocessing import PolynomialFeatures

# Generate all polynomial terms up to degree 6
feature_mapper = PolynomialFeatures(degree=6)
X = feature_mapper.fit_transform(X)

# initialize the parameter array theta

theta = np.zeros(X.shape[1])

learningRate = 1

compute_cost(theta, X, y, learningRate)        

result = opt.fmin_tnc(func=compute_cost, x0=theta, fprime=grad, args=(X, y, learningRate))
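For comparison, here is a minimal sketch of a regularized cost and gradient written with plain NumPy arrays instead of `np.matrix`. It is not the asker's or the blog author's code: the names `cost_reg`, `grad_reg` and `lam` are invented here, `y` is assumed to be a flat 1-D array, and the random data exists only so the gradient can be checked against finite differences.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost_reg(theta, X, y, lam):
    """Regularized logistic cost; theta[0] (the intercept) is not penalized."""
    m = y.size
    h = sigmoid(X.dot(theta))
    data_term = -(y.dot(np.log(h)) + (1.0 - y).dot(np.log(1.0 - h))) / m
    reg_term = lam / (2.0 * m) * np.sum(theta[1:] ** 2)
    return data_term + reg_term


def grad_reg(theta, X, y, lam):
    """Gradient of cost_reg, returned as a flat 1-D array."""
    m = y.size
    error = sigmoid(X.dot(theta)) - y
    grad = X.T.dot(error) / m
    grad[1:] += lam / m * theta[1:]   # skip the intercept
    return grad


# Small synthetic problem (assumed data, not ex2data2.txt)
rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])
y = (rng.random(20) > 0.5).astype(float)
theta = rng.normal(scale=0.1, size=4)

# Central finite differences as an independent check of grad_reg
eps = 1e-6
numeric = np.array([
    (cost_reg(theta + eps * e, X, y, 1.0) - cost_reg(theta - eps * e, X, y, 1.0)) / (2 * eps)
    for e in np.eye(4)
])
max_diff = np.max(np.abs(numeric - grad_reg(theta, X, y, 1.0)))
print(max_diff)  # should be tiny if cost and gradient agree
```

With `y` flattened to 1-D, these functions could be passed to the optimizer in the same way as above, for example `opt.fmin_tnc(func=cost_reg, x0=theta, fprime=grad_reg, args=(X, y, 1.0))`.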