What determines whether my Python gradient descent algorithm converges?


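I have implemented a single-variable linear regression model in Python that uses gradient descent to find the intercept and slope of the best-fit line. (I am using gradient descent rather than computing the optimal intercept and slope directly because I would eventually like to generalize to multiple regression.)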

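The data I am using are below. sales is the dependent variable (in dollars) and temp is the independent variable (degrees Celsius); think ice cream sales versus temperature, or something similar.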

This is my data after normalization:

sales        temp 
0.06993007  0.174242424
0.326340326 0.340909091
0           0
0.342657343 0.25
0.515151515 0.5
0.785547786 0.772727273
0.529137529 0.568181818
1           1
0.836829837 0.871212121
0.55011655  0.46969697
0.606060606 0.810606061
0.51981352  0.401515152
My code for the algorithm:

import numpy as np
import pandas as pd
from scipy import stats

class SLRegression(object):
    def __init__(self, learnrate = .01, tolerance = .000000001, max_iter = 10000):

        # Initialize learnrate, tolerance, and max_iter.
        self.learnrate = learnrate
        self.tolerance = tolerance
        self.max_iter = max_iter

    # Define the gradient descent algorithm.
    def fit(self, data):
        # data   :   array-like, shape = [m_observations, 2_columns] 

        # Initialize local variables.
        converged = False
        m = data.shape[0]

        # Track number of iterations.
        self.iter_ = 0

        # Initialize theta0 and theta1.
        self.theta0_ = 0
        self.theta1_ = 0

        # Compute the cost function.
        J = (1.0/(2.0*m)) * sum([(self.theta0_ + self.theta1_*data[i][1] - data[i][0])**2 for i in range(m)])
        print('J is: ', J)

        # Iterate over each point in data and update theta0 and theta1 on each pass.
        while not converged:
            diftemp0 = (1.0/m) * sum([(self.theta0_ + self.theta1_*data[i][1] - data[i][0]) for i in range(m)])
            diftemp1 = (1.0/m) * sum([(self.theta0_ + self.theta1_*data[i][1] - data[i][0]) * data[i][1] for i in range(m)])

            # Subtract the learnrate * partial derivative from theta0 and theta1.
            temp0 = self.theta0_ - (self.learnrate * diftemp0)
            temp1 = self.theta1_ - (self.learnrate * diftemp1)

            # Update theta0 and theta1.
            self.theta0_ = temp0
            self.theta1_ = temp1

            # Compute the updated cost function, given new theta0 and theta1.
            new_J = (1.0/(2.0*m)) * sum([(self.theta0_ + self.theta1_*data[i][1] - data[i][0])**2 for i in range(m)])
            print('New J is: %s' % new_J)

            # Test for convergence.
            if abs(J - new_J) <= self.tolerance:
                converged = True
                print('Model converged after %s iterations!' % self.iter_)

            # Set old cost equal to new cost and update iter.
            J = new_J
            self.iter_ += 1

            # Test whether we have hit max_iter.
            if self.iter_ == self.max_iter:
                converged = True
                print('Maximum iterations have been reached!')

        return self

    def point_forecast(self, x):
        # Given feature value x, returns the regression's predicted value for y.
        return self.theta0_ + self.theta1_ * x


# Run the algorithm on a data set.
if __name__ == '__main__':
    # Load in the .csv file.
    data = np.squeeze(np.array(pd.read_csv('sales_normalized.csv')))

    # Create a regression model with the default learning rate, tolerance, and maximum number of iterations.
    slregression = SLRegression()

    # Call the fit function and pass in the data.
    slregression.fit(data)

    # Print out the results.
    print('After %s iterations, the model converged on Theta0 = %s and Theta1 = %s.' % (slregression.iter_, slregression.theta0_, slregression.theta1_))
    # Compare our model to scipy linregress model.
    slope, intercept, r_value, p_value, slope_std_error = stats.linregress(data[:,1], data[:,0])
    print('Scipy linear regression gives intercept: %s and slope = %s.' % (intercept, slope))

    # Test the model with a point forecast.
    print('As an example, our algorithm gives y = %s given x = .87.' % slregression.point_forecast(.87))  # Should be about .83.
    print('The true y-value for x = .87 is about .8368.')
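Given learnrate = .01, tolerance = .0000000001, and max_iter = 10000, in combination with the normalized data, I can get the gradient descent algorithm to converge. However, when I use the un-normalized data, the smallest learning rate I can use without the algorithm returning NaN is .005.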

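It comes down to how you have set up the algorithm.

Normalizing the data puts the y-intercept of the best fit at around 0.0. Otherwise, the y-intercept could be thousands of units away from your starting guess, and you would have to trek all the way there before the optimization proper really begins.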

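Does this type of algorithm require normalized data, and if so, why?

No, definitely not, but if you do not normalize, you should choose a starting point more intelligently (you are starting at (m, b) = (0, 0)). Your learnrate may also be too small if you do not normalize the data, and the same goes for your tolerance.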
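To make the link between data scale and learning rate concrete: for this quadratic cost, fixed-step gradient descent is stable only when the learning rate is below 2 divided by the largest eigenvalue of the Hessian X^T X / m, where X holds a column of ones and the column of x values. The following is a minimal sketch under that textbook bound. The function name max_stable_learnrate and the raw temperature values are hypothetical, made up purely for illustration.

```python
import numpy as np

def max_stable_learnrate(x):
    """Upper bound on a fixed learning rate for 1-D linear regression.

    The cost J = (1/2m) * sum((theta0 + theta1*x - y)**2) has the constant
    Hessian H = X.T @ X / m, where X = [1, x]. Fixed-step gradient descent
    converges only if learnrate < 2 / (largest eigenvalue of H).
    """
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    H = X.T @ X / len(x)
    return 2.0 / np.linalg.eigvalsh(H).max()

# Hypothetical raw temperatures in degrees Celsius (not the asker's data).
raw_temp = np.array([15.0, 20.0, 25.0, 30.0, 35.0])

# Min-max scaling to [0, 1] (assumed to be how the data were normalized).
scaled_temp = (raw_temp - raw_temp.min()) / (raw_temp.max() - raw_temp.min())

print(max_stable_learnrate(raw_temp))     # large x values shrink the bound
print(max_stable_learnrate(scaled_temp))  # scaled x values allow a larger step
```

Note that the bound depends only on the x values, not on y, so rescaling temp alone already changes which learning rates diverge.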

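Additionally, if the algorithm does require normalized values, what is the best way to take a new x value in non-normalized form and plug it into the point forecast?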


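Apply to the new x value whatever transformation you applied to the original data to obtain the normalized data. (The normalization code is outside what you have shown.) If this test point falls within the (minx, maxx) range of the original data, then once transformed it should fall within the normalized range, which is 0 to 1 for the data shown.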
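As an illustration of reusing the training transformation, here is a minimal sketch that assumes the data were min-max scaled (the normalization code is not shown in the question, so this is a guess). The helper names fit_minmax, to_normalized, and from_normalized, and all raw values, are hypothetical.

```python
import numpy as np

def fit_minmax(values):
    """Return the (min, max) of a training column so it can be reused later."""
    values = np.asarray(values, dtype=float)
    return values.min(), values.max()

def to_normalized(value, lo, hi):
    """Map a raw value onto the scale used for training."""
    return (value - lo) / (hi - lo)

def from_normalized(value, lo, hi):
    """Map a normalized value back to raw units."""
    return value * (hi - lo) + lo

# Hypothetical raw training columns (not the asker's data).
raw_temp = np.array([10.0, 18.0, 24.0, 30.0])
raw_sales = np.array([200.0, 310.0, 420.0, 500.0])

temp_lo, temp_hi = fit_minmax(raw_temp)
sales_lo, sales_hi = fit_minmax(raw_sales)

# Normalize a new raw temperature with the TRAINING min and max.
new_temp = 22.0
x_norm = to_normalized(new_temp, temp_lo, temp_hi)

# With a fitted model, y_norm would be slregression.point_forecast(x_norm);
# a placeholder value stands in for it here.
y_norm = 0.5
y_raw = from_normalized(y_norm, sales_lo, sales_hi)
print(x_norm, y_raw)
```

The key design point is that the minimum and maximum come from the training data and are stored, rather than being recomputed from the new point.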
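You are testing an absolute difference: if abs(J - new_J) <= self.tolerance. Note that linear regression has an exact optimal solution (as in scipy.stats.linregress, which has no tolerance parameter). Methods such as general least squares do include tolerance parameters, and those are relative: for example, ftol and xtol.

You can solve multiple regression without using gradient descent. The simplest way is probably to use Gaussian elimination to solve it. There are more accurate methods than this, and those are what something in a statistics package should use.

Thanks, @Evert. In theory, if new_J gets smaller every time, then the difference with abs should be the same as the relative one, shouldn't it? But clearly I wouldn't need it in that case. I think it may be left over from another way I tried to implement gradient descent.

@mcdowella, thank you for the information. I will read through those links carefully.

This is very helpful. The trade-off between finding the region of interest and actually converging is interesting. I figured that since the derivative gets smaller as you approach the minimum, you would not need to adjust the learning rate, but I need to think about this further and understand how it works. Thanks for the suggestion. Tracking theta is also a good idea.

Also, here is an example of the "dynamic step size" method I mentioned.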
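The comments note that linear regression has an exact solution and that multiple regression can be solved without gradient descent. A minimal sketch of one common way to do that, using NumPy's np.linalg.lstsq on the normalized data from the question, follows. This is not necessarily the method the commenters linked to; it is simply a standard least-squares call.

```python
import numpy as np

# Normalized data from the question: columns are sales (y) and temp (x).
data = np.array([
    [0.06993007, 0.174242424],
    [0.326340326, 0.340909091],
    [0.0, 0.0],
    [0.342657343, 0.25],
    [0.515151515, 0.5],
    [0.785547786, 0.772727273],
    [0.529137529, 0.568181818],
    [1.0, 1.0],
    [0.836829837, 0.871212121],
    [0.55011655, 0.46969697],
    [0.606060606, 0.810606061],
    [0.51981352, 0.401515152],
])

y = data[:, 0]

# Design matrix: a column of ones for the intercept, then the feature column.
# Stacking further feature columns here extends this to multiple regression.
X = np.column_stack([np.ones(len(data)), data[:, 1]])

# Solve min ||X @ theta - y||^2 directly; no learning rate or tolerance needed.
theta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = theta

print('intercept = %s, slope = %s' % (intercept, slope))
```

The result can serve as a reference value when checking what the gradient descent loop converges to.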